SAARC Regional Training on Molecular Genetic Characterization of Farm Animal Genetic Resources
Course Director
Arjava Sharma, Director
ICAR-National Bureau of Animal Genetic Resources
Karnal-132001 (Haryana), INDIA
Course Coordinator
R.S. Kataria, Principal Scientist
ICAR-National Bureau of Animal Genetic Resources
Karnal-132001 (Haryana), INDIA
Course Joint-Coordinator
S.K. Niranjan, Senior Scientist
ICAR-National Bureau of Animal Genetic Resources
Karnal-132001 (Haryana), INDIA
Sponsored by
South Asian Association for Regional Cooperation
Organized by
ICAR-National Bureau of Animal Genetic Resources, Karnal, India
&
SAARC Agriculture Centre, Dhaka, Bangladesh
Training Faculty
MESSAGE
It is a pleasure to know that ICAR-National Bureau of Animal Genetic Resources (NBAGR),
Karnal is organizing a SAARC Regional Training on Molecular Genetic Characterization
of Farm Animal Genetic Resources during 20-26 April, 2015.
NBAGR has made a significant contribution by way of characterization and conservation
of indigenous livestock and poultry biodiversity. With experienced and competent faculty
in the area of genetic characterization, the Bureau is the right institution to organize
such a training programme. I am sure the trainees from participating countries will gain
knowledge and be able to implement the FAO-recommended genetic characterization of
their respective farm animal genetic resources. Exchange of ideas through discussions
during the training course will also go a long way in exploring new areas of international
collaboration, and scientists will benefit from sharing their knowledge and
experiences.
I wish the training programme a grand success.
[S Ayyappan]
MESSAGE
I am happy to learn that a SAARC Agriculture Centre (Dhaka) sponsored short course on
Molecular Genetic Characterization of Farm Animal Genetic Resources is being organized
during April 20-26, 2015 at ICAR-National Bureau of Animal Genetic Resources, Karnal,
a leading institute engaged in genetic characterization of livestock breeds. As all SAARC
countries are predominantly agriculture-based economies, farm animal genetic
resources play a very important role in ensuring the livelihood as well as nutritional security
of farming communities. Genetic characterization of the livestock resources is necessary
for their conservation and sustainable utilization. I am sure the participants from SAARC
countries will get an opportunity to learn the latest techniques being employed for
molecular characterization of livestock breeds. I am happy to note that the organizers have
given special emphasis to hands-on training, which the participants will be able to
utilize on their return for characterization and conservation of their native germplasm.
I wish to convey my gratitude to SAARC Agriculture Centre for financial support and
all co-operation in organizing this training programme and look forward to more such
future ventures.
I convey my best wishes to the organizers and participants of the training programme.
[K.M.L. Pathak]
Place : New Delhi
Dated : 8th April, 2015
MESSAGE
I am delighted to write a brief foreword for the compendium of the regional training
programme on Molecular Genetic Characterization of Farm Animal Genetic Resources
jointly organized by SAARC Agriculture Centre (SAC), Bangladesh and ICAR-National
Bureau of Animal Genetic Resources (NBAGR), Karnal.
SAARC Agriculture Centre (SAC), under the framework of SAARC has been working
for strengthening agriculture research and technology transfer through regional networks
among agricultural research/extension institutions and policy makers in the SAARC
member countries. ICAR-NBAGR is one of the reputed institutions in India undertaking
research and development activities to protect and conserve indigenous Farm Animal
Genetic Resources for sustainable utilization and livelihood security. This regional training
programme would provide hands-on and theoretical knowledge to the participants from
different SAARC member countries to strengthen research and extension activities in their
respective countries. I am extremely happy to see the contents covered in this training
programme. This compendium is a store of information related to characterization and
documentation of farm animal genetic resources. This book is unique and surely a work to
treasure for anyone who is interested in characterization of farm animal genetic resources.
I wish all the success for this regional training programme and for future endeavors.
FOREWORD
India is blessed with large genetic biodiversity in its domestic animals, which contributes significantly to
the needs of the world's second most populous country. The National Bureau of Animal Genetic Resources
is a premier research institute of the Indian Council of Agricultural Research, with a broad mandate of
characterization and conservation of the animal genetic resources of India. Since its establishment
in 1984, the institute has made tremendous progress towards its mandate by working in close association with the
stakeholders of poultry and livestock genetic resources of the country. During the last three decades, the Bureau
has developed strength through its well-trained scientific faculty, most of whom have had exposure to working
in international laboratories abroad. Besides, it has very well-equipped laboratories with facilities such as an
automated Sanger DNA sequencer, microarray, real-time PCR and a high-performance computer system.
I am happy that the South Asian Association for Regional Cooperation (SAARC) has chosen our Bureau for
imparting training to participants from member countries in the field of molecular characterization of farm
animal genetic resources. The programme has been designed with emphasis on hands-on training and also on
interaction among participants and the faculty. Laboratory exercises will be based on FAO-recommended
tools for the genetic characterization of farm animals. I am sure that the topics covered during the training
programme will enhance the knowledge of the participants and that they will be able to apply these skills and
techniques after returning to their respective countries. The faculty and the coordinators have put a lot of
effort into bringing out this compendium of lectures, and I am sure this document will serve as a useful resource
for the participants.
I convey my sincere thanks to Director and Senior Program Officer (Livestock) of SAARC Agriculture
Centre, Dhaka (Bangladesh) for their guidance, cooperation and financial support extended for the training
programme. I acknowledge the support extended by Deputy Director General (AS), ADG (AP&B), ADG
(IR) and other officers of ICAR and wish this training programme a grand success.
[ARJAVA SHARMA]
Course Director
PREFACE
The sovereignty of each country over its animal genetic resources (AnGR) is endorsed under the Convention on
Biological Diversity (CBD). During the last few decades, there has been an increasing trend of erosion of indigenous
livestock populations worldwide, due to various reasons. In an era of globalization, protecting valuable
indigenous AnGR by describing and cataloguing them is the need of the hour for their sustainable utilization. Therefore,
it is imperative to characterize the unique germplasm and its important traits at the phenotypic as well as
genetic levels. Importantly, during recent decades, newer and more effective molecular tools for genetic
characterization have emerged, with the advantage of identifying populations having a gene pool of unique
alleles and biomolecules. Such tools are also useful in identifying the threat level to a particular breed by
defining its status, thus helping in designing conservation strategies.
This SAARC Regional Training on Molecular Genetic Characterization of Farm Animal Genetic Resources,
organized by ICAR-NBAGR, Karnal, India and SAARC Agriculture Centre, Dhaka, Bangladesh, during 20-26
April, 2015, is a much-needed effort to propagate and share valued information and knowledge among the
SAARC members. A total of eighteen participants from Bangladesh, Maldives, Nepal, Pakistan, Sri Lanka and
India are expected to participate in this training programme. We hope that, back home, the training
will help participants in characterizing and documenting the indigenous AnGR for their management across member
countries. Further, it will help in prioritizing populations for conservation as well as value addition.
The compendium of lectures and practicals prepared for the training programme will be a useful document for
updating the participants' knowledge in the field of molecular genetic characterization of native livestock and
poultry.
We are highly thankful to the South Asian Association for Regional Cooperation, Secretariat, Kathmandu
(Nepal), for the financial support to the training programme. We are also grateful to Dr. Abul Kalam Azad,
Director and Dr. Md. Nure Alam Siddiky, Senior Program Officer (Livestock), SAARC Agriculture Centre,
Dhaka for their constant cooperation and guidance while preparing for the training programme. We sincerely
thank Dr. Arjava Sharma, Course Director & Director, ICAR-NBAGR for taking keen interest and providing
guidance throughout the training programme. We gratefully acknowledge the assistance received from
members of various committees and all the scientific faculty of the Bureau, for their contribution to the
training programme.
Course-Coordinators
List of Participants
S. No.    Name of Participant    Address
01
02
03
04
DNA Analyst
Maldives Police Service, Maldives
E-mail: int.affairs@police.gov.mv
Phone: +960 7967949
05
DNA Analyst
Maldives Police Service, Maldives
E-mail: int.affairs@police.gov.mv
Phone: +960 9996687
06
07
08
09
Department of Biosciences
COMSATS Institute of Information Technology
COMSATS Road, G. T Road,
Sahiwal, Punjab, Pakistan
E-mail: ahmadali@ciitsahiwal.edu.pk
Phone: +92 40 4305001
10
11
12
Mr. L. J. Ekanayake
Lecturer (Probationary)
13
14
15
16
17
18
CONTENTS
Sl.  Title  Page No

Theory Lectures
1.  Indian Livestock Diversity and its Conservation - Arjava Sharma  1-6
2.  Designing Field Strategies for Characterization of Farm Animal Genetic Resources - P K Singh and Karuna Asija  7-16
3.  17-21
4.  22-25
5.  26-29
6.  Cytogenetic and Molecular Methods for Screening of Major Genetic Defects in Livestock - S K Niranjan and R S Kataria  30-35
7.  36-47
8.  48-54
9.  Mitochondrial DNA as a Marker for Genetic Diversity and Evolution in Farm AnGR - Monika Sodhi, Amit Kishore and Manishi Mukesh  55-64
10.  Y-Chromosome Based Genetic Diversity in Farm Animal Genetic Resources with Special Reference to Bovine - Indrajit Ganguly, Monika Sodhi, Suchit Kumar, Sanjeev Singh and K N Raja  65-74
11.  75-82
12.  83-92
13.  93-105
14.  High Throughput Techniques for Transcriptome Analysis in Farm Animals with Special Reference to Expression Microarrays - Manishi Mukesh and Monika Sodhi  106-112
15.  113-117
16.  118-121

Practicals
1.  123
2.  124-130
3.  131-135
4.  136-143
5.  144-148
6.  149-154
7.  155-171
8.  172-179
9.  180-184
10.  185-192
11.  193
1
Indian Livestock Diversity and its Conservation
Arjava Sharma
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
The livestock sector in developing countries accounts for almost 25-40 percent of overall agricultural
output, serving as a source of food, such as milk, meat and eggs; shelter and protection based on fiber
and hides; energy in the form of animal draught and transport; fuel and fertilizer from animal
manure; savings based on the cash value of animals; and as part of cultural and traditional values.
Livestock are also the best insurance against the vagaries of nature like drought, famine and other
natural calamities. Estimates for 2012-13 indicate that this sector contributed 132.4 million tonnes of
milk, 69.7 billion eggs, 46.05 million kg of wool and 5.95 million tonnes of meat in India. According to
estimates of the Central Statistics Office (CSO) of India, the value of output from the livestock sector at
current prices was about Rs. 4,59,051 crore during 2011-12, which is about 24.8% of the value of output
from the total agricultural and allied sector at current prices and 25.6% at constant (2004-05) prices. Milk
is the main output of the livestock sector, accounting for around two-thirds (67%) of its total output.
Meat and eggs account for 18.2% and 3.9% of the value of livestock output, respectively.
Animal genetic resource scenario in India
India has traditionally been a mega biodiversity center, and the rearing of domesticated animals of
different species, viz. cattle, buffalo, sheep, goat, pig, camel, horse, donkey, yak and mithun, by
livestock keepers has been practiced since time immemorial. In poultry, apart from chicken,
domesticated avian species such as ducks, geese, quails, turkeys, pheasants and partridges also exist
in India.
According to the Livestock Census (2012), the country had a livestock population of 512.6 million,
comprising mainly 190.9 m cattle, 108.7 m buffalo, 65 m sheep, 135.2 m goats, 10.3 m pigs, 0.32
m donkeys, 0.63 m horses and ponies, 0.19 m mules, 0.40 m camels, 0.077 m yaks and 0.298 m mithun,
besides 692.65 m chickens and 23.54 m ducks. Most of the vast and varied animal population that the country
possesses is indigenous, while a very small to sizeable proportion, depending on the species, is represented by
crossbreds between exotic germplasm and native stock. In pigs and cattle, the proportion of crossbreds is
relatively high, whereas in sheep it is only about 5%. There are very few animals belonging to exotic
breeds in the country, and these are maintained mostly on organized farms. A large proportion of the farm
animal population consists of non-descript native animals, which so far have not been characterized
systematically.
Presently, there are 151 registered breeds of livestock and poultry in India, which include 39
breeds of cattle, 13 of buffalo, 40 of sheep, 24 of goat, 6 of horses and ponies, 9 of camel, 3 of pig, 1 of
donkey and 16 of poultry, in addition to many more that have not been characterized and accredited so far.
Populations of other species like mules, yaks, mithuns, ducks, quails, etc. are yet to be classified
into well-described breeds.
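As a quick consistency check on the breed figures quoted above, the per-species counts can be tallied and expressed as shares of the total. This is only an illustrative sketch: the dictionary simply restates the numbers in the paragraph, and the function names are hypothetical.

```python
# Registered breed counts per species, restated from the paragraph above.
REGISTERED_BREEDS = {
    "cattle": 39,
    "buffalo": 13,
    "sheep": 40,
    "goat": 24,
    "horses and ponies": 6,
    "camel": 9,
    "pig": 3,
    "donkey": 1,
    "poultry": 16,
}


def total_breeds(counts):
    """Sum the registered breeds over all species."""
    return sum(counts.values())


def share_by_species(counts):
    """Percentage of all registered breeds contributed by each species."""
    total = total_breeds(counts)
    return {species: round(100 * n / total, 1) for species, n in counts.items()}


if __name__ == "__main__":
    print("Total registered breeds:", total_breeds(REGISTERED_BREEDS))
    for species, pct in share_by_species(REGISTERED_BREEDS).items():
        print(f"{species}: {pct}%")
```

The per-species counts add up to the 151 registered breeds stated in the text, with cattle and sheep together contributing just over half of them.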
SWOT analysis of Indian AnGR
Strengths:
- Mega livestock biodiversity with existence of almost all major domesticated farm animal species.
- Large number of breeds in each farm animal species adapted to the specific agro-climatic conditions.
- Diversified draft, milch and dual-purpose cattle breeds. The draft breeds can contribute significantly
  to agricultural operations and save fossil fuels.
- Adaptability of germplasm to diverse and changing climatic conditions (hot arid, humid tropical
  and temperate climates) and better resistance to parasites and diseases.
- Capability to survive and produce on coarse and poor-quality feed and fodder resources (low input).
- Availability of the best breeds of buffaloes, a multipurpose farm animal species.
- Large network of Research Institutes, State Agricultural/Animal Science Universities, State
  Animal Husbandry Departments, Livestock Development Boards and NGOs engaged in
  conservation and development of AnGR.
- Large amount of indigenous technical knowledge (ITK) available with the livestock keepers for management of AnGR.
- Seasonal migration of nomadic pastoralists helps them overcome adverse conditions, especially
  during the winter and rainy seasons, which enables them to sustain the breed populations
  they maintain.
Weaknesses:
- Lack of reliable breed-wise livestock census data.
- Low productivity of indigenous livestock.
- Poor implementation of breeding policies.
- High population density vis-à-vis inadequate feed and fodder resources and pasture land availability.
- Lack of performance and pedigree recording at farmers' level.
- Inadequate number of superior/proven bulls/bucks/rams/semen for AI and natural mating.
- Inadequate funding for conservation of AnGR.
- Insufficient patronage to native breeds.
- Lack of local institutions like breed societies or herders' groups/associations.
- Poor marketing system for animals, animal products and by-products.
- Inadequate insurance coverage of livestock and poultry.
- Lack of legal support for registration of livestock breeds and protection of farmers'/livestock
  keepers' rights.
- Poor orientation for characterization and conservation of AnGR.
- IPR issues not clearly defined in case of AnGR.
- Lack of harmony and coordination among different agencies.
Opportunities:
- Integral part of agriculture with synergistic relationship.
- Substantial contribution to GDP.
- Gainful employment, particularly to rural women and youth.
- Excellent potential of indigenous AnGR for low-cost conversion of poor-quality roughages
  into animal protein to cater to the fast-growing dietary demand of the human population.
- Large export potential for animal germplasm, including semen/embryos adapted to the
  tropics, animal products and by-products.
- Presence of large genetic variability within breeds for bringing genetic improvement in traits
  of economic and environmental importance.
- Availability of technologies like genomics, phenomics, nano-biotechnology, cloning, etc. for
  faster genetic improvement in AnGR.
- Exploitation of animal draught power for better efficiency in farm operations.
- Scope for allele mining for biotic and abiotic stresses in indigenous AnGR.
There is a need for immediate action for systematic conservation, genetic enhancement and
sustainable utilization of indigenous breeds.
World-wide discussion on conservation of genetic resources in animal production started much
later than in plant production. The need for conservation of animal genetic resources has been
accepted globally for sustainable development. Several international and national agencies have
taken up conservation of rare and dwindling breeds of domestic animals in various parts of the
world.
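[Chronology table with entries for the fifties and sixties of the 20th century and for 1972, 1974, 1980, 1985, 1992, 1993, 1997, 1999, 2002, 2003, 2004, 2007, 2009, 2011, 2012 and 2013; the event descriptions are not recoverable.]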
been established at NBAGR, Karnal, which has a collection of genomic DNA from 130
breeds/populations of livestock and poultry. It also has a buffalo mammary gland EST library.
Strategies for future development
The onus for achieving the goals of the national programme on conservation, sustainable management
and use of animal genetic resources lies with many players, such as farmers and livestock owners,
ministries, govt. departments, institutes, non-profit social and charitable NGOs, breeding
organisations, researchers, etc. Conservation and utilization of AnGR can best be achieved through
a joint approach involving all stakeholders. They should understand and participate in all
activities relating to management of AnGR, such as implementation of improvement and conservation
programmes, animal identification, performance recording, marketing and branding of animal
products, development of pasture lands, fodder production, etc.
Breeding plans for long-term conservation and continuous genetic improvement of indigenous
breeds need to be undertaken by establishing elite herds of each breed in its native habitat for
production of superior young males for breeding. A large number of government and
non-government owned livestock farms exist in each state. Many of these farms maintaining indigenous
livestock breeds are not in good condition and are on the verge of closing down because of
inconsistent breeding plans and inadequate availability of funds, manpower and other facilities.
These farms have the basic infrastructure, which can be strengthened further for maintaining proper
herd size and implementing conservation, breed improvement, germplasm multiplication,
demonstration and utilization programmes. There should be effective networking of satellite
breeding flocks/herds of the respective breeds with the established nucleus breeding units for
exchange of elite germplasm, multiplication, dissemination of germplasm, providing breeding
services in the breed tract and also supporting training and capacity building of livestock keepers.
2
Designing Field Strategies for Characterization of Farm Animal Genetic
Resources
P K Singh and Karuna Asija
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
The term Animal Genetic Resources (AnGR) refers to those animal species, and the populations
within each species, that are used or may be used for the production of food and agriculture. The
populations within each species can be classified as wild and feral populations, landraces and
primary populations, standardized breeds, selected lines, varieties, strains and any conserved
genetic material, all of which are currently categorized as breeds. Thousands of years of natural and
human selection, genetic drift, inbreeding and crossbreeding have contributed to today's AnGR
diversity and allowed the development of sustainable livestock production in various
agro-ecological zones and production systems. The 40+ livestock species contributing to today's
agriculture and food production have been shaped by a long process of domestication and development.
Genetically diversified livestock populations provide a greater range of options for meeting future
challenges in changing environment, disease threat, nutritional requirement, market and human
demands. Among the world's 148 non-carnivorous species weighing more than 45 kg, only fifteen have
been domesticated. Thirteen of these species are from Europe and Asia and two originate from South
America. Six species (cattle, sheep, goats, pigs, horses and donkeys) are found on all the continents,
while the remaining nine (dromedaries, Bactrian camels, llamas, alpacas, reindeer, water buffalo, yak, Bali
cattle and mithun) are important in limited areas of the world. The proportion is even lower in the case
of birds (other than ornamental and recreational species), with only 10 species (chicken, domestic
ducks, Muscovy ducks, domestic geese, guinea fowl, ostriches, pigeons, quails and turkeys)
currently domesticated out of 10,000 avian species.
The breed has been defined by FAO as "either a sub-specific group of domestic livestock with
definable and identifiable external characteristics that enable it to be separated by visual appraisal from other
similarly defined groups within the same species, or a group for which geographical and/or cultural
separation from phenotypically similar groups has led to acceptance of its separate identity."
Characterization of livestock biodiversity
Understanding the diversity, distribution, basic characteristics, comparative performance and
current status of animal genetic resources is essential for their efficient and sustainable use and
conservation. Complete national inventories, supported by periodic monitoring of trends and
associated risks, are basic requirements for effective management of animal genetic resources.
Without such information, some breed populations and the unique characteristics they contain may
decline significantly, or be lost, before their value is recognized and measures are taken to conserve
them. A major difficulty in completing the inventory of farm animal breeds results from the fact that
livestock breeds generally do not correspond to the notion of a herd-book breed: they are not pure
breeds with identifiable characteristics but are the result of haphazard breeding programmes under
field conditions.
Keeping this in view, the National Bureau of Animal Genetic Resources has initiated a structured
programme for phenotypic characterization and development of breed descriptors for animal
genetic resources. The technical programme for phenotypic characterization and development of
breed descriptors is quite comprehensive and envisages conducting scientific surveys,
following modern sampling designs and suitable formats, descriptors and questionnaires, for
collecting all possible relevant information on a particular breed inhabiting a defined
zoo-geographical zone. Such surveys of breeds/animal types must ensure mandatory recording of
the following types of information:
(i) Demographical and geographical distribution
(ii) The native environment
(iii) Enumeration of breeds in terms of age and sex in a population
(iv) Management practices and utility
(v) Qualitative and quantitative characterisation of breeds in relation to morphological
traits, production potential, reproductive status, etc.
(vi) Qualitative and quantitative description of unique animals, elite producers and rare or
unusual characteristics in certain specimens
Survey plan
The survey would be conducted preferably in three districts of the breeding tract of the breed under
consideration. Each district would have one supervisor and four enumerators. On the assumption
that the breeding tract of a breed is spread over adjoining/contiguous districts in one or more
states, a stratified two-stage sampling design may be adopted. Different zones within a district
would be identified, which may constitute the different strata. Villages within a stratum may
constitute the first-stage units and houses within a village the second-stage units. In total, three districts,
four strata within each district and five villages within each stratum would be randomly selected.
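The selection described in the survey plan can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the Bureau's procedure: the sampling frame, its nested-dictionary layout and all names (`draw_survey_villages`, `build_demo_frame`, the district, zone and village labels) are hypothetical, and the second stage (houses within villages) is not shown.

```python
import random


def draw_survey_villages(frame, n_districts=3, n_strata=4, n_villages=5, seed=None):
    """Randomly pick districts, then zones (strata), then villages.

    `frame` is a hypothetical sampling frame laid out as
    {district: {zone: [village, ...]}}.
    """
    rng = random.Random(seed)
    plan = {}
    for district in rng.sample(sorted(frame), n_districts):
        zones = frame[district]
        plan[district] = {
            zone: rng.sample(sorted(zones[zone]), n_villages)
            for zone in rng.sample(sorted(zones), n_strata)
        }
    return plan


def build_demo_frame(n_districts=5, n_zones=6, n_villages=12):
    """Made-up breeding tract, used only to exercise the function."""
    return {
        f"District-{d}": {
            f"Zone-{d}.{z}": [f"Village-{d}.{z}.{v}" for v in range(1, n_villages + 1)]
            for z in range(1, n_zones + 1)
        }
        for d in range(1, n_districts + 1)
    }


def count_villages(plan):
    """Total number of villages selected across all districts and zones."""
    return sum(len(villages) for zones in plan.values() for villages in zones.values())


if __name__ == "__main__":
    plan = draw_survey_villages(build_demo_frame(), seed=2015)
    for district, zones in plan.items():
        for zone, villages in zones.items():
            print(district, zone, villages)
    print("Villages selected:", count_villages(plan))
```

With three districts, four strata per district and five villages per stratum, the plan always contains 3 x 4 x 5 = 60 villages; fixing `seed` only makes the draw reproducible.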
Demographical and geographical distribution of a breed
In the first quarter, the supervisor and enumerators would be engaged in determining
demographical and geographical distribution of the breeds. From each stratum, five villages may be
randomly selected for complete enumeration for the purpose of deriving demographic distribution
of the breed. This study would cover the following information:
a. Age-wise and sex-wise distribution.
b. Group enumeration for calves/kids/lambs, young stock and adults (milking females, dry
females, working males, stud bulls, etc.).
c. Geographical distribution of the breed.
Complete information on the group-wise, sex-wise and breed-wise total population in the
breeding tract is obtained through the stratified survey. If, during the survey, individual animals with
exceptionally high producing capacity or with rare genetic variation are located, they would be
brought under organisational support or purchased for further studies.
Information would be recorded on 3000 animals covering three districts of the breeding tract. In
each district, 200-250 animals in each group would be studied for the aspects listed against
the group (Table 1).
Thus, there would be 1000 animals per district, randomly selected from 4
randomly selected zones. The National Bureau of Animal Genetic Resources has formulated five to six
questionnaires, separately for different livestock and poultry species, for collecting the
information required for phenotypic characterization of the breed; these may be used during the survey for
collection of information on different aspects of phenotypic characterization.
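The sample sizes quoted above follow from simple division, which the sketch below makes explicit. It rests on an assumption not stated in the text: that the 1000 animals per district are spread evenly over the zones and over the groups of the species being surveyed. The function name and the equal-split rule are illustrative only.

```python
def sample_allocation(total_animals=3000, districts=3, zones_per_district=4, groups=5):
    """Split the survey sample evenly by district, zone and animal group.

    Equal allocation is an assumption made for illustration; the text
    only gives the totals (3000 animals, 3 districts, 4 zones per district).
    """
    per_district = total_animals // districts
    return {
        "per_district": per_district,
        "per_zone": per_district // zones_per_district,
        "per_group": per_district // groups,
    }


if __name__ == "__main__":
    # Five groups, as for cattle and buffaloes in Table 1.
    print(sample_allocation(groups=5))
    # Four groups, as for pig or poultry in Table 1.
    print(sample_allocation(groups=4))
```

Under this reading, a species with five groups gets 200 animals per group in each district and a species with four groups gets 250, which would account for the stated range of 200-250 animals per group.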
Native environment
Some important meteorological parameters are to be recorded for the breeding tract of a breed. These
include temperature, humidity and rainfall, in terms of maximum and minimum values along with their
respective months, and the average over the last 10 years. Annual duration of flood and drought along with
their months, maximum and minimum elevation of land, and sub-soil water depth during the summer and
rainy seasons are also recorded. Other information such as soil description, forest area (in sq. km), wet
cultivated area, dry cultivated area, uncultivated area, main cultivated cereals, main cultivated
pulses and other crops is recorded. The area of pasture available for grazing of animals, along
with classification of the pastures (mountainous/sub-mountainous/plains - irrigated/rain-fed/sandy),
is also to be recorded.
Table 1: Group classification and study coverage for phenotypic characterization

Species: Cattle and Buffaloes
  Calves (up to 1 year): Physical traits, feeding and management practices
  Stock (1-3 years): Physical and growth traits, feeding and management practices
  Milking females: Physical traits, feeding and management practices, reproduction, production and growth traits
  Working males: Physical traits, feeding and management practices
  Breeding bulls: Physical and reproductive traits, feeding and management practices

Species: Sheep and Goat (group labels other than Stud Bucks are not recoverable; study coverage entries are listed in their original order)
  Physical traits, feeding, management practices and growth traits
  Physical traits, feeding, management practices and growth traits
  Physical, productive and reproductive traits, feeding and management practices
  Physical and reproductive traits, feeding and management practices
  Physical traits, feeding, management practices and growth traits
  Physical traits, feeding, management practices and growth traits
  Physical and reproductive traits, feeding, management practices and growth traits
  Physical, productive and reproductive traits, feeding and management practices
  Stud Bucks: Physical and reproductive traits, feeding and management practices

Species: Pig
  Piglets (0-2 months): Physical traits, feeding, management practices and growth traits
  Young stock (2-8 months): Physical traits, feeding, management practices and growth traits
  Sows: Physical, productive and reproductive traits, feeding and management practices
  Boars: Physical and reproductive traits, feeding and management practices

Species: Poultry
  Cockerels (up to 5 months): Physical traits, feeding, management practices and growth traits
  Pullets (up to 5 months): Physical traits, feeding, management practices and growth traits
  Cock (above 5 months): Physical and reproductive traits, feeding, management practices and growth traits
  Hen (above 5 months): Physical traits, feeding, management practices, utility, egg production and growth traits
Enumeration of breed
Population statistics are important for classification of breeds by risk status class.
Therefore, data are required for estimating the population status of the breed under consideration.
FAO (2013) has laid out criteria for classification of breeds according to their risk status. For high
reproductive capacity species like pig, dog, rabbit and all avian species, the criteria would be as
follows:
Risk status class: Extinct

Risk status class: Cryopreserved only
  No living male or female, but sufficient cryopreserved genetic material for reconstitution of the breed.

Risk status class: Critical
  Total number of breeding females mated to males of same breed: ≤100; <20% CB
  Overall population: ≤80 and increasing trend OR ≤120 and decreasing or static trend
  Total number of breeding males: ≤5
  Rate of inbreeding per generation: 3% or higher

Risk status class: Critical-maintained
  As for Critical, but for which active conservation programmes are in place or populations are
  maintained by commercial companies or research institutions.

Risk status class: Endangered
  Total number of breeding females mated to males of same breed: 100-1000; <20% CB
  Overall population: 80-800 and increasing trend OR 120-1200 and decreasing or static trend
  Total number of breeding males: >5 and ≤20
  Rate of inbreeding per generation: 1-3%

Risk status class: Endangered-maintained
  As for Endangered, but for which active conservation programmes are in place or
  populations are maintained by commercial companies or research institutions.

Risk status class: Vulnerable
  Total number of breeding females mated to males of same breed: >1000 and ≤2000; <20% CB

Risk status class: Not at Risk

Risk status class: Unknown
For low reproductive capacity species, e.g. cattle, buffalo, sheep, goat, equine, camel, yak, mithun,
etc., all the figures would be three times those given for high reproductive capacity species in the above
table. If a breed falls short on any of the above criteria, it will be placed in the respective
class. Therefore, it is important to obtain the number of breeding males, breeding females, population
trend, percentage of breeding females bred to males of the same breed and overall population size of
the breed so as to classify breeds by risk status. During the survey in the selected
villages, these figures may be collected, and the estimated population may be obtained by extrapolating
these figures to census data. Along with the population data, the average herd size and age-wise/sex-wise
classification within the herd are also required.
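The "three times" rule and the extrapolation to census data can be illustrated with a short sketch. It is deliberately simplified and rests on assumptions: only the breeding-female figures from the table (100, 1000 and 2000) are used and are treated as upper limits, while breeding males, population trend, crossbreeding percentage and inbreeding rate are ignored; the extrapolation is a plain ratio estimate. All function and parameter names are hypothetical.

```python
# Breeding-female limits for high reproductive capacity species, taken from
# the table above and assumed here to be "less than or equal to" limits.
HIGH_REPRO_FEMALE_LIMITS = [("Critical", 100), ("Endangered", 1000), ("Vulnerable", 2000)]
LOW_REPRO_FACTOR = 3  # "three times" rule for cattle, buffalo, sheep, goat, etc.


def risk_class(breeding_females, low_reproductive_capacity=False):
    """Classify a breed from its number of breeding females only (simplified)."""
    factor = LOW_REPRO_FACTOR if low_reproductive_capacity else 1
    for name, limit in HIGH_REPRO_FEMALE_LIMITS:
        if breeding_females <= limit * factor:
            return name
    return "Not at Risk"


def extrapolate_population(breed_count_in_sample, species_count_in_sample, census_species_total):
    """Ratio estimate: scale the breed's share in surveyed villages to the census total."""
    if species_count_in_sample <= 0:
        raise ValueError("species_count_in_sample must be positive")
    return census_species_total * breed_count_in_sample / species_count_in_sample


if __name__ == "__main__":
    # Hypothetical survey: 120 of 600 goats counted belong to the breed, and the
    # census reports 50,000 goats in the breeding tract.
    estimate = extrapolate_population(120, 600, 50_000)
    print("Estimated breed population:", estimate)
    print("Class for 250 breeding females (pig):", risk_class(250))
    print("Class for 250 breeding females (cattle):", risk_class(250, low_reproductive_capacity=True))
```

The same count of 250 breeding females falls in different classes depending on the species group, because the limits for low reproductive capacity species become 300, 3000 and 6000. A full classification would also need the other criteria in the table.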
General information about livestock keepers
The communities responsible for rearing the breed, with a description of the communities (farmers/nomads/
isolated tribal/any other) who are the keepers of the breed in question, are to be recorded along with
some socio-economic parameters on them. Information about the livestock
keepers is to be collected only once during the survey. This information includes agricultural land holding
(irrigated/non-irrigated), feed and fodder grown during different seasons, profession, total annual
income in rupees, income generated through animal husbandry, number of family members,
number of literate members, number of members engaged in animal husbandry practices
(men/women/children), mode of sale and purchase of animals and animal products, and utility of
the livestock/poultry reared by them.
Management Practices
Housing and Hygiene: The duration of housing of the animals (day/night/both day and night/none) is to
be recorded. The nature of the animal houses (open/closed; kutcha/pucca; separate/part of residence;
kutcha floor/pucca floor; full walled/half walled; well ventilated or not), their sanitary condition
(good or poor) and the drainage system of the house are to be recorded. Hygiene of feeding/water
troughs and cleaning of milk utensils as well as animals etc. may also be recorded. Wallowing
practices, if any, are also to be recorded, indicating the place and duration of wallowing.
Feeding: Feeding practices for calves would be recorded every month in cattle and buffaloes and every
fortnight in sheep and goats. Feeding of mothers would be recorded once in three months. For the
remaining groups, feeding and management practices would also be recorded once every 3 months. Grazing
practices, along with the distance and time covered, are to be recorded for morning as well as evening.
Stall feeding is to be recorded in terms of individual or group feeding and the quantum of feed
offered, as follows:
Time      Particulars   Green fodder   Dry fodder   Concentrate   Minerals
Morning   Name
          Qty (kg)
Noon      Name
          Qty (kg)
Evening   Name
          Qty (kg)
In pigs, stall feeding/semi stall feeding/scavenging alone/scavenging with supplementation of kitchen
waste and vegetable waste is to be recorded. While recording stall feeding, the name of the
supplements (cake/concentrate/mineral mixture/green fodder) is also to be recorded. The feeding
practices for piglets are also needed. In poultry, feeding through scavenging/scavenging with
supplement feed/free ranging/free ranging with supplement feeding/feeding with local feeds/feeding
with branded concentrates are the major feeding practices to be recorded in the management of the
breed under study.
Water: Availability (adequate/inadequate), along with the quantity and the water source, is to be
recorded.
Hay/silage making practices for preserving fodder are also to be recorded.
Breeding: Natural service/artificial insemination, along with information about the breeding males
used, is to be recorded.
Treatment and prophylactic measures of the diseases: The types of diseases, along with the treatment
given to the animals (herbal/allopathic or local), are to be recorded. The prophylactic measures taken
in terms of de-worming, vaccination etc. are also to be recorded. During the visits in each season,
reproductive and disease management aspects would be recorded by observation as well as by interaction
with the farmer. Breeding bulls might not be available in sufficient numbers, and therefore studies
would be limited to whatever is available in the area of coverage. Prenatal and postnatal mortality at
different age groups is to be recorded in all the species.
Cattle/buffalo
Colour: colour of coat, skin, muzzle, eyelids, tail and hoofs
Horns: colour, size, shape (straight/curved), orientation
Ears: orientation (horizontal/drooping), length
Head: forehead (convex/concave/straight)
Body: L/M/S
Hump: L/M/S
Dewlap: L/M/S
Navel flap: L/M/S
Penis sheath flap: L/M/S
Basic temperament: docile/moderate/tractable/wild
Tail: length (L/M/S), colour of switch
Udder and teat: udder shape (bowl/round/trough/pendulous), fore-udder size (L/M/S), rear-udder size
(L/M/S), teat shape (cylindrical/funnel/pear), teat tip (pointed/round/flap), milk vein (L/M/S)

Sheep
Colour: % surface area in coat colour with distinctive colour markings
Horns: colour, size (small <15/medium 15-25/large >25 cm), shape (straight/curved), orientation
Ears: orientation (erect/pendulous/horizontal)
Head: forehead (straight/convex/slightly convex)
Basic temperament: docile/moderate/tractable/wild
Tail: length (L/M/S), type
Udder and teat: shape
Beard: present/absent
Wattles: present/absent
Others: coat type (hair/wool); fineness (fibre diameter) (fine <21/medium 22-26/coarse >26
micrometers); length (12-month fleece) (short <5/medium 5-10/long >10 cm); lustre
(lustrous/non-lustrous); crimp/curl (straight/low crimp = <4/high crimp = >4 cm); wool cover
(covered/bare) on a. head, b. face, c. belly, d. legs are to be recorded

Goat
Colour: % surface area in coat colour with distinctive colour markings
Horns: colour, size (small <15/medium 15-25/large >25 cm), shape (straight/curved), orientation
Ears: orientation (erect/pendulous/horizontal)
Head: forehead (straight/convex/slightly convex)
Basic temperament: docile/moderate/tractable/wild
Beard: present/absent
Wattles: present/absent
Others: coat type (hair/cashmere/pashmina/mohair); fineness (fibre diameter); cashmere/pashmina/down;
mohair

Pig
Colour: % surface area in coat colour with distinctive colour markings
Ears: orientation (erect/pendulous/horizontal)
Head: snout profile (straight/convex/slightly convex/concave)
Basic temperament: docile/moderate/tractable/wild
Udder and teat: number of teats and teat position
Others: coat - a. bristle (long/medium/short); b. fineness (bristle diameter)
Top line: straight/concave

Poultry
Plumage colour: White/Black/Blue/Red/Brown/Gold/Others (specify)
Pattern: Solid/Dull/Striped/Patchy/Spotted/Barred/Others (specify)
Skin colour: White/Yellow/Blue/Black/Other
Shank colour: White/Yellow/Black/Blue/Green
Earlobe colour: White/Red/Black/White & Red/Others (specify)
Comb colour: Black/Red/Others (specify)
Eye colour: Grey/Black/Brown/Others (specify)
Comb type: Single/Pea/Rose/Walnut/Cushion/Strawberry/Duplex/V-shaped/Double
Others: dwarfism, feathered legs, naked neck, silky frizzle, multiple spurs, etc.

L - Large, M - Medium, S - Small
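The size classes quoted in the descriptor table (horn size small <15/medium 15-25/large >25 cm; 12-month fleece length short <5/medium 5-10/long >10 cm) lend themselves to a small helper for coding field measurements consistently. The following is a minimal sketch; the function names are hypothetical, and placing the boundary values (exactly 15 or 25 cm, exactly 5 or 10 cm) in the middle class is an assumption read from the ranges as printed.

```python
def classify(value, lower, upper, labels=("small", "medium", "large")):
    """Return the class label for a measurement given two cut-off values.

    Values below `lower` fall in the first class, values from `lower` to
    `upper` (inclusive) in the middle class, and values above `upper` in
    the last class.
    """
    if value < 0:
        raise ValueError("measurement cannot be negative")
    if value < lower:
        return labels[0]
    if value <= upper:
        return labels[1]
    return labels[2]


def horn_size_class(length_cm):
    # Cut-offs as printed for sheep and goat horns: <15 / 15-25 / >25 cm
    return classify(length_cm, 15, 25)


def fleece_length_class(length_cm):
    # Cut-offs as printed for 12-month fleece length: <5 / 5-10 / >10 cm
    return classify(length_cm, 5, 10, labels=("short", "medium", "long"))


if __name__ == "__main__":
    for cm in (12, 15, 25, 30):
        print(cm, horn_size_class(cm))
```

Fibre fineness is left out of the sketch because the printed classes (fine <21, medium 22-26, coarse >26 micrometers) do not say how a value between 21 and 22 should be coded.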
Morphometric traits: Chest girth, body length and height at withers are important morphometric
parameters in mammalian livestock and are recorded in different age and sex classes. Physical
measurements of the mothers would be recorded during the first/second and 8-10th month of lactation in
cattle and buffaloes, and during the first/second and 6th month of lactation in sheep and goats. For
calves/kids/lambs, measurements would be taken every month up to 6 months of age and every six months
thereafter. In cattle and buffaloes, body measurements of young stock (1-3 years) would be recorded
once every 6 months and those of others only once. In sheep and goats, body measurements of young
stock would be recorded once every 3 months, in pigs every two months, and for others only once. In
pigs, neck girth is also recorded along with the above-mentioned morphometric traits.
Production Performance
Body Weight: Body weight (kg) is to be recorded as follows:
Cattle/buffalo: Birth weight, pre-weaning weight, 12m weight, 24m weight, weight at first mating and
weight at first calving
Sheep: Birth weight, pre-weaning weight, 3m weight, 6m weight, 12m weight, weight at first shearing,
weight at first lambing and body weight at marketing with age
Goat: Birth weight, pre-weaning weight, 3m weight, 6m weight, 12m weight, weight at slaughter and
weight at first kidding
Pig: Birth weight, pre-weaning weight, 3m weight, 6m weight, 12m weight, body weight at slaughter and
at first farrowing
Poultry: Weight at hatching, at 8 and 12 weeks of age and at slaughter; body weight gain/kg feed for
the periods 0-8 weeks, 8-12 weeks and 8-20 weeks is also recorded along with the feed conversion
efficiency
Dairy performance: The dairy performance of cattle, buffalo, goat and sheep is recorded for the first
four lactations. Milk recording would be done once a month from the first month of lactation to its
end in cattle and buffaloes; at fortnightly intervals for the full lactation in goats; and on the 7th
and 50th day of lactation in sheep. Milk fat and SNF (solids-not-fat) would be estimated every day from
morning milk only. The traits considered for cattle and buffalo include daily milk yield, peak milk
yield, days to reach peak yield, lactation length, lactation milk yield, fat%, SNF%, milking rate
(litres/min.), productive life span (months), dry period, feed conversion for milk and percentage of
animals in different lactations. In the case of sheep and goats, peak milk yield, days to reach peak
yield, milking rate (litres/min.) and percentage of animals in different lactations are not required.
In sheep, dry period and feed conversion for milk are additionally not required. Abnormalities of
teats may also be recorded while recording milk production performance.
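Monthly or fortnightly test-day records have to be combined into a lactation milk yield. The text does not say which method is intended; the sketch below assumes the widely used test interval method, in which the first recorded yield is applied back to calving, consecutive recordings are averaged over the interval between them, and the last recorded yield is applied up to the end of lactation. All function and key names are hypothetical.

```python
def summarise_lactation(days_in_milk, daily_yield_kg, lactation_length_days):
    """Summarise one lactation from periodic test-day recordings.

    days_in_milk          -- days after calving of each recording (ascending)
    daily_yield_kg        -- milk yield (kg/day) measured on each recording day
    lactation_length_days -- day after calving on which the animal went dry
    """
    days = list(days_in_milk)
    yields = list(daily_yield_kg)
    if not days or len(days) != len(yields):
        raise ValueError("need one yield per recording day")
    if any(later <= earlier for earlier, later in zip(days, days[1:])):
        raise ValueError("recording days must be strictly ascending")
    if days[0] < 0 or lactation_length_days < days[-1]:
        raise ValueError("recordings must fall within the lactation")

    # Calving to first recording: first test-day yield applied to every day
    total = days[0] * yields[0]
    # Between recordings: mean of the two neighbouring test-day yields
    for i in range(len(days) - 1):
        total += (days[i + 1] - days[i]) * (yields[i] + yields[i + 1]) / 2
    # Last recording to drying off: last test-day yield applied to every day
    total += (lactation_length_days - days[-1]) * yields[-1]

    peak = max(yields)
    return {
        "lactation_milk_yield_kg": total,
        "peak_yield_kg": peak,
        "days_to_peak": days[yields.index(peak)],
        "lactation_length_days": lactation_length_days,
    }


if __name__ == "__main__":
    print(summarise_lactation([15, 45, 75], [8.0, 10.0, 6.0], 90))
```

With only monthly recordings, peak yield and days to peak are approximations limited by the recording interval.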
Other Production Traits:
Goat
Mohair production: Sampling site (shoulder/mid-side/thigh), number of shearings per year, average
greasy fleece weight, clean fleece weight, staple length, fibre diameter (true
mohair/heterotypes/kemps), fleece colour and feed conversion for wool are to be recorded in males and
females for the first and later shearings during the year.
Cashmere/Pashmina production: Age at combing/collection, weight of fibre per combing/collection,
clean yield%, fibre length and fibre diameter are to be recorded.
Hair production: Average weight of clipping (kg), hair length and hair diameter are to be recorded
along with age at clipping, and presented in tabular form.
Skin production: Average skin weight, skin length and skin width are to be recorded in kids and adults.
Sheep
Wool production: Information on sampling site (shoulder/mid-side/thigh), number of shearings per year
and processing type (carpet/crossbred/merino wool) is recorded. Average greasy fleece weight, clean
fleece weight, staple length, fibre diameter (true mohair/heterotypes/kemps), fleece colour and feed
conversion for wool are to be recorded in males and females for the first and later shearings during
the year.
Pelt production: Pelt weight, pelt length and pelt width are to be recorded in foetus and lamb.
Pig
Bristle production: It is recorded in both the sexes in terms of number of cuttings per year and
average weight, length, diameter and colour of bristle in each cutting.
Carcass characters like carcass weight (kg), age at slaughter (d), weight (Hot/Cold), length,
dressing % (hot/cold), skin %, meat: bone ratio, fat thickness, lean %, bone %, fat % are recorded for
goat, sheep and pig.
Poultry
Egg production characteristics are recorded in terms of age at first egg, egg numbers, age at 50%
production and age at culling.
Egg quality traits like albumen index, yolk index, Haugh unit, shell weight, albumen weight, yolk
weight, specific gravity, egg weight (g) (40/50/60/70), shell colour (white/brown/cream or
tinted/other), shell strength (shell thickness and breaking strength), albumen quality and egg
inclusion bodies (blood spots/meat spots) are to be recorded.
Qualitative and quantitative descriptions of individual animals, other than the above, which are given
in the breed descriptor would be covered once. If individual animals with exceptionally high producing
capacity or with rare genetic variation are located during the survey, they would be brought under
organisational support or purchased for further studies.
Reproduction Performance:
Males:
(i) Age at first ejaculation (days), in case of cattle and buffalo only; (ii) age at first mating
(days); (iii) if the breed is under artificial insemination, semen quality parameters should also be
recorded.
Females: Age at first oestrus, oestrous cycle duration (days), oestrus duration (hrs), age at first
mating, age at first calving/kidding/lambing/farrowing etc., calving/kidding/lambing/farrowing
interval, gestation length and range, and twinning percentage are recorded in all mammalian species.
In cattle and buffalo it is desirable to record the interval from calving to first conception,
conception rate, number of services per conception, service period and range, dystocia percentage,
placental retention (%), abortions (%), still births (%) and post-gestational mortality (%). In the
case of sheep, goat and pig, seasonality, litter size and lifetime number of
kiddings/lambings/farrowings are to be recorded. In pigs it would be desirable to record litter weight
and litter size at weaning. Infectious and non-infectious abnormalities, abortions, still births, and
pre-weaning and adult mortality should also be recorded.
Poultry: Reproduction performance is recorded in terms of age at first egg, broodiness
(usual/sometimes/rare/other), fertility (%) and hatchability (%) on fertile eggs basis and on total
eggs basis.
Mortality (%) in poultry: a) 0-1 weeks b) 1-8 weeks c) 8-20 weeks d) n-n weeks
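The poultry percentages above follow directly from egg and bird counts. A minimal sketch, assuming the usual definitions (fertility as fertile eggs out of eggs set, hatchability as chicks hatched out of fertile eggs or out of all eggs set, mortality as deaths out of birds alive at the start of the age interval); the function names are hypothetical.

```python
def hatch_summary(eggs_set, fertile_eggs, chicks_hatched):
    """Return fertility and both hatchability percentages for one setting."""
    if eggs_set <= 0:
        raise ValueError("at least one egg must be set")
    if not 0 <= chicks_hatched <= fertile_eggs <= eggs_set:
        raise ValueError("expected chicks hatched <= fertile eggs <= eggs set")
    fertility = 100.0 * fertile_eggs / eggs_set
    # Hatchability on fertile eggs basis is undefined when no egg is fertile
    on_fertile = 100.0 * chicks_hatched / fertile_eggs if fertile_eggs else None
    on_total = 100.0 * chicks_hatched / eggs_set
    return {
        "fertility_pct": fertility,
        "hatchability_fertile_eggs_pct": on_fertile,
        "hatchability_total_eggs_pct": on_total,
    }


def mortality_pct(deaths, birds_at_start):
    """Mortality (%) within one age interval, e.g. 0-1, 1-8 or 8-20 weeks."""
    if birds_at_start <= 0 or not 0 <= deaths <= birds_at_start:
        raise ValueError("deaths must lie between 0 and birds at start")
    return 100.0 * deaths / birds_at_start


if __name__ == "__main__":
    print(hatch_summary(eggs_set=200, fertile_eggs=180, chicks_hatched=153))
    print(mortality_pct(deaths=6, birds_at_start=150))
```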
Draft ability - type of work: Parameters like purpose of draft (ploughing, threshing, power etc.),
capacity for work (hard/medium/light) and average duration of work per day (hrs) are recorded to
know the draft ability of the animal.
Physiology and diseases: Rectal temperature, pulse rate, respiration rate are to be recorded in
males and females. Drought tolerance and heat tolerance are graded (1 to 5) from lowest to highest.
Common diseases and parasites, measures against diseases including prophylactic measures
against diseases along with the resistance to infectious diseases and parasites in the breed are to be
recorded.
Documentation of Livestock biodiversity
For documentation of indigenous livestock breeds, the NBAGR, State Animal Husbandry Departments and
many Agricultural/Veterinary Universities are publishing breed monographs, and to date approximately
100 monographs have been published by different organizations. NBAGR is also publishing the breed
descriptors of different livestock and poultry breeds in the Indian Journal of Animal Sciences as a
special feature. NBAGR has also released breed charts/calendars for cattle, buffalo, sheep, goat and
chicken in which one male and one female animal of every breed of that species are depicted.
Therefore, it is important to document the information collected for the phenotypic characterization
of the breed in the form of a monograph, leaflet or video documentary. By doing so we may keep the
information in the public domain and avoid any kind of biopiracy.
Brief description of livestock breeds- Breed descriptors
The breed descriptor of a breed includes the minimum information, in a summarized form, needed to
describe the breed in all respects. Breed descriptors generally have five major parts, i.e. general
description, physical characters, performance traits, physiology of the animal and diseases. Under
general description, the name of the breed, its origin, the communities rearing the breed with their
socio-economic status, the different kinds of management practices including feeding, grazing,
housing, breeding and health management, the native tract of its distribution along with the native
environment, and the population status of the breed are described. Physical characters include the
qualitative and quantitative physical traits and biometric observations of animals belonging to
different ages and sexes. The performance of the animals is generally recorded in terms of growth,
production, reproduction and draft ability. Besides these factors, the uniqueness of the animals and
their adaptive traits should also be mentioned in the breed descriptors. It will also be appropriate
if photographs of a typical mature breeding male and female are given along with the breed descriptor.
Epilogue
If a population of any livestock species in a particular geographical area fulfils the criteria of a
breed and is kept under uniform management and utility, it should be studied, characterized and
registered following the details given in this paper. By doing so we will be able to complete the
inventories of our animal genetic resources and also reduce the proportion of the large non-descript
population of different livestock and poultry species. After recognition of a population as a breed,
suitable breeding and developmental strategies may be framed for the genetic as well as overall
development of the breed, thereby improving the livelihood of livestock keepers. It is advisable to
develop sets of questionnaires for every species, along with the typical format of breed descriptors,
before taking up the job of phenotypic characterization.
Reference
FAO, 2013. In vivo conservation of Animal Genetic Resources. FAO Animal Production and Health
Guidelines No. 14. Rome.
________________________________________________________________________________________
India possesses a huge and diverse livestock population distributed over a wide range of
geographical, ecological and climatic regions, and is globally acknowledged as one of the largest
centres of livestock diversity. The farm animal population comprises 512 million livestock and 729
million poultry (Livestock Census, 2012). Only 20 percent of this population belongs to the 151
well-defined registered indigenous breeds of the country; the remaining 80 percent belongs to animal
populations that are not assigned to any recognized breed. The populations which have not been
characterized and accredited so far are commonly referred to as non-descript or traditional. Even
though parts of these non-descript populations are known to be multiple crosses of recognized breeds,
some animals may belong to homogenous groups distinguishable from other populations on the basis of
identifiable and stable phenotypic characteristics that warrant their being distinguished as separate
breeds.
The advent of a new era of national sovereignty over genetic resources under the Convention on
Biological Diversity (CBD) requires a new approach to describing and cataloguing livestock and poultry
breeds. The sustainable use of genetic resources, one of the main goals of the CBD, as well as
sustainable development, can be achieved only by ensuring wide access to animal genetic resources for
farmers, herders, breeders and researchers. To this end, frameworks for access, and for equitable
sharing of the benefits derived from genetic resources, need to be put in place. The global scenario
of the World Trade Organisation (WTO) and Intellectual Property Rights calls for protecting the local
animal genetic diversity and providing recognition to the developers of new improved animal breeds.
This in turn demands an authentic national documentation system of valuable sovereign genetic
resources with well defined characteristics.
Registration is nothing but a documentation of the knowledge, skills and techniques (KST), and
biological resources of local communities. The registration process is a critical pathway for public
description and documentation of genetic materials. Of utmost importance, once registered, these
genetic materials are incorporated into the public domain.
Recognizing the need for an authentic national documentation system of valuable sovereign
genetic resource with known characteristics, Indian Council of Agricultural Research (ICAR)
initiated a mechanism for Registration of Animal Germplasm at National Bureau of Animal
Genetic Resources (NBAGR), Karnal. This would provide protection to the valuable animal genetic
diversity and facilitate its access for genetic improvement of animal breeds. This mechanism is the
sole recognised process for registration of Animal Genetic Resources material at national level.
Guidelines for registration
Registration of new breeds
The registration of Indian livestock and poultry genetic resources revolves around the concept of a
breed. Distinct populations within species are usually referred to as breeds. Cultural and ecological
aspects of livestock keeping also serve as a means of identifying populations that merit being
treated as separate breeds. It is difficult to exactly define a breed. The broad definition of the
term breed used by FAO is a reflection of the difficulties involved in establishing a strict
definition of the term. According to this definition, the breeds are either
(a) a sub-specific group of domestic livestock with definable and identifiable external
characteristics that enable it to be separated by visual appraisal from other similarly defined
groups within same species; or
(b) a group for which geographical and/or cultural separation from phenotypically similar
groups has led to acceptance of its separate identity.
Eligibility Criteria for Registration:
1. Populations of domesticated animals which are unique, stable and uniform, and have potential
attributes of academic, scientific or commercial value, can be registered as breeds.
2. Any population having at least 1000 animals will be considered for registration as a breed.
These animals may be maintained by the applicant/breed society/NGO/Govt. agency/farmers in field
conditions.
3. All claims concerning the material submitted for registration should be accompanied by scientific
evidence for uniqueness, reproducibility and value in the form of
i. Publication in a standard peer reviewed journal (a copy of the reprint to be submitted)
AND/OR
ii. Evaluation data for at least three years under research programmes like All India Coordinated
Research Project (AICRP), Network Project, Adhoc Schemes, etc., supported with relevant extracts of
the documents or verification by the concerned Director/Project Director (PD)/Project Coordinator (PC)
AND/OR
iii. Publication of information on the potential value of the germplasm in the institute annual report
or any other such reports
AND/OR
iv. Recommendation of the State Animal Husbandry Department/Livestock Development Board regarding the
novelty and uniqueness of the breed claimed.
Who can Apply: Application can be submitted by any citizen of India / breed society registered as
per constitution of India / NGO / Govt. agency.
Validity of Registration: The period for validity of registration shall be 25 years.
Notification of Registered Materials: All breeds approved for registration would be officially notified
to the applicants along with Registration Number. A certificate will also be issued to this effect to
the applicant. Official Notification will be published, along with a brief description of not less
than one page, in the subsequent issue of
i. Indian Journal of Animal Sciences - Published by I.C.A.R., New Delhi 110 012
ii. An abstract form of the registered breed will also be published in the following publications:
a. NBAGR Newsletter, Published by the Director, NBAGR, Karnal-132 001
b. ICAR News - Published by the Publication and Information Division, Krishi Anusandhan Bhavan, ICAR,
New Delhi 110 012
c. NBAGR, ICAR Website
De-notification: De-notification shall be done by the Registration Committee in case of false claim(s)
or disputed IPR claim. Appeal for counter claim, if any, should reach the Registration Committee
within a period of three months of the publication of the Notification in the Indian Journal of Animal
Sciences, published by the I.C.A.R.
Procedure for Submission of Proposal for Breed Registration:
1. Submission of Application and Material: All applications for registration of proposed breeds
should be submitted to the following address:
The Director, National Bureau of Animal Genetic Resources, P.O.Box. 129, Karnal 132001,
Haryana. Phone: 0184-226 7918, Fax: 0184-226 7654, Email: director.nbagr@icar.gov.in
2. The applicant should submit 3 copies of the application along with relevant documents,
literature, no matter how small (even one page), for the proper evaluation of the breed and
softcopy of the application, descriptor and photographs (original).
3. The application must be signed by the applicant and countersigned by Director, Department of
Animal Husbandry of the concerned state or his representative with rubber seal.
4. The application must be accompanied by complete description of the breed using standard
descriptors (as per concerned species).
5. Submit a detailed history of the breed.
6. List the differences, distinctions and details that are specific to the breed in comparison to
other breeds in the vicinity or elsewhere.
7. Submit representative photographs of the breed (male, female, young ones and herd/flock).
8. Submit a list of the registered animals of the breed that conform to the breed standards laid
out by the applicant or his organization.
9. The breed must have completed a minimum of 10 generations.
10. Submit letters from at least three different breeders/owners of the breed, explaining:
Why do they believe it should become a recognized breed?
How long have they been breeding the breed?
What are the reasons for recognition of the breed as a separate identity?
What has been done to establish this breed (breeding strategies, parental stock etc.)?
What are the suggestions to further improve this breed in a long term perspective?
What makes this breed clearly different and distinctive from all other breeds?
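Before a proposal is sent, the countable requirements listed above (at least 1000 animals, a minimum of 10 generations, letters from at least three breeders/owners, 3 copies of the application and the countersignature) can be checked mechanically. The sketch below is only an illustrative pre-submission checklist; the class, field and function names are hypothetical and it is not part of the official procedure.

```python
from dataclasses import dataclass


@dataclass
class BreedProposal:
    # Hypothetical field names; the thresholds checked below are from the text
    population_size: int
    generations_completed: int
    breeder_letters: int
    application_copies: int
    countersigned_by_state_director: bool


def unmet_requirements(proposal):
    """Return a list of the countable requirements the proposal fails."""
    problems = []
    if proposal.population_size < 1000:
        problems.append("population has fewer than 1000 animals")
    if proposal.generations_completed < 10:
        problems.append("breed has completed fewer than 10 generations")
    if proposal.breeder_letters < 3:
        problems.append("fewer than three breeder/owner letters")
    if proposal.application_copies < 3:
        problems.append("fewer than 3 copies of the application")
    if not proposal.countersigned_by_state_director:
        problems.append("application not countersigned")
    return problems


if __name__ == "__main__":
    draft = BreedProposal(1500, 8, 3, 3, True)
    for problem in unmet_requirements(draft):
        print("Missing:", problem)
```

Requirements that need judgement, such as evidence of uniqueness or the breed history, still have to be assessed by the Registration Committee.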
Registration of Varieties/Strains/Lines of chicken
The Director, National Bureau of Animal Genetic Resources, P.O.Box. 129, Karnal 132001,
Haryana. Phone: 0184-226 7918, Fax: 0184-226 7654, Email: director.nbagr@icar.gov.in
2. The applicant should submit 3 copies of the application along with relevant documents,
literature, no matter how small (even one page), for the proper evaluation of the
Variety/Strain/Line and softcopy of the application, descriptor and photographs (original).
3. The application must be signed by the applicant(s) and countersigned by the Head of the
Organisation with rubber seal.
4. The application must be accompanied by complete description of the Variety/Strain/Line using
prescribed descriptors.
5. Submit a detailed history of the development of the Variety/Strain/Line.
6. List the distinctive characteristics of the Variety/Strain/Line in comparison to other
Varieties/Strains/Lines available in the country.
7. Submit representative photographs of the Variety/Strain/Line (male, female, young ones and
flock).
8. The Variety/Strain/Line must have completed a minimum of 8 generations.
9. The Applicant must certify that:
The Variety/Strain/Line is distinct from other Lines/Strains whose existence is a matter of
common knowledge at the time of filing of application
It is sufficiently uniform and stable
S.N. | Breed | Home Tract | Accession number
CATTLE
01 | Motu | Odisha, Chhattisgarh and Andhra Pradesh | INDIA_CATTLE_1526_MOTU_03031
02 | Ghumusari | Odisha | INDIA_CATTLE_1500_GHUMUSARI_03032
03 | Binjharpuri | Odisha | INDIA_CATTLE_1500_BINJHARPURI_03033
04 | Khariar | Odisha | INDIA_CATTLE_1500_KHARIAR_03034
05 | Pulikulam | Tamilnadu | INDIA_CATTLE_1800_PULIKULAM_03035
06 | Kosali | Chhattisgarh | INDIA_CATTLE_2600_KOSALI_03036
07 | Malnad Gidda | Karnataka | INDIA_CATTLE_0800_MALNADGIDDA_03037
08 | Belahi | Haryana and Chandigarh | INDIA_CATTLE_0532_BELAHI_03038
09 | Gangatiri | Uttar Pradesh and Bihar | INDIA_CATTLE_2003_GANGATIRI_03039
BUFFALO
01 | Banni | Gujarat | INDIA_BUFFALO_0400_BANNI_01011
02 | Chilika | Odisha | INDIA_BUFFALO_1500_CHILIKA_01012
03 | Kalahandi | Odisha | INDIA_BUFFALO_1500_KALAHANDI_01013
GOAT
01 | Konkan Kanyal | Maharashtra |
02 | Berari | Maharashtra |
03 | Pantja | |
SHEEP
01 | Katchaikatty Black | Tamil Nadu | INDIA_SHEEP_1800_KATCHAIKATTYBLACK_14040
CAMEL
01 | Kharai | Gujarat | INDIA_CAMEL__0400_CAMEL_02009
PIG
01 | Ghoongroo | West Bengal | INDIA_PIG_2100_GHOONGROO_09001
02 | Niang Megha | Meghalaya | INDIA_PIG_1300_NIANGMEGHA_09002
03 | Agonda Goan | Goa |
DONKEY
01 | Spiti | Himachal Pradesh | INDIA_DONKEY__0600_SPITI_05001
CHICKEN
01 | Mewari | Rajasthan |

Lines Registered
S.N. | Name | Developed by | Accession number
Chicken-Synthetic
01 | PD1 (Vanaraja Male Line) | ICAR-Directorate of Poultry Research, Hyderabad | INDIA_CHICKEN_001_PD1_13001
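The accession numbers listed above share a visible pattern: country, species, a numeric code, the breed or line name, and a final numeric code, joined by underscores. The sketch below splits an accession number into those five parts and tolerates stray spaces, line breaks and doubled underscores. The meaning of the two numeric fields is not explained in the text, so the field names `numeric_code` and `serial` are assumptions made purely for illustration.

```python
import re
from typing import NamedTuple


class Accession(NamedTuple):
    country: str
    species: str
    numeric_code: str  # assumption: meaning not stated in the text
    name: str
    serial: str        # assumption: meaning not stated in the text


# Five underscore-separated parts; "_+" tolerates a doubled underscore
_PATTERN = re.compile(r"([A-Z]+)_+([A-Z]+)_+(\d+)_+([A-Z0-9]+)_+(\d+)")


def parse_accession(raw):
    """Split an accession number into its parts after removing whitespace."""
    cleaned = re.sub(r"\s+", "", raw).upper()
    match = _PATTERN.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"unrecognised accession number: {raw!r}")
    return Accession(*match.groups())


if __name__ == "__main__":
    for text in ("INDIA_CATTLE_1526_MOTU _03031",
                 "INDIA_DONKEY__0600_SPITI_05001",
                 "INDIA_CHICKEN_001_PD1_13001"):
        print(parse_accession(text))
```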
4
Conservation Strategy through Network Programme
M S Tantia and Rekha Sharma
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
The realization that animal genetic resources are at risk of being lost has stimulated national
livestock conservation efforts. The need for conservation is based on economic, cultural, and
ecological values; unique biological characteristics; shifts in market demand; and research needs. A
first step in assessing genetic conservation needs is development of baseline information on
population and genetic relationships. It is clear that livestock breeds are not biological taxa but
rather represent the outcome of social processes. They are therefore unlikely to survive outside the
social contexts and production systems that formed them. However, these losses weaken the
potential of breeding programs that could improve hardiness of livestock. Traditional pastoralists
have often tended to foster biodiversity, in both plants and animals. Many pastoral societies have
developed elaborate systems that result in the preservation of genetic resources. Pastoralists have
deliberately developed livestock to meet different needs and conditions.
The Network Project on Animal Genetic Resources is a fully ICAR-funded project coordinated by NBAGR,
in which different organizations working in the field of livestock and poultry research and
development are linked in a network approach. Agencies that have the infrastructure as well as the
manpower, such as state animal husbandry departments, state veterinary/agricultural universities,
livestock development boards, animal science institutes, NGOs, etc., are the partners in this
project. In network mode, the specific agency located in the breeding/home tract of the targeted
breed/population is approached with a well defined technical programme to undertake conservation
activities for the designated breed/population. The animal keepers of the targeted breed/population
are also motivated and involved in the conservation activities.
Commercial breeds of livestock possess greater genetic variability than most crop varieties do.
This diversity allows intensification of selection within breeds to be a fruitful approach for
improving livestock productivity. However, if continued emphasis on breed replacement and
increasing selection intensity (e.g. for greater productivity) take place at the expense of maintenance
of genetic diversity, including the advantages of disease resistance and environmental adaptation,
there may be significant long-term costs. As an example, Holstein cattle have become the preeminent dairy breed world-wide and have enjoyed sustained improvements in milk production
potential, but only at the cost of declining genetic diversity within the breed.
The indigenous breeds are considered hardy and well adapted to the environment. The hardiness of
the indigenous breeds is believed to have resulted from natural selection under the management
practices of the native breeders/herders and from adverse feed conditions. Indigenous breeds show a
high level of fertility and reproduction. In situ management of animal genetic resources can only be
successfully accomplished through breeder actions.
Factors affecting conservation
Indigenous breeds are populations that are the product of breeding or selection carried out by
farmers, either deliberately or not, continuously over many generations. They tend to contain high
levels of genetic diversity and to be adapted to specific environments, being especially important in
environmentally marginal areas. Developing countries typically rely on landraces for much of their
production. They are important genetic resources, representing an insurance policy against
uncertain markets and environmental conditions for food and agriculture in the future.
22
The characteristics of the indigenous breeds (low growth rate, lower level of production) imply that the potential for altering gross income is lower than for more prevalent breeds under current marketing conditions. However, adaptation to the environment and reproductive performance may alter this situation. Short-term ownership negatively affects breed conservation by creating an unstable situation for maintaining or increasing animal numbers. However, it is doubtful that any effective selection will be implemented; therefore, the population may behave as if it is a randomly mated population, with minimal loss of alleles due to selection. With the relatively small total population size and small individual flock sizes, genetic drift is an important factor affecting within-breed genetic diversity. With the small flock/herd sizes, one should expect random gene frequency changes that are cumulative over generations.
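The cumulative nature of drift in small flocks can be illustrated with a minimal sketch. It assumes an idealized Wright-Fisher population (random mating, constant size, no selection or migration), which is a simplification not stated in the text; the function names and parameter values are illustrative only.

```python
import random


def simulate_drift(pop_size, start_freq, generations, seed=None):
    """Simulate the frequency of one allele under pure genetic drift.

    Assumes an idealized Wright-Fisher population of `pop_size` diploid
    animals: each generation, 2 * pop_size gene copies are drawn at random
    from the previous generation's allele frequency.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Binomial sampling of gene copies for the next generation.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


def expected_heterozygosity(h0, pop_size, generations):
    """Expected heterozygosity after drift: H_t = H_0 * (1 - 1/(2N))**t."""
    return h0 * (1 - 1 / (2 * pop_size)) ** generations


if __name__ == "__main__":
    # Compare a small flock with a larger population over 20 generations.
    for size in (10, 500):
        final = simulate_drift(size, 0.5, 20, seed=1)[-1]
        print(size, round(final, 3), round(expected_heterozygosity(0.5, size, 20), 3))
```

Under these assumptions the smaller flock is expected to lose heterozygosity much faster, and an allele that reaches a frequency of 0 or 1 stays there, which is why the changes accumulate.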
Given the above conditions, there are two areas on which to base conservation efforts: developing a conservation infrastructure (a public service) and breeder actions (a private-sector activity). Nongovernmental organizations have a key role to play in the conservation of indigenous breeds, and their engagement is likely to continue by assisting breeders with technology transfer. Conservation infrastructure consists of a set of actions taken by the public sector for the public good. These actions include development of cryopreserved germplasm reserves that can be used to regenerate the breed and reduce inbreeding levels, and the use of molecular genetic tools to evaluate genetic diversity and/or genes of interest. A sufficient quantity of semen and, potentially, embryos should be collected to regenerate the breed if necessary and to relieve potentially high levels of inbreeding.
In situ maintenance of genetic diversity is the responsibility of the breeders. To aid in conserving indigenous breeds, there is a need to develop a market for them that provides breeders with an economic incentive for raising the respective breed. Breeder participation in the breed association provides a linkage for technology transfer and marketing activities.
Some of the biotechnologies offer tremendous potential to address real problems facing farmers
in developing countries. For example, the area of genomics, allowing the identification and
characterization of individual genes influencing traits such as disease or stress resistance, growth
rate or yield, promises to be of great value. The genetic material (genomes) of several hundred
species, including mammals, plants, fish, bacteria and viruses, has already been sequenced or
sequencing is in progress and the information generated from genomics studies in other fields, such
as human medicine or basic science, may also be useful for the application of genomics to food and
agriculture.
Causes of genetic erosion in domestic animals
Three factors are considered as being largely responsible for the declining genetic diversity of
livestock:
Destruction of the native habitats of livestock breeds;
The development of genetically uniform livestock breeds;
Farmer and / or consumer preferences for certain varieties and breeds (and changes in these
consumer preferences over time).
Among these, commercial interests are considered as the most important pressure on livestock
diversity. Important factors in determining the direction and nature of change include: growth
performance (productivity), pest and disease resistance, ease of handling, adaptation to current
levels of technology, and to a relatively minor extent consumer choice.
Cause | Description
Inappropriate aid | Lack of appreciation of the value of indigenous breeds and their importance in niche adaptation. Incentives to introduce exotic and more uniform breeds from industrialised countries.
Product-focused selection | Undue emphasis placed on a specific product or trait, leading to the rapid dissemination of one breed of animal at the expense of others.
Changes in land use | Conversion of rangelands and mixed farming systems for agriculture, game parks and industrial use.
Changes in knowledge | The idea that "modern/imported is best" has led to the loss of knowledge about traditional livestock husbandry practices and to the erosion of domestic animal diversity.
Change in technology | Replacement of animal draught and transport by machinery, leading to permanent change of farming system; artificial insemination and embryo transfer leading to rapid replacement of indigenous breeds.
Change in economy | Decline in economic viability of traditional livestock production systems.
Intensification | Livestock populations that rely on veterinary services and on improved feeding conditions. Heavy investment in preventative and curative veterinary measures, and in feeding, housing and management. Multipurpose local species and breeds replaced by those with higher milk, meat and egg production (including cross-breeds and pure-bred exotics).
Cross-breeding | Predominance of sires from a few selected breeds in widespread cross-breeding programmes can lead to loss of features expressed by specialised breeds.
Storage | Failure of cryopreservation equipment (used to freeze semen, ova and embryos) or lack of refrigerant; inadequate maintenance of frozen semen from breeds that are not in demand.
Conflict | Wars and other forms of socio-political instability can lead to livestock owners moving their stock out of their usual area, thus increasing the possibility of mixing with other breeds and thereby potentially losing a location-specific breed.
Disaster | Natural disasters such as floods, drought or famine can result in whole breeds dying out.
Conservation strategies
The conservation strategies being followed are ex situ and in situ. Ex situ conservation means either maintaining live animals of a breed at a nucleus farm or cryopreserving the germplasm for long-term storage. In situ conservation means maintaining a sizable population of a breed in its breeding tract with the livestock keepers.
Present status of conservation of Animal Genetic Resources
The economically important species/breeds are being maintained by the livestock keepers and are being improved continuously. The populations of these breeds are either growing or available in sufficient numbers with sufficient genetic diversity. The breeds that are not economic for farmers need intervention. Most of these breeds are being maintained in cryocans: in the majority of cases semen has been preserved, and in a few cases other biological material such as ova, embryos or even somatic cells is being preserved.
Breeds which are facing extinction: These include most of the draft cattle breeds, such as Krishna Valley, Nagori, Khilar, Bargur, Amritmahal, Punganur and Ponwar. Many of the buffalo breeds, such as Bhadawari, Toda and Surti, are under threat because Murrah is being used as an improver breed throughout the country owing to the increased demand for liquid milk. Because wool from Indian breeds has very little value and grazing resources are scarce, most sheep breeds are losing ground. Sheep are being maintained as meat animals but have to compete with goats, which are more prolific and have an advantage over sheep in the value of meat in a large part of the country. Almost all the native breeds of chicken face extinction due to overemphasis on commercial chicken farming. Pack animal species such as camels, equines and yak are threatened because of their very limited utility and changing production systems.
Which is more important - conservation of breeds, genes or unique characters: Conservation is of paramount importance, as livestock resources contribute significantly to the rural economy, especially for the downtrodden population. Livestock is more evenly distributed than land resources and has great potential as a resource for poverty alleviation programmes. Genes function in combination with a large number of other genes involved in different gene networks, and the breeds that possess the desired/unique characters have these genes in the right combination. Present-day research to find the unique alleles of various genes, which have evolved over a long period through adaptation of the indigenous livestock resources, is also very important. These alleles have become more important with increased concern about global warming, as our resources are better adapted to sustain production in a harsh climate and with scarce feed resources. Research efforts are required to identify these alleles in different resources, which can boost the economic value of indigenous livestock.
Effectiveness of the programme on conservation: The various conservation programmes being executed by the agencies are yielding few of the desired results, as they are not able to improve the profits from the uneconomic breeds/species of livestock. Conservation is a long-term activity, and its benefits are not generally appreciated by planners and the masses because they cannot be accounted for in the short term. Thus conservation activities have to be undertaken with a long-term commitment in the form of finances as well as continuity of the programmes. Conservation/improvement of livestock resources is often compared with that of plant resources, which are entirely different; because of the long generation interval in livestock, the gains per year are very nominal.
Suggestions/recommendations
The best way of conservation is to utilize the resources sustainably in their ecological niches, so that they continue to evolve to produce in changing environments. The effectiveness of these programmes can only be enhanced if developmental agencies such as state animal husbandry departments are sensitized and region/area-specific long-term plans are implemented for genetic enhancement of the resources with the involvement of stakeholders/farmers.
5
Conservation of Genome Resources- Concept of Gene Bank
Rajeev A K Aggarwal
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
Introduction
The genetic resources of farm animals in India are represented by a broad spectrum of native
breeds of cattle, buffaloes, goat, sheep, swine, equines, camels and poultry. The genetic biodiversity
among this livestock has developed and stabilized over millions of years of evolution and endowed
the indigenous breeds with capabilities to withstand hostile climate, epidemic pests and diseases,
and to survive on inadequate quantities of feed, fodder and water. However, over the years due to
many reasons the population size of many breeds is declining. As genetic diversity equips farmers
and breeders to utilize a wide range of production environments and develop diverse products to
meet the needs of local communities, the unavailability of such diversity in future may hamper
sustainable development. Hence, the need for conservation of animal genetic resources has been
accepted in India as well as globally.
Conservation methods
Conservation methods can be broadly categorized as in situ and ex situ. In situ conservation means that animals are kept within their production system, in the area where the breed developed its characteristics. Ex situ conservation applies to situations where animals are kept outside their area of origin (herds kept on experimental farms, in farm parks, within protected areas or in zoos) or, more often, where genetic material is conserved and stored in gene banks in the form of semen, ova, embryos or DNA. Each of these methods has its own merits and demerits.
1. Organized flocks/herds: Maintaining a small population at a place away from the main breeding tract of the breed constitutes ex situ conservation of live animals. This may take the form of an organized herd maintained in a research institution, bull mother farm, state-owned livestock farm, zoo or breed park. Such a population can be used for regeneration of an endangered breed, new breed development and DNA studies.
2. Cryopreservation of embryos: This is ideal for breed improvement, conservation and revival of a lost breed. Its main importance lies in the embryo being diploid and containing all the genes. However, embryo conservation finds limited use, as embryo production and transfer require highly skilled manpower and large resources.
3. Somatic cell banking: Somatic cells can be used as genetic material for conservation of endangered animal genetic resources. They are diploid cells and contain the full genetic code of an animal. The cost of maintaining these cells is very low, and they can be sampled quickly and cheaply even from remote areas. They can also be used for production of therapeutic proteins. However, somatic cells find less preference in conservation programmes, as the success rate of cloning is still very low.
4. Epididymal sperm banking: Epididymal spermatozoa, particularly those from the cauda, are mature and fully competent to undergo normal fertilization and support fetal development. In vitro fertilization (IVF) experiments have revealed that epididymal spermatozoa possess binding sites for important zona pellucida proteins. Collection of cauda epididymal semen from slaughtered animals would be a rapid and cheap alternative for sperm conservation, as it would obviate the time-consuming and extensive training of males for semen donation. Hence, cryostorage of epididymal spermatozoa is a promising method of conservation, especially in small ruminants, and further research efforts are needed in this direction.
5. Cryopreservation of embryonic stem cell lines: These can be excellent biological material for producing live animals and genetically modified animals. They also find use in gene and cell therapies and in producing vital therapeutic proteins. However, their use has so far been limited, as stable embryonic stem cell lines have been successfully generated in humans and rodents but not yet in farm animals.
6. Cryopreservation of spermatogonial stem cell lines: Spermatogonial stem cells (SSCs) are adult stem cells that transmit genetic information to the next generation and form the foundation of spermatogenesis. Transplantation of SSCs from a donor mouse testis into the seminiferous tubules of a recipient mouse testis results in donor-derived spermatogenesis. SSC transplantation has also been demonstrated in goat, dog, cattle, pig and baboon, and bovine SSCs have been shown to be capable of colonizing recipient mouse seminiferous tubules. An in vitro system that supports the proliferation and maintenance of SSCs could be used to preserve and expand SSC numbers as well as aid in genetic modification. However, much needs to be done in farm animals before this potential can be utilized for preserving domestic livestock diversity.
7. Storage of DNA: Cryogenic storage of DNA is another method of preserving genetic material. It has several advantages over live germplasm, as DNA is very easy to obtain, store and transport at low cost, with no chance of disease transfer. DNA may find use in gene conservation through introgression by transgenesis or knockout technology, and can help in the recreation of lost breeds by cross-checking the different populations or genetic material used. However, this method has limitations, because genome maps of different farm species are not yet available and life cannot be created from DNA alone.
8. Frozen semen: This is ideal for genetic resource utilization activities, as it provides a sample of half of the genetic material of the preserved breed in a form that permits convenient introgression into a recipient population. However, regeneration of a cryopreserved breed from frozen semen in one generation is possible only if living females of that breed are available; otherwise several generations of upgrading are required to re-establish a conserved breed. In spite of this limitation, the availability of established semen freezing technology, especially in cattle and buffalo, and the presence of semen freezing infrastructure across the country make it the method of choice for conserving indigenous livestock biodiversity. The National Bureau of Animal Genetic Resources (NBAGR) is playing a pivotal role in ex situ conservation through semen cryostorage of indigenous livestock for posterity by establishing a National Semen Bank at Karnal.
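The need for several generations of upgrading when only frozen semen is available can be made concrete with a small sketch. It assumes that the starting females carry none of the conserved breed's genes and that every generation is mated to purebred frozen semen, so the expected breed fraction after n generations is 1 - (1/2)^n; the function names and the 95% target are illustrative assumptions, not figures from the text.

```python
def breed_fraction(generations):
    """Expected fraction of the conserved breed's genome after repeated
    backcrossing to purebred frozen semen (assumes unrelated foundation
    females and no selection)."""
    return 1 - 0.5 ** generations


def generations_needed(target_fraction):
    """Smallest number of backcross generations reaching the target fraction."""
    if not 0 <= target_fraction < 1:
        raise ValueError("target_fraction must be in [0, 1)")
    n = 0
    while breed_fraction(n) < target_fraction:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, breed_fraction(n))
    # Hypothetical target of 95% of the original breed's genome.
    print(generations_needed(0.95))
```

With living females of the breed the fraction is complete in a single generation, which is why the text treats the availability of females as decisive.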
Conservation priority
The high cost of collection and the limited use of preserved material restrict the development of ex situ collections. Hence it is appropriate to prioritize breeds for inclusion in ex situ programmes, and several factors may form the basis of such prioritization. To implement a conservation programme, it is essential to have a breed-wise livestock census along with population and production trends. However, data are often available only species-wise, so there is a need to obtain quick population estimates and undertake conservation efforts for threatened breeds. The unique genes possessed by a breed and the likelihood of its extinction may be important parameters in setting the priority for conserving a breed.
Quantifying the relatedness among breeds allows them to be grouped into sets, each consisting of genetically close breeds that differ from the breeds of other sets. Such an arrangement can drastically reduce conservation costs, as conserving a single breed from a set will represent all breeds of that set. Such phylogenetic differentiation of breeds is possible by genotyping livestock species with microsatellite markers. The usefulness of these markers for estimating genetic distances among closely related populations in different species of livestock has been documented by numerous studies (Bowcock et al., 1994; Buchanan et al., 1994; Ciampolini et al., 1995; Bradley et al., 1996; MacHugh et al., 1997). The Food and Agriculture Organization (1996) has laid out a detailed technical programme for a large-scale international conservation project using microsatellite markers under the MoDAD project.
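How a genetic distance between two breeds is obtained from microsatellite allele frequencies can be sketched as follows. The text does not name a particular distance measure; Nei's standard genetic distance is used here only as one widely used choice, and the breed names and allele frequencies are invented for illustration.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    Both lists must cover the same loci in the same order.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0.0:
        return math.inf  # no allele shared at any locus
    # Averaging over loci cancels out in the ratio, so sums are sufficient.
    return -math.log(jxy / math.sqrt(jx * jy))


if __name__ == "__main__":
    # Hypothetical allele frequencies (allele sizes in bp) at two loci.
    breed_a = [{"120": 0.6, "124": 0.4}, {"200": 0.5, "204": 0.5}]
    breed_b = [{"120": 0.2, "124": 0.8}, {"200": 0.9, "208": 0.1}]
    print(round(nei_standard_distance(breed_a, breed_b), 3))
```

Pairwise distances computed in this way for all breeds form the matrix from which breeds can be clustered into sets of closely related populations.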
Based upon the abovementioned considerations, some breeds of different species have been taken up for the ex situ conservation programme, and their frozen semen is kept in the National Semen Bank at NBAGR (Table 1). Simultaneously, ex situ conservation in the form of DNA and somatic cells has also been taken up at NBAGR.
Table 1: Germplasm (Frozen semen) stored in National GeneBank of NBAGR, Karnal
Species: Cattle, Buffalo, Goat, Sheep, Equine, Yak, Camel
Species | Alderson (1981) | Simak (1991) | Maijala (1982) | FAO
Cattle | 750 | 7,500 | 1,000 | 10,000
Sheep | 1,500 | 15,000 | 500 | 10,000
Goat | 500 | 5,000 | 200 | 10,000
Pig | 150 | - | 200 | 10,000
Horse | 1,000 | 5,000 | - | 10,000
Table 3: Population size (in thousands) of indigenous breeds by status of endangerment
Species | Normal | Insecure | Vulnerable | Endangered | Critical
Cattle | 25 | 15-25 | 5-15 | 2-5 | <2
Buffaloes | 30 | 20-30 | 10-20 | 5-10 | <5
Sheep | 50 | 30-50 | 15-30 | 8-15 | <8
Goats | 30 | 20-30 | 10-20 | 5-10 | <5
Camels | 20 | 15-20 | 5-15 | 2-5 | <2
Horses | 20 | 15-20 | 5-15 | 2-5 | <2
Pigs | 10 | 5-10 | 1-5 | 0.5-1.0 | <0.5
Cattle | >30 | 20-30 | 10-20 | 5-10 | <5
Buffaloes | >35 | 25-35 | 15-25 | 10-15 | <10
Reference: Nivsarkar et al., 1994
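The thresholds tabulated above lend themselves to a simple lookup. The sketch below is only an illustration: it uses the first seven rows of Table 3, reads the garbled camel entry as 2-5, and assumes that a population exactly on a class boundary falls into the less threatened class, which the table does not specify. The function and variable names are hypothetical.

```python
# Upper limits (in thousands) of the Critical, Endangered, Vulnerable and
# Insecure classes, transcribed from the first seven rows of Table 3.
THRESHOLDS = {
    "Cattle": (2, 5, 15, 25),
    "Buffaloes": (5, 10, 20, 30),
    "Sheep": (8, 15, 30, 50),
    "Goats": (5, 10, 20, 30),
    "Camels": (2, 5, 15, 20),
    "Horses": (2, 5, 15, 20),
    "Pigs": (0.5, 1, 5, 10),
}
STATUSES = ("Critical", "Endangered", "Vulnerable", "Insecure")


def endangerment_status(species, population_thousands):
    """Return the endangerment class for a breed population (in thousands)."""
    limits = THRESHOLDS[species]
    for status, upper in zip(STATUSES, limits):
        # Boundary values are assigned to the less threatened class (assumption).
        if population_thousands < upper:
            return status
    return "Normal"


if __name__ == "__main__":
    print(endangerment_status("Cattle", 1.5))
    print(endangerment_status("Sheep", 40))
```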
may be desirable in preserving a recessive gene. The preservation of quantitative variation within a
population or breed would require about 100 units of semen from each of 10 to 20 unrelated males
(CAST, 1984). According to Smith (1984), a collection of frozen semen from 25 sires would be adequate for all species. However, it is appropriate to have frozen stores that are large enough to provide a good representation of the conserved stock and to prevent much genetic drift or inbreeding.
Conclusion
India is a repository of a large segment of livestock biodiversity, with nearly 39 breeds of cattle, 13 of buffalo, 40 of sheep, 24 of goat, 9 of camel, 6 of horses and ponies, 16 of poultry, 3 of pig and 1 of donkey, besides many populations of other livestock such as yak and mithun. With such large animal diversity spread over the large territory of the country, it is a gigantic task to conserve even those breeds whose populations are decreasing. The divergence in methods for undertaking ex situ conservation complicates the task further. This situation necessitates the selection of a cost-effective ex situ conservation method and the involvement of many agencies, working in a network at the national level, in ex situ conservation programmes for preserving the indigenous farm animal resources.
References
Alderson L. 1981. The conservation of Animal Genetic Resources in United Kingdom. FAO Animal
Production and Health Paper No. 24, pp 53-76. FAO, Rome.
Bowcock A.M., Ruiz-Linares A., Tomfohrde J., Minch E., Kidd J.R. and Cavalli-Sforza L.L. 1994. High
resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.
Bradley D.G., MacHugh D.E., Cunningham P. and Loftus R.T. 1996. Mitochondrial diversity and the origins
of African and European cattle. Proceedings of the National Academy of Sciences of the USA 93, 5131
Buchanan F.C., Adams L.J., Littlejohn R.P. et al. 1994. Determination of evolutionary relationships among
sheep breeds using microsatellites. Genomics 22, 397-403.
CAST 1984. Animal germplasm preservation and utilization in agriculture. Published by Council for
Agricultural Science and Technology. Report No. 101, 30-31.
Ciampolini R., Moazami-Goudarzi K., Vaiman D. et al. 1995. Individual multilocus genotypes using
microsatellite polymorphisms to permit the analysis of the genetic variability within and between Italian
beef cattle breeds. Journal of Animal Science 73, 3259-68.
Food and Agricultural Organisation of the United Nations (FAO) 1996. Global projects for the maintenance of
Domestic Animal Genetic Diversity (MoDAD).
MacHugh D.E., Shriver M.D., Loftus R.T., Cunningham P. and Bradley D.G. 1997. Microsatellite DNA
variation and the evolution, domestication and phylogeography of taurine and zebu cattle (Bos taurus and
Bos indicus). Genetics 146, 1071-86.
Maijala K., 1982. Preliminary report of the working party on animal genetic resources in Europe. In
Conservation of Animal Genetic Resources. Session 1. Commission of Animal Genetics, EAPP, G.I.@
Leningrad.
Nivsarkar, A.E., Gupta, S.C., Vij, P.K. and Sahai R. 1994. Identification and conservation of endangered breeds
of livestock- strategies and approach. Proceedings of the National Symposium on Livestock Production and
Management held at Gujarat Agricultural University, Anand, 21 to 23 February 1994.
Nivsarkar A E, Vij P K, Tantia M S 2000. Strategies for conservation. In Animal Genetic Resources of India Cattle
and Buffalo. pp 318-333, ICAR, India
Simak E. 1991. The conservation of rare breeds in West Germany. In Genetic Conservation of Domestic Livestock.
(Ed.) Lawrence Alderson. Pp. 65-69. CAB International, London.
Smith, C. 1984. Economic benefits of conserving animal genetic resources. Animal Genetic Resources
Information. 3: 10-14.
6
Cytogenetic and Molecular Methods for Screening of Major Genetic
Defects in Livestock
S K Niranjan and R S Kataria
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
An abnormality in a specific part of the genetic material causes genetic disease in an individual. Such abnormalities range from a minor change, such as a point mutation at the nucleotide level, to a large alteration at the chromosome level. Not every genetic defect results in disease; the outcome depends on the type, location and extent of the defect. An individual that looks normal may carry a genetic defect that remains unnoticed throughout its lifetime. For example, a nucleotide-level mutation in the heterozygous condition may not cause disease because the normal copy of the gene compensates for its effect. However, such an individual acts as a carrier that transmits the defect to the next generation, and may produce progeny homozygous for the mutation after mating with another carrier or a homozygous mutant individual. Similarly, some genetic diseases are expressed only later in life, by which time the individual may already have transmitted the defective copy of the gene or chromosome to the next generation.
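The outcome of matings involving carriers can be worked through with a short sketch. It assumes a single autosomal locus with a normal allele and a recessive mutant allele, labelled "N" and "n" purely for illustration, and enumerates the equally likely gamete combinations.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Expected genotype proportions among progeny at one autosomal locus.

    Each parent is a two-letter genotype such as "NN" (normal),
    "Nn" (carrier) or "nn" (homozygous mutant).
    """
    # Every gamete from one parent pairs with every gamete from the other.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    print(offspring_genotypes("Nn", "Nn"))  # carrier x carrier
    print(offspring_genotypes("Nn", "NN"))  # carrier x normal
```

Under these assumptions a carrier mated to a normal animal produces no affected progeny but passes the mutant allele to half of them, which is how a defect can spread unnoticed.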
In most cases, genetic disorders are inherited from the parents; however, some may be acquired de novo through mutation of the genetic material. If the genetic defect has occurred in germline cells, it passes to the next generation through the gametes. Defects in somatic cells cannot pass to the next generation, but they can cause genetic disease in the individual itself. Genetic diseases caused by a single-gene defect, such as a point mutation, show Mendelian (monogenic) inheritance. Most genetic defects are rare because of continuous natural selection against them. There are about 6000 known single-gene disorders in humans. Some important diseases, such as cystic fibrosis, sickle cell anaemia and Huntington's disease, are known to have a genetic basis. Certain multifactorial diseases, such as cancer, are thought to be caused by defects in, or the presence of specific undesirable alleles at, a number of genes or loci. Other factors, such as the environment, also play an important role in precipitating such diseases, and their inheritance is not of the simple Mendelian type. Another kind of genetic disorder, such as mitochondrial encephalopathy, which can present with dementia, is caused by mutation in the mitochondrial DNA. Mitochondrial DNA is inherited from the female parent only.
The inheritance of chromosomal abnormalities is less clear; however, some minor chromosomal changes may be transmitted to the next generation in a Mendelian manner. The majority of individuals with a chromosomal defect have very poor survivability and/or fertility; therefore, their contribution to the next generation ends naturally.
In livestock, an individual with a genetic defect may transmit the defective gene or chromosome to a large number of progeny, which makes such defects a greater economic concern for the livestock industry. Because most genetic diseases are inherited from carriers, which generally show no noticeable signs, the undesirable trait can spread extensively in the absence of screening for genetic defects.
It is now possible to diagnose genetic defects in individuals. Biotechnology makes it possible to distinguish genotypes, that is, normal, carrier and affected individuals. Once the molecular basis of a defect is understood, direct detection of heterozygous carriers is possible even at the embryonic stage. In livestock, genetic screening has become essential in view of the intensive selection practised in the dairy and meat industries, which favours the use of only a few high-value males. Cytogenetic and molecular screening of all breeding males has now been made essential in the new National Programme on Cattle and Buffalo Breeding (NPCBB) to keep our farm animals free from genetic defects arising from chromosomal abnormalities or nucleotide mutations.
Cytogenetic methods
Each species of domestic animal has a characteristic set of chromosomes, in number as well as in form. The following table shows the normal chromosome numbers of different livestock species.
Table 1. Chromosome numbers in different livestock species
Species | Chromosome number
Cattle | 60
River buffalo | 50
Swamp buffalo | 48
American bison | 60
Mithun | 58
Camel | 74
Dog | 78
Cat | 38
Donkey | 62
Goat | 60
Horse | 64
Pig | 38
Sheep | 54
Yak | 60
Chromosome structure depends on the stage of mitosis. Chromosomes can be arranged pairwise once the individual pairs have been identified. They are arranged according to size and/or the position of the centromere. The shorter arm of a chromosome is called the p-arm and the longer the q-arm. When chromosomes are presented, the q-arm always points downwards. The centromere occupies a relatively constant position on each chromosome; therefore, the ratio of the two arm lengths remains constant and is important for identifying the different chromosomes. The position of the centromere and the ratio of arm lengths classify chromosomes into four categories: metacentric, submetacentric, acrocentric and telocentric. Different species of the same family have similar, or homologous, chromosomes. Cattle possess 29 pairs of acrocentric autosomes together with the X (submetacentric) and Y (metacentric in taurine and acrocentric in indicine cattle) sex chromosomes. In buffalo, 5 pairs of autosomes are submetacentric, while the other 19 pairs of autosomes and the X and Y sex chromosomes are acrocentric. The five submetacentric chromosome pairs of buffalo are derived from centric fusion of 10 pairs of cattle autosomes, i.e. 1/27, 2/23, 8/19, 16/29 and 5/28. Swamp buffalo has an additional fusion between chromosomes 4 and 9 of the riverine buffalo. Like cattle, goat has 60 chromosomes, which are nearly identical to those of cattle except for the sex chromosomes X and Y. The X chromosome of the goat is acrocentric and the Y chromosome is much smaller than that of cattle. In sheep the same differences in the sex chromosomes are found, but in addition there are three centric fusions. Sheep possess 54 chromosomes owing to fusion of 3 pairs of goat chromosomes (1/3, 2/8 and 5/11).
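The chromosome numbers quoted above follow from simple arithmetic: each centric fusion that is fixed in a species merges two chromosome pairs into one pair and so lowers the diploid number by two. The sketch below checks this against the figures in the text; the function name is illustrative.

```python
def diploid_after_fusions(ancestral_2n, fixed_fusions):
    """Diploid number after a given number of fixed (homozygous) centric fusions.

    Each fusion joins two acrocentric pairs into one biarmed pair,
    reducing the diploid chromosome number by 2.
    """
    return ancestral_2n - 2 * fixed_fusions


if __name__ == "__main__":
    river_buffalo = diploid_after_fusions(60, 5)             # from cattle-like 2n = 60
    swamp_buffalo = diploid_after_fusions(river_buffalo, 1)  # additional 4/9 fusion
    sheep = diploid_after_fusions(60, 3)                     # from goat 2n = 60
    print(river_buffalo, swamp_buffalo, sheep)
```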
Chromosomal abnormalities
Chromosome abnormalities usually arise from an error in cell division. In both mitosis and meiosis, the correct number of chromosomes is supposed to end up in the resulting cells. However, errors in cell division can produce cells with too few or too many copies of a chromosome. Errors can also occur when the chromosomes are being duplicated. Other factors that can increase the risk of chromosome abnormalities are maternal age and the environment. In mammals, the female is generally born with all her eggs. The eggs age along with the female; therefore, older females are at greater risk than younger ones of giving birth to offspring with chromosome abnormalities. In males, sperm are produced afresh throughout life; therefore, paternal age does not increase the risk of chromosome abnormalities in the same way. Sometimes specific environmental factors can also cause chromosome abnormalities. It is also important to note that some human populations, and some livestock breeds or populations from specific regions, may have a higher incidence of genetic defects at the chromosomal or DNA level.
Abnormality in chromosomal numbers: Euploidy is the condition of having a normal number of structurally normal chromosomes. Aneuploidy is any deviation from euploidy, that is, having fewer or more than the normal diploid number of chromosomes. It is the most frequently observed type of cytogenetic abnormality. Monosomy is the lack of one member of a pair of chromosomes. A monosomy seen in many species is X chromosome monosomy, which is commonly lethal during prenatal development. Trisomy is having three chromosomes of a particular type. Another numerical abnormality is triploidy. A triploid individual has three copies of every chromosome, that is, three haploid sets of chromosomes. Triploid cattle would have 90 chromosomes (3 haploid sets of 30). Triploidy commonly arises by fertilization of one ovum by two sperm. However, the birth of a live triploid is extraordinarily rare, and such individuals are quite abnormal.
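The numerical categories described above can be told apart from a chromosome count and the haploid number of the species. The following sketch is a simplified illustration with hypothetical labels; a count alone cannot reveal structural rearrangements, so it is no substitute for karyotyping.

```python
def describe_chromosome_count(count, haploid_n):
    """Label a chromosome count relative to the species' haploid number.

    Only numbers are compared; structural changes are not detected.
    """
    diploid = 2 * haploid_n
    if count == diploid:
        return "diploid (normal number)"
    if count % haploid_n == 0:
        # Whole haploid sets gained or lost.
        sets = count // haploid_n
        names = {1: "haploid", 3: "triploid", 4: "tetraploid"}
        return names.get(sets, f"{sets} haploid sets")
    if count == diploid - 1:
        return "2n-1 (one chromosome fewer)"
    if count == diploid + 1:
        return "2n+1 (one chromosome extra)"
    return "other aneuploid count"


if __name__ == "__main__":
    # Cattle: haploid number 30, diploid number 60.
    for count in (60, 90, 59, 61):
        print(count, describe_chromosome_count(count, 30))
```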
Abnormality in chromosomal structure:A chromosome deletion occurs when the chromosome breaks
and a piece is lost. This of course involves loss of genetic information. A related abnormality is a
chromosome inversion- a break or breaks occur and that fragment of chromosome is inverselyrejoined. Inversions, thus do not involve loss of genetic material, however, breakpoints may disrupt
thegene. Generally, individuals carrying inversions have a normal phenotype. In chromosomal
translocation, chromosome(s) break and the fragments re-join to other chromosome(s). There is no
loss of genetic material, although the breakpoint can cause disruption of a critical gene or may
create fusion gene. Translocation is manifested as reductions in fertility or some time some disease
conditions like cancer. When two non-homologous chromosomes break and exchange fragments, it
is termed as reciprocal translocations. Individuals carrying such abnormalities may have a normal
phenotype, but may show subnormal fertility. A centric fusion is a translocation in which the
centromeres of two acrocentric chromosomes fuse to generate one large metacentric chromosome.
32
Centric fusions are also often called Robertsonian translocations. The karyotype of an individual
carrying a centric fusion has one less than the normal diploid number of chromosomes. The best
known is the 1/29 centric fusion of chromosomes 1 and 29 in cattle.
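The chromosome-count arithmetic behind these numerical abnormalities can be sketched in a few lines. This is only an illustration: it assumes the cattle haploid number of 30 quoted in the text, and the function and condition names are hypothetical.

```python
# Hypothetical helper: expected chromosome count for the numerical
# abnormalities described in the text (cattle haploid number n = 30).
CATTLE_HAPLOID = 30


def expected_count(condition: str, n: int = CATTLE_HAPLOID) -> int:
    """Return the chromosome count expected for a named condition."""
    diploid = 2 * n
    counts = {
        "normal": diploid,
        "monosomy": diploid - 1,        # one chromosome of a pair is missing
        "trisomy": diploid + 1,         # one extra copy of a chromosome
        "triploidy": 3 * n,             # three complete haploid sets
        "centric_fusion": diploid - 1,  # two acrocentrics fused into one
    }
    return counts[condition]


for name in ("normal", "monosomy", "trisomy", "triploidy", "centric_fusion"):
    print(name, expected_count(name))
```

Note that a monosomic animal and a centric fusion carrier give the same count, so a count alone cannot separate them; the karyotype itself has to be examined.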
Chromosomal abnormalities may originate during gametogenesis or during or after fertilization.
The majority of aneuploids result from defective or abnormal gametogenesis; however, most
haploids and polyploids arise during or after fertilization. About 25% of the abnormalities can be
attributed to errors during meiosis, while the rest occur around the time of
fertilization. Chromosomal abnormalities may account for approximately one fifth of the total
embryonic and fetal loss. The development rate of chromosomally abnormal embryos has been
found to be slower than that of normal diploid embryos. Major deviations are
rarely compatible with survival, and such individuals usually die prenatally.
Incidence and Significance
Both the overall incidence and the occurrence of specific abnormalities clearly depend upon when
the data are collected relative to development. This bias is clearly understood by considering the
effect on survival of minor versus major genetic lesions. For example, when newborn children are
screened, it is found that roughly 1 in every 200 has a chromosomal abnormality. Some of these
children are phenotypically normal, while others have obvious, sometimes severe manifestations of
disease. By definition however, these children have chromosomal disorders at the "mild" end of the
spectrum because they are compatible with survival to term.
A much higher incidence of chromosomal disease is seen if one looks earlier in gestation.
Approximately half of the human fetuses that are spontaneously aborted during the first trimester
are chromosomally abnormal, reflecting chromosomal disorders severe enough to disrupt prenatal
development. If one looks at the chromosomes in pre-implantation embryos, even higher numbers
of abnormalities are seen: 5-10% of viable blastocysts collected from cattle and pigs were
cytogenetically abnormal. Finally, some chromosomal abnormalities are essentially never seen,
presumably because they are so profound as to cause death shortly after fertilization.
The concepts on incidence presented above refer to the broad spectrum of chromosomal
disorders. It is important to recognize that certain abnormalities can reach a very high and
important prevalence in small populations of animals. This has been vividly observed with certain
types of translocations, which reduce fertility yet cause little if any disease in carriers. A classic
example is the 1/29 centric fusion in cattle, which has at times reached a prevalence of up to 30% in
certain breeds within a particular country.
Multiple congenital malformations are seen with many types of chromosomal abnormalities,
particularly deletions and aneuploidy. Animals with a balanced set of chromosomes will generally
be phenotypically normal. If an individual does not have a balanced set of chromosomes, this will
normally be visible as a greater or lesser deviation of the phenotype from normality. Animals with an
unbalanced set of chromosomes will most often be sterile and have low vitality. Chromosome
deviations in animals with a normal phenotype are normally detected through low fertility or
complete sterility. Trisomies are very rare in animals, but they occasionally occur. In cattle,
foetuses carrying trisomy of chromosome 28 are normally aborted or die straight after birth.
Such animals show cleft palate and heart abnormalities. In most domestic animals less severe
chromosome errors occur. The subfertility is caused by problems in chromosome pairing and
segregation during meiosis. In general, however, carriers show a substantial, often greater than
50%, reduction in fertility. Chromosomal fusion in the heterozygous form causes slightly lower fertility.
The karyotype of a bull with low fertility has been shown to carry a 1/8 translocation. In livestock,
defects of the sex chromosomes usually influence the development and function of the reproductive
system. In buffaloes and some cattle with reduced fertility, structural and numerical
aberrations of the chromosomes were found to be more frequent, specifically chromosomal gaps and deletions in
autosomal and sex chromosomes as well as chromatid breaks and centric fusions in autosomal
chromosomes. Chromosomal disorders such as XO, XXY and translocations reported in livestock can
reduce fertility or hamper the breeding of animals. However, defects in autosomes are usually
lethal, except for mosaicism and translocations.
In buffaloes, mosaicism of sex chromosomes has been observed in heterosexual and in some
homosexual twinning cases. XX/XY has been found in an intersex female and XO/XX/XY in a
co-twin bull. On the other hand, XO/XY and XO/XX were recorded in one male and in one female
of homosexual twins, respectively. In twin foetuses of different sex, a mixture of stem cells for
the white and the red blood cells is established by mixing of the blood in the early foetal stage. If the
mixing is too extensive, the heifer in a mixed twin pair develops abnormal sexual organs, is infertile
and is called a freemartin. The bull born from such a twinning generally has normal fertility; however,
it might show the genotype of the other twin.
Cytogenetic screening
By studying the chromosomes, we generally study the inheritance pattern from one generation to
another. It also gives an opportunity to locate the genes and their arrangement on the
chromosomes, which becomes important for linked loci. Generally, chromosomes are extended
during interphase, whereas the condensed, characteristic shape is achieved during the metaphase
of cell division. The science of studying chromosomes for their structure, function and
anomalies, and of establishing their relationships with phenotype, is called cytogenetics. It also includes
routine analysis of chromosomes and their banding. Nowadays, molecular cytogenetic techniques such as
fluorescent in situ hybridization (FISH) and comparative genomic hybridization (CGH) have also
emerged, which analyse the chromosomes with more refinement; however, their routine use is
limited by high cost.
Chromosomal banding
Chromosomal banding is mainly based on staining chromosomes with a specific dye. The most
commonly used bandings are G (Giemsa), R (reverse), Q (quinacrine) and C (centromere) banding.
During staining some parts of the chromosome are stained more strongly than others, forming
band-like patterns. These darkly stained bands are referred to as positive, or as the respective G, R, Q and C bands.
Each of these techniques produces a pattern of dark and light (or fluorescent versus
non-fluorescent) bands along the length of the chromosomes. Importantly, each chromosome displays a
unique banding pattern, like a bar code, which allows it to be reliably differentiated from other
chromosomes of the same size and centromeric position. The G-banding technique preferentially stains
the regions that are rich in adenine (A) and thymine (T). R-banding is the reverse of G-banding, in
which regions rich in A and T stain lightly. C-banding stains the heterochromatin areas.
NOR-staining identifies genes for ribosomal RNA in the nucleolar organizing region.
Molecular methods
Genes are located on chromosomes. Sometimes an individual may be chromosomally sound but
may have a genetic defect at the DNA level. Although mutations at the DNA level, particularly point
mutations, are frequent in nature, a mutation in a functional part of the genome may cause a
genetic disorder. Such defects are also inherited by the next generation. In contrast to chromosomal
defects, individuals with gene defect(s) may survive and be fertile, at least if the defect is in the
heterozygous condition. For example, the frequency of several genetic diseases such as BLAD,
citrullinemia and deficiency of factor XI, and most importantly infertility (among crossbred males
as well as females), has increased as a result of crossbreeding in cattle, leading to major
production losses.
Marron, B.M., Robinson, J.L., Gentry, P.A. and Beever, J.E. 2004. Identification of a mutation associated with
factor XI deficiency in Holstein cattle. Animal Genetics 35: 454.
Dennis, J.A., Healy, P.J., Beaudet, A.L. and O'Brien, W.E. 1989. Molecular definition of bovine
argininosuccinate synthetase deficiency. Proceedings of the National Academy of Sciences of the United
States of America 86: 7947.
King, W.A. 1990. Chromosome abnormalities and pregnancy failure in domestic animals. Advances in
Veterinary Science and Comparative Medicine 34: 229.
Prakash, B., Balain, D.S. and Lathwal, S.S. 1992. A 49,XO sterile Murrah buffalo (Bubalus bubalis). Veterinary Record
130: 559.
Prakash, B., Balain, D.S., Lathwal, S.S. and Malik, R.K. 1995. Infertility associated with monosomy-X in a
crossbred cattle heifer. Veterinary Record 137: 436.
Schwenger, B., Schöber, S. and Simon, D. 1993. DUMPS cattle carry a point mutation in the uridine
monophosphate synthase gene. Genomics 16: 241.
Shuster, D.E., Kehrli, M.E., Ackerman, M.R. and Gilbert, R.O. 1992. Identification and prevalence of genetic defect
that causes leucocyte adhesion deficiency diseases in Holstein cattle. Proceedings of the National
Academy of Sciences of the United States of America 89: 9225.
________________________________________________________________________________________
Animal Genetic Resources (AnGR) include both livestock and poultry resources available for food
and agriculture. As per FAO's Global Databank, there are nearly eight thousand livestock breeds in
the world. Among these, about 20% of the breeds are classified as at risk. Moreover, a large fraction
has not been properly characterized and the genetic similarity between them is largely unknown. Such
knowledge is useful in designing conservation programs as well as in formulating breeding
programs. The selection of breeds or strains of livestock for conservation or improvement programs
can be hampered by an inadequate description of population structure both within and between
populations. The choice of appropriate populations for conservation or improvement should be
based on a combination of phenotypic and genetic data. Geographical isolation over time has built
up a plethora of genetic types but the magnitude of genetic differentiation has rarely been
quantified. Indiscriminate crossbreeding further clouds the situation. A key element of a
conservation strategy for animal genetic resources must be the characterization of breeds and
strains to provide an overall picture of genetic diversity. Although it is difficult to characterize the
differences between breeds in terms of agriculturally important genes, general genetic
variability is the most suitable criterion for assessing the genetic uniqueness of breeds, which is an
important criterion when breeds are selected for conservation. The underlying
assumption is that breeds which are taxonomically distinct (Hall and Bradley, 1995) are most
likely to have special adaptations and gene combinations not found in other breeds. By selecting for
conservation those populations with unique evolutionary histories, a maximum amount of diversity
could be preserved. Genetic characterization of the native breeds is a first step in the conservation
programme, as it will help decision makers to identify genetically unique breeds so that they
may be prioritized for breed conservation purposes. In order to facilitate and rationalize the
maintenance of domestic animal diversity, it is essential that simple assays be quickly developed,
taking advantage of the molecular genetic tools now available. At present, an array of DNA-based
techniques for typing polymorphic loci to detect diversity at the DNA level is available and is being
exploited globally to construct genetic profiles for different populations/breeds/strains of farm
animals.
Molecular tools in diversity analysis
Traditionally, phenotypic characterization based on morphological features, physical body
measurements, production traits, reproductive traits and adaptive traits was used in the description
of breeds. A number of reports are available (Acharya and Bhatt, 1984; Nivsarkar et al., 2000)
wherein the assignment of indigenous breeds was based on phenotypic/subjective data and
information generated from local sources. Additionally, efforts were also made on the
characterization of genetic variation. Early reports on the detection of genome variation
focused on the analysis of protein and blood group type variation. These biochemical markers were
used extensively, but were not very effective for characterization, as they often show a low level of
polymorphism and are also sex limited and age dependent. To overcome the limitations of
biochemical markers, several DNA-based markers were employed for genetic characterization. The
exploitation of DNA polymorphism as molecular markers has opened many vistas in genetic
characterization, improvement and molecular evolution studies in farm animals. The molecular
markers include microsatellite markers (short tandem repeats, STRs), single nucleotide
polymorphisms (SNPs), variable number tandem repeats (VNTRs), random amplified polymorphic DNA
(RAPD), single strand conformation polymorphisms (SSCPs), amplified fragment length
polymorphisms (AFLPs), and restriction fragment length polymorphisms (RFLPs). Additionally, in
diversity and phylogeny studies, specific mtDNA and Y chromosome markers are used for the
identification of maternal and paternal lineages. Of the molecular markers available, microsatellite
typing is one of the methods of choice.
Genetic marker
Among the available neutral molecular markers, microsatellite markers/STRs (short tandem
repeats), maternally inherited mitochondrial DNA (mtDNA) and paternally inherited Y-chromosomal
variation have been extensively utilized to reveal the genetic structuring,
domestication events and male/female demographic patterns among the livestock species. The
approach has been extensively demonstrated in human populations to understand gene flow, genetic
structure and population ancestry.
Amongst these, besides several unique properties, STRs are highly sensitive to population
bottlenecks and selection. mtDNA may be a poor indicator of overall genomic diversity because it
is a single locus and is an extra-nuclear genetic marker with specific evolutionary dynamics. Also, it
is maternally inherited and does not detect male-mediated gene flow, which has had a powerful
influence on evolution in a few species, such as the pig, in modern times. The Y chromosome is
paternally inherited (male mediated) and, despite showing low polymorphism within a species, its
non-recombining part (NRY) maintains the original arrangement of
mutation events, making it possible to trace male lineages both within and among populations.
Therefore, DNA-based microsatellite markers are most preferred for genetic characterization.
With automation in sequencing and genotyping technologies, it has now become much easier to
genotype microsatellite loci in large numbers of samples. With the availability of high-throughput
systems, the most frequently used markers in genetic diversity studies are the microsatellite
markers. Most of these loci are selectively neutral, which makes them compatible with the
assumptions of most population genetic theory. They remain unaffected by environmental
factors and generally do not have pleiotropic effects on quantitative trait loci (QTL). These are
simple tandemly repeated (STR) motifs of 1-5 nucleotides that are densely and evenly distributed
throughout the genome and often exhibit substantial variation/polymorphism due to site-specific
length variation. Their short lengths make them amenable to amplification by PCR and subsequent
separation on polyacrylamide gels, with resolution of alleles differing by as little as a single base.
Microsatellite DNA markers
With the availability of high-throughput systems, the most frequently used markers in genetic
diversity studies are the microsatellite markers. These are simple tandemly repeated (STR) motifs of
1-5 nucleotides that are densely and evenly distributed throughout the genome and often exhibit
substantial variation/polymorphism due to site-specific length variation, as a consequence of the
occurrence of different numbers of repeat units. The difference in repeat number can be reliably
distinguished, and the variants are inherited as alleles at each locus. The polymorphic nature of this
type of locus, with variations many times more common than in non-repetitive sequence, makes
microsatellites ideal for examining genetic variation within a species. Microsatellites occur at a
frequency of about 1 SSR per 10 kb of DNA, numbering a total of about 50-100 thousand in the
mammalian genome.
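Because alleles at a microsatellite locus differ in the number of repeat units, each allele yields a PCR fragment of a different length. The sketch below illustrates this relationship; the locus, its 100 bp of flanking sequence and the function name are hypothetical.

```python
def amplicon_length(flank_bp: int, motif: str, repeats: int) -> int:
    """PCR fragment length = flanking sequence + repeat array."""
    return flank_bp + len(motif) * repeats


# Three alleles of a hypothetical (CA)n locus with 100 bp of flanking sequence
alleles = {n: amplicon_length(100, "CA", n) for n in (12, 13, 15)}
print(alleles)
```

With a dinucleotide motif, alleles that differ by one repeat unit differ by 2 bp, which is why fragment sizing needs near single-base resolution.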
The FAO has formulated an integrated programme for the global management of genetic
resources of various livestock species using species-specific lists of microsatellite loci (about 30 per
species) for cattle, chicken, sheep, swine and buffalo to be used in diversity studies, and a number of
projects have been carried out worldwide. The advisory group of the FAO MoDAD
project has compiled a list of 25-30 highly polymorphic microsatellite markers to be used for
analysis of genetic distances for each species. To select appropriate microsatellites, the working
group of the MoDAD project has issued the following criteria.
- The microsatellite marker should be in the public domain.
- Wherever possible, microsatellite loci that have been identified in mapping studies should
be used and should preferably be known to be unlinked.
- The microsatellite variants should be shown to exhibit Mendelian inheritance.
- Each microsatellite locus should exhibit at least four alleles.
- There should be information on the microsatellite loci in a published report.
- Microsatellite loci suitable for several related species (heterologous markers) should be
preferred.
- Microsatellite markers to be used should be suitable for multiplexing on an automated
DNA sequencer.
These criteria were agreed in a meeting of the EU-AIR concerted action group on "Analysis of
genetic diversity in cattle to preserve future breeding options", which was held in Dublin in 1995. A list
of microsatellite markers was compiled as per the above recommendations for universal use in
molecular genetic characterization of breeds, so that joint analysis of future data from different
laboratories would be possible for prioritizing breeds for conservation in terms of genetic
uniqueness.
The International Society for Animal Genetics (ISAG)-FAO Advisory Group on Animal Genetic
Diversity has recommended different panels of 30 microsatellite markers for nine major livestock
species: cattle, buffalo, sheep, goat, horse, donkey, camel, pig and chicken (Molecular Genetic
Characterization of Animal Genetic Resources). The list is also available at the website
www.globaldiv.eu/docs/Microsatellite%20markers.pdf.
NBAGR has standardized a panel of 25 markers for cattle, buffalo, sheep, goat, camel and equines, and
23 markers for pig, for genetic characterization. Using a complete standard panel not only yields more
accurate data than using a subset of the markers, but also offers more opportunity for comparisons
with results from previous studies.
Sampling design
When designing the sample collection, consider the structure of the production system, geographic
locations and pedigree relationships. For genetic characterization, samples should be
drawn in such a way that they cover most of the genetic variability in the population.
Samples should preferably be collected from the areas (breeding tract) that are closest to the site
of development of the breed. Samples should also reflect the different agro-climatic zones where
the breed is found. Typically, not more than 10 percent of any one herd or village population should
be sampled, and in any case not more than five animals should be sampled from any herd. Always
avoid sampling animals that share common ancestors within at least three generations.
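The per-herd sampling limits above can be expressed as a small helper. This is a sketch of one strict reading of the rule (at most 10 percent of the herd, and never more than five animals); the function name is hypothetical, and under this reading a herd of fewer than ten animals yields zero, which a field team may wish to relax.

```python
def max_animals_from_herd(herd_size: int, percent: int = 10, cap: int = 5) -> int:
    """Upper limit on animals sampled from one herd or village population.

    Integer arithmetic is used so that the 10 percent limit is not
    affected by floating-point rounding.
    """
    return min(cap, herd_size * percent // 100)


for size in (8, 30, 200):
    print(size, max_animals_from_herd(size))
```

The pedigree condition (no common ancestors within three generations) still has to be checked separately from herd records or owner interviews.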
If there appear to be genetic subdivisions within a breed, it is desirable to collect
samples that represent all the subtypes. Also keep records of the animals and types
that are sampled. For breeds that are of hybrid origin (via introgression, upgrading or the
planned creation of a synthetic breed) it is essential to have data from parental breeds. For breeds
having a recent history of intense selection and/or inbreeding, sampling of animals from previous
generations which may be available in the form of cryopreserved semen samples may be
appropriate.
For genetic characterization based on mitochondrial DNA (mtDNA), animals sharing a
common maternal origin should not be sampled. Similarly, for characterization based on
Y-chromosomal markers, samples sharing a common paternal origin should be avoided.
Sampling material
Blood is the preferred tissue for sampling. Generally, 10-15 ml of blood should be collected
from each individual. Other samples such as semen, hide, bone, tissue (e.g. ear tissue), faeces,
fossils, plucked hair with root cells and feathers can also be used.
Number of samples
For reliable estimation of allele frequencies, at least 25 and preferably 50 animals per breed should
be typed for genetic characterization. More than 50 animals should be sampled to allow for possible
losses, mistyping or missing data. If there are population subdivisions, different subtypes or
agro-climatic zones, sampling a larger number of animals is recommended.
DNA extraction
A standard protocol (phenol:chloroform method) should be followed for DNA extraction from blood
or any other tissue. Several reliable protocols for DNA extraction are available. The most commonly
used protocols are based on proteinase K/SDS lysis of cells, organic extraction and alcohol
precipitation. Kit-based DNA extraction can be done as per the protocol given by the manufacturer.
Every sample of genomic DNA should contain a minimum of 100 µg at the concentration at which it
is used. The quality of the DNA should be as follows: A260/A280 = 1.7-2.0; A260/A230 > 1.5. One agarose gel
electrophoresis photo with at least one size marker should be submitted.
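The purity thresholds above are easy to check programmatically once spectrophotometer readings are available. The following is a minimal sketch; the function names are hypothetical, and the concentration estimate relies on the standard conversion that an A260 of 1.0 corresponds to roughly 50 ng/µl of double-stranded DNA, which is not stated in the text.

```python
def passes_purity_check(a260: float, a280: float, a230: float) -> bool:
    """Apply the stated limits: A260/A280 in 1.7-2.0 and A260/A230 > 1.5."""
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    return 1.7 <= ratio_280 <= 2.0 and ratio_230 > 1.5


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (assumes 1.0 A260 unit ~ 50 ng/ul)."""
    return a260 * 50.0 * dilution_factor


# Hypothetical readings for two samples
print(passes_purity_check(a260=1.00, a280=0.55, a230=0.50))  # ratios 1.82 and 2.0
print(passes_purity_check(a260=1.00, a280=0.70, a230=0.50))  # ratio 1.43, protein carry-over
print(dsdna_ng_per_ul(0.2, dilution_factor=50))
```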
Genotyping
Microsatellite genotyping can be performed either manually (running urea-PAGE polyacrylamide
gels followed by silver staining) or through automation (amplification using fluorescent
dye-labelled primers and genotyping on an automated DNA sequencer). However, the time-consuming and
cumbersome technique of manual genotyping is not very successful because of difficulty in i)
accurate determination of allele size, ii) comparison of data across different breeds and iii)
reproducibility. All these factors hinder comparative studies across the various livestock and
poultry breeds. With the advances in sequencing technologies and the amenability of microsatellites to
automation, the switch from manual genotyping to fragment analysis on automated sequencers
became feasible, and it is presently the most widely used methodology. Automated microsatellite
genotyping should therefore be preferred over manual genotyping.
While carrying out microsatellite-based genotyping, at least one reference sample should be
included in each experiment so as to cross-validate successive genotyping experiments. It is
preferable that one laboratory performs all typing for a given marker in order to exclude
laboratory-dependent scoring. Use the FAO-recommended microsatellite panel
and, if possible, include international reference samples in order to link your data to other datasets.
Multiplexing of PCR products while performing fragment length analysis on an automated DNA
sequencer can reduce the cost. However, care should be taken to multiplex amplicons of different
sizes and labelled with different dyes (FAM, VIC, NED or PET).
PCR products of different sizes and dyes can be pooled for maximizing the throughput. It is
important to pool PCR products together at the correct ratios, in order to get similar fluorescent
intensities across all loci in the pooling. The fluorescent dyes are detected with different efficiencies.
The pooling ratio, or amount of each dye-labeled product added with respect to the other products
in the pool, should be adjusted to ensure an appropriate detection of all the loci.
Post PCR multiplexing
To maximize the throughput, products amplified with different primers carrying different dyes are
pooled for one capillary injection. This is based on the fact that the ABI PRISM 3100 DNA Analyzer can
automatically analyze PCR products of different sizes and dyes. In order to get similar fluorescent
intensities across different loci, it is important to pool PCR products using correct ratios. Hence, the
pooling ratio, or amount of each product added with respect to the other products in the pool, is
crucial to ensure appropriate detection of all the alleles. After optimization of pooling ratios, the
products with different fluorescent labels are mixed in the following ratio.
FAM labeled PCR product - 1.0 µl
VIC labeled PCR product - 1.5 µl
NED labeled PCR product - 2.0 µl
PET labeled PCR product - 2.0 µl
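The pooling ratio above can be scaled to any desired pool volume. The sketch below assumes that the listed volumes are microlitres (the unit symbol is incomplete in the source) and uses a hypothetical helper name; the optimal ratio for a given set of loci still has to be established empirically.

```python
# Pooling volumes per dye as listed in the text (assumed to be microlitres)
POOL_UL = {"FAM": 1.0, "VIC": 1.5, "NED": 2.0, "PET": 2.0}


def scale_pool(volumes: dict, total_ul: float) -> dict:
    """Scale the pooling ratio to a target total volume, keeping proportions."""
    factor = total_ul / sum(volumes.values())
    return {dye: round(vol * factor, 2) for dye, vol in volumes.items()}


print(sum(POOL_UL.values()))      # total volume of one pool as listed
print(scale_pool(POOL_UL, 13.0))  # the same ratio at twice the volume
```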
GeneScan-500 LIZ Size Standard (Applied Biosystems) is used as the internal standard for
fragment sizing. This size standard is designed for sizing fragments between 35 and 500 bases and
provides 16 single-stranded labeled fragments of 35, 50, 75, 100, 139, 150, 160, 200, 250, 300, 340,
350, 400, 450, 490 and 500 bases. Each of the DNA fragments is labeled with a proprietary fluorophore, which
results in a single peak when run under denaturing or native conditions. The internal lane size standard
is run with every sample for accurate sizing.
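The idea of sizing against an internal lane standard can be illustrated with simple linear interpolation between the two ladder peaks that flank an unknown peak. This is a deliberately simplified sketch: genotyping software typically uses more refined sizing algorithms, and the scan positions below are invented for illustration.

```python
from bisect import bisect_left

# Fragment sizes (bases) of the size standard as listed in the text
LADDER_BP = [35, 50, 75, 100, 139, 150, 160, 200, 250, 300, 340, 350, 400, 450, 490, 500]


def size_fragment(scan, ladder_scans, ladder_bp=LADDER_BP):
    """Interpolate a fragment size (bases) from its scan position.

    ladder_scans holds the scan positions of the ladder peaks, in the
    same order as ladder_bp and strictly increasing.
    """
    if not ladder_scans[0] <= scan <= ladder_scans[-1]:
        raise ValueError("peak lies outside the size-standard range")
    i = bisect_left(ladder_scans, scan)
    if ladder_scans[i] == scan:
        return float(ladder_bp[i])
    x0, x1 = ladder_scans[i - 1], ladder_scans[i]
    y0, y1 = ladder_bp[i - 1], ladder_bp[i]
    return y0 + (y1 - y0) * (scan - x0) / (x1 - x0)


# Hypothetical run in which migration is exactly linear (10 scans per base)
ladder_scans = [1000 + 10 * bp for bp in LADDER_BP]
print(size_fragment(2245, ladder_scans))
```

Running the standard in every capillary matters because each injection has its own migration profile, so the mapping from scan position to size has to be rebuilt per sample.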
Data analysis
Microsatellite marker data should address questions related to within-breed or between-breed diversity
based on various parameters.
Intra-population analysis (allele diversity, gene diversity, deficiency of heterozygotes).
Inter-population analysis (genetic distance and analysis of molecular variance).
A number of methods are available for analysis of data recorded as genotype designations for each
individual across the microsatellite loci, using many software packages with different analytical
methods that can be downloaded from the internet.
The multilocus genotypes of individuals can be used to analyse the accuracy of assignment of
individuals to their respective populations using a software programme. Individuals can be
assigned to the population in which the likelihood of their genotype is highest, or to the
(genetically) closest population.
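The likelihood-based assignment described above can be sketched as follows. The sketch assumes Hardy-Weinberg proportions within each population and independence between loci, and it replaces zero allele frequencies with a small floor value so that a single unseen allele does not rule out a population; the breed names, loci, frequencies and the floor of 0.01 are all invented for illustration, and dedicated software uses more careful frequency estimation.

```python
import math


def genotype_log_likelihood(genotype, freqs, floor=0.01):
    """log P(multilocus genotype | population) under HWE, independent loci."""
    total = 0.0
    for locus, (a, b) in genotype.items():
        p = max(freqs[locus].get(a, 0.0), floor)
        q = max(freqs[locus].get(b, 0.0), floor)
        total += math.log(p * p if a == b else 2 * p * q)
    return total


def assign(genotype, populations):
    """Return the population in which the genotype is most likely."""
    return max(populations,
               key=lambda name: genotype_log_likelihood(genotype, populations[name]))


# Hypothetical allele frequencies (allele size in bp -> frequency)
populations = {
    "BreedA": {"L1": {120: 0.7, 124: 0.3}, "L2": {200: 0.6, 204: 0.4}},
    "BreedB": {"L1": {120: 0.2, 124: 0.8}, "L2": {200: 0.1, 204: 0.9}},
}
animal = {"L1": (120, 120), "L2": (200, 204)}
print(assign(animal, populations))
```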
Appropriate software can be used to assess within-breed and between-breed diversity. Some of the
software packages most commonly used in population genetics are as follows:
POPGENE (http://www.ualberta.ca/~fyeh/)
AMOVA (Analysis of Molecular Variance)
Arlequin (http://lgb.unige.ch/arlequin/)
GenAlEx (http://www.anu.edu.au/BoZo/GenAlEx/)
GENEPOP (http://wbiomed.curtin.edu.au/genepop/)
GDA (http://lewis.eeb.uconn.edu/lewishome)
GENETIX (http://lotka.stanford.edu/microsat/microsat.html)
Microsatellite Toolkit (http://oscar.gen.tcd.ie/~sdepark/ms-toolkit/)
FSTAT (http://www.unil.ch/izea/softwares/fstat.html)
Phylip (http://evolution.genetics.washington.edu/phylip/getme.html)
TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
The following checks should be carried out during data analysis in order to minimize the error rate.
- Identify and critically evaluate samples with identical results, which may indicate errors
during sampling (related samples) or processing of samples.
- Examine unusual alleles, which may result from incorrect interpretation of electrophoretic
patterns. The reason may be bleed-through from other colours because of off-scale data or
primers that are not fully optimized.
- Check for an excess of apparent homozygosity in samples with low DNA concentration
because of allele dropout (i.e. the inability of the assay to detect certain alleles).
- Standardize allele-calling with other laboratories, particularly for microsatellites.
- Compare allele frequencies with data from breeds that are likely to share the most frequent
alleles in order to detect inconsistent allele sizing.
- Check for absence of laboratory-dependent clustering of breeds, which may result from
systematic differences in allele calling. One cause of laboratory-dependence may be
lab-dependent differentiation of microsatellite alleles that only differ by one bp in length.
- Determine if any pairs of markers are in linkage disequilibrium (LD). Markers in LD in all
populations are probably genetically linked and thus provide less information about genetic
variability than would two markers that are independent.
- Check for markers that diverge from Hardy-Weinberg (HW) equilibrium. Markers that in
most breeds are not in HW may have null alleles or be linked to loci under selection, hence
breaking the assumption of neutrality. Within single breeds, divergence from HW may
indicate the presence of inbreeding or assortative mating.
Calculation of within breed diversity indices
The observed number of alleles (N_o), effective number of alleles (N_e), observed heterozygosity
(H_obs), expected heterozygosity (H_exp), Polymorphism Information Content (PIC) and allele frequency
distribution at each locus can be calculated using POPGENE software. The allele frequency data are
further used to calculate the number of private alleles (alleles specific to one breed) as well as the
number of shared alleles, for which the software GDA (http://lewis.eeb.uconn.edu/lewishome) can be
used.
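For a single locus, these indices can be computed directly from the diploid genotypes. The sketch below uses the usual textbook definitions, which are assumptions here since the text does not give formulas: effective number of alleles as 1/Σp², expected heterozygosity as 1 − Σp² (without small-sample correction), and PIC following Botstein et al. (1980). The function name and the example genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_diversity(genotypes):
    """Within-breed diversity indices for one locus.

    genotypes: list of (allele1, allele2) tuples, one per animal.
    """
    alleles = [a for g in genotypes for a in g]
    freqs = {a: c / len(alleles) for a, c in Counter(alleles).items()}
    sum_p2 = sum(p * p for p in freqs.values())
    # PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2
    pic = 1 - sum_p2 - sum(2 * p * p * q * q
                           for p, q in combinations(freqs.values(), 2))
    return {
        "No": len(freqs),                       # observed number of alleles
        "Ne": 1 / sum_p2,                       # effective number of alleles
        "Hobs": sum(a != b for a, b in genotypes) / len(genotypes),
        "Hexp": 1 - sum_p2,
        "PIC": pic,
    }


# Four animals typed at one hypothetical locus (allele sizes in bp)
result = locus_diversity([(120, 120), (120, 124), (124, 124), (120, 124)])
print(result)
```

In practice the indices are computed per locus and then averaged over all loci of the panel for each breed.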
Hardy-Weinberg and linkage equilibrium test
Three different tests, the chi-square (χ²) test, the likelihood ratio (G²) test and the exact test, can be
applied to analyze deviation from Hardy-Weinberg equilibrium (HWE). In the chi-square (χ²) test, observed
and expected genotypic frequencies are compared, while G² measures the likelihood ratio. Fisher's exact
test is applied using a Markov chain procedure to compute unbiased estimates of the exact probabilities (P
value).
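The chi-square comparison of observed and expected genotype counts can be sketched for one locus as follows. This is a minimal illustration with a hypothetical function name; it returns the statistic and its degrees of freedom, k(k − 1)/2 for k alleles, and leaves the P value to a statistical table or library. With many alleles and small samples, expected counts become small and the exact test is the safer choice.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Chi-square statistic and degrees of freedom for HWE at one locus."""
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected count: n*p^2 for homozygotes, n*2pq for heterozygotes
        expected = n * (freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b])
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    k = len(freqs)
    return chi2, k * (k - 1) // 2


# 100 hypothetical animals: 30 AA, 40 AB, 30 BB (heterozygote deficit)
sample = [("A", "A")] * 30 + [("A", "B")] * 40 + [("B", "B")] * 30
chi2, df = hwe_chi_square(sample)
print(chi2, df)  # compare with the 5% critical value of 3.84 for df = 1
```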
Ewens Watterson neutrality test
The neutrality of markers can be checked with POPGENE software by applying the Ewens-Watterson
test. The test calculates the quantity F, which is equal to the sum of squared allele
frequencies.
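The observed value of F is straightforward to compute from allele counts, as sketched below with a hypothetical function name and invented counts. Judging whether the observed F is unusually high or low requires its distribution under neutrality for the same sample size and number of alleles, which the software obtains by simulation and which is not reproduced here.

```python
def observed_f(allele_counts: dict) -> float:
    """Observed F: the sum of squared allele frequencies at one locus."""
    total = sum(allele_counts.values())
    return sum((count / total) ** 2 for count in allele_counts.values())


# Hypothetical allele counts at one locus (allele size in bp -> gene copies)
f_value = observed_f({120: 50, 124: 30, 128: 20})
print(round(f_value, 4))
```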
Estimation of bottleneck in cattle breeds
To detect bottleneck events in the investigated breeds, two different approaches can be
followed. The first approach, based on heterozygosity excess, consists of three tests: the sign
test, the standardized differences test and the Wilcoxon signed-rank test. These methods test for
departure from mutation-drift equilibrium based on heterozygosity excess or deficiency. The
probability distribution is established using 1000 simulations under three models: the infinite
allele model (IAM), the stepwise mutation model (SMM) and the two-phase mutation model (TPM). The
tests can be conducted using the BOTTLENECK v1.2.02 software (http://www.ensam.inra.fr/URLB). The
second approach is based on a graphical representation of the mode shift in the allele frequency
distribution. It assumes that in bottlenecked populations one or more of the common allele classes
have a higher number of alleles than the rare allele class.
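The mode-shift idea can be sketched by counting alleles in frequency classes and checking whether any more common class holds more alleles than the rarest class. The ten equal-width classes and their exact boundaries are assumptions made for this illustration, and the function names are hypothetical; the sketch does not reproduce the BOTTLENECK program.

```python
def allele_frequency_classes(freqs, n_classes=10):
    """Count alleles per frequency class: [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]."""
    counts = [0] * n_classes
    for f in freqs:
        if f <= 0:
            continue                                  # absent alleles are not counted
        index = min(int(f * n_classes), n_classes - 1)
        counts[index] += 1
    return counts

def is_mode_shifted(freqs, n_classes=10):
    """True when some more common class holds more alleles than the rarest class."""
    counts = allele_frequency_classes(freqs, n_classes)
    return any(c > counts[0] for c in counts[1:])
```

A population with many rare alleles, such as frequencies 0.02, 0.05, 0.03, 0.08, 0.30 and 0.52, keeps the normal L-shaped distribution, while frequencies 0.25, 0.22, 0.28, 0.20 and 0.05 would be flagged as mode-shifted.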
Measurement of F-statistics and gene flow
The degree of population differentiation amongst the breeds can be estimated using the
variance-based method of Weir and Cockerham (1984). The F-statistics estimates f (within-population
inbreeding estimate), F (total inbreeding estimate) and θ (measure of population differentiation),
which are analogous to F_IS, F_IT and F_ST respectively, can be obtained using the FSTAT
version 2.9.3.2 computer programme. Means and standard deviations of the F-statistics parameters are
obtained across breeds by jackknifing over loci. The level of significance (P < 0.05) is
determined from permutation tests, with the sequential Bonferroni procedure applied over all loci.
Wright's F_ST assesses the degree of genetic differentiation between populations. This classical
estimator is considered the most appropriate because genetic drift is assumed to be the main factor
in genetic differentiation among closely related populations. The effect of migration and gene flow
(N_em) on the genetic structure of populations is estimated between each pair of populations. N_em
values, indicating the average number of effective migrants exchanged per generation, are calculated
according to the formula:
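The expression itself is not reproduced in this text. A commonly used form, assumed here rather than taken from the source, is Wright's island-model relation N_em = (1 - F_ST) / (4 F_ST). A minimal Python sketch under that assumption (the function name is hypothetical):

```python
def effective_migrants(fst):
    """Number of effective migrants per generation from pairwise F_ST.

    Assumes the island-model relation Nem = (1 - Fst) / (4 * Fst).
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("F_ST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)
```

Under this relation a pairwise F_ST of 0.20 corresponds to about one migrant per generation, and an F_ST of 0.05 to about 4.75, so lower differentiation implies more gene flow.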
Determination of genetic divergence and relationships
The genetic divergence between each pair of breeds can be calculated using various genetic
distance estimates based on different assumptions. Genetic distance methods can be clustered into
three groups: (I) genetic distances based on the infinite allele model; (II) genetic distances that
assume a step-wise mutation model; and (III) genetic distances based on the proportion of shared
alleles, a non-metric method. In addition, inter-individual genetic distances based on the
proportion of shared alleles averaged over loci can also be calculated.
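The inter-individual distance based on the proportion of shared alleles can be sketched as follows. The distance is taken here as one minus the proportion of shared alleles averaged over loci; the function names and the representation of an individual as a list of per-locus genotype pairs are assumptions for illustration.

```python
from collections import Counter

def shared_alleles(genotype1, genotype2):
    """Number of alleles (0, 1 or 2) shared by two diploid genotypes at one locus."""
    return sum((Counter(genotype1) & Counter(genotype2)).values())

def shared_allele_distance(individual1, individual2):
    """1 - proportion of shared alleles, averaged over loci typed in the same order."""
    n_loci = len(individual1)
    shared = sum(shared_alleles(g1, g2) for g1, g2 in zip(individual1, individual2))
    return 1.0 - shared / (2 * n_loci)
```

Two animals typed as (120, 124), (98, 98) and (120, 120), (98, 100) share one allele at each of the two loci, giving a distance of 0.5; identical multilocus genotypes give 0.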
The genetic distance matrices between the breeds are then used to reconstruct trees according
to the Neighbour-Joining (NJ) and unweighted pair group method with arithmetic averages (UPGMA)
algorithms, making use of the PHYLIP package. The robustness of the tree topology is assessed
by 1000 bootstrap resamplings of loci.
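To show how a distance matrix is turned into a tree, the following is a minimal UPGMA sketch. It is an illustration of the clustering rule only (branch lengths and bootstrap support are omitted) and is not the PHYLIP implementation; the function name and the nested-tuple output are assumptions.

```python
def upgma(labels, dist):
    """Cluster a symmetric distance matrix (list of lists) into a nested-tuple tree."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}       # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)                            # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            # Average distance weighted by the number of members in each cluster
            new[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

For three breeds A, B and C with distances A-B = 2, A-C = 6 and B-C = 6, the sketch joins A and B first and then attaches C, returning `("C", ("A", "B"))`.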
Multivariate correspondence analysis
The pattern of population differentiation can be evaluated by factorial correspondence analysis
(FCA) of the individual multi-locus scores using GENETIX software
(http://www.univ-montp2.fr/~genetix/genetix/genetix.htm). This multivariate correspondence analysis
method is analogous to principal component analysis and can condense the information from a large
number of alleles into a few synthetic variables; it is appropriate for discrete variables. The
factorial analysis can lead to a simultaneous representation of breeds and loci as a cloud of points
in a metric space. For this approach, the allele frequencies at all the loci are used as variables,
and the population clusters are identified graphically.
Breed assignment
The multilocus genotypes of individuals can be used to analyze the assignment accuracy of
individuals to their respective population using the GENECLASS2 software. This program includes
two types of methods: likelihood-based methods and genetic distance-based methods. In the first
type of methods, individuals are assigned to the population in which the likelihood of their
genotype is highest. In the second type, individuals are assigned to the (genetically) closest
population.
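The likelihood-based idea can be sketched as follows: the likelihood of a multilocus genotype in each candidate population is computed from that population's allele frequencies, assuming Hardy-Weinberg proportions and independent loci, and the individual is assigned to the population with the highest value. The small frequency floor for alleles not observed in a population, the function names and the data layout are assumptions for illustration and do not reproduce the criteria implemented in GENECLASS2.

```python
import math

def genotype_log_likelihood(genotype, pop_freqs, floor=0.01):
    """Log-likelihood of a multilocus genotype given one population's allele frequencies.

    genotype: {locus: (allele1, allele2)}; pop_freqs: {locus: {allele: frequency}}.
    """
    loglik = 0.0
    for locus, (a, b) in genotype.items():
        pa = max(pop_freqs[locus].get(a, 0.0), floor)   # floor avoids log(0)
        pb = max(pop_freqs[locus].get(b, 0.0), floor)
        prob = pa * pa if a == b else 2.0 * pa * pb      # Hardy-Weinberg proportions
        loglik += math.log(prob)
    return loglik

def assign_individual(genotype, freqs_by_pop):
    """Return the population in which the genotype is most likely."""
    return max(freqs_by_pop,
               key=lambda pop: genotype_log_likelihood(genotype, freqs_by_pop[pop]))
```

In practice, the individual being tested is usually left out when its own population's allele frequencies are estimated, so that the assignment is not biased towards the source population.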
important to ensure that the sequence of the DNA obtained from the sample originates from the
sample and not from exogenous DNA.
DNA extraction
Standard procedure (phenol-chloroform method) should be followed for DNA extraction.
Genotyping
Standard polymerase chain reaction (PCR) procedure should be followed to amplify small amount
of DNA, which should be sequenced further.
Data analysis
The nucleotide sequences obtained from sequencing should be analyzed further, using various
software programmes, for sequence alignment; identification of nucleotide variations; generation of
haplotypes; estimation of population indices such as gene diversity, nucleotide diversity and
pairwise nucleotide differences; calculation of within-breed and among-breed differences through
AMOVA; determination of demography and population expansion; estimation of phylogenetic
relationships among different breeds of a species; identification of ancestral and descendant
haplotypes; and estimation of coalescence age using the ρ (rho) estimator of divergence time.
Comparative analysis of data should be performed essentially to define the new breeds and
assess the population structure at regular intervals in order to take necessary steps for the
prioritization of conservation.
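Two of the population indices named above, nucleotide diversity and haplotype (gene) diversity, can be sketched for a set of aligned sequences of equal length. The sketch uses the usual definitions (mean pairwise differences per site, and n/(n-1) times one minus the sum of squared haplotype frequencies); the function names are hypothetical and gaps or ambiguous bases are not handled.

```python
from collections import Counter
from itertools import combinations

def pairwise_differences(seq1, seq2):
    """Number of sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(seq1, seq2) if x != y)

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site."""
    n = len(seqs)
    total = sum(pairwise_differences(a, b) for a, b in combinations(seqs, 2))
    n_pairs = n * (n - 1) / 2
    return total / n_pairs / len(seqs[0])

def haplotype_diversity(seqs):
    """Gene diversity of haplotypes with the n / (n - 1) sample-size correction."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))
```

For the four toy sequences ACGT, ACGA, ACGT and TCGA there are 7 differences over 6 pairs and 4 sites, so nucleotide diversity is 7/24 (about 0.29) and haplotype diversity is about 0.83.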
Other Markers
Y-chromosomal markers
Y-chromosomal variation is a powerful tool with which to trace gene flow by male introgression. It
is the most powerful marker in human population genetics and is used more and more in domestic
animal species.
Single-nucleotide polymorphisms (SNP)
As the name indicates, a SNP is a DNA sequence variation that occurs through a change in the
nucleotide at a single location within the genome of a species or breed. SNPs usually have only two
alleles. Generally, SNPs can occur throughout the genome and may represent either neutral or
functional genetic diversity.
Copy number variations (CNV)
Genetic studies of the human genome indicate the presence of variation in copy number of certain
chromosomal segments, as well as a relationship between copy number and phenotypic variation.
It is anticipated that this category of genetic variation will also prove to be relevant for studying the
diversity of livestock.
Genome sequencing
Next-generation genomic technologies, several of which have already passed the proof-of-principle
stage, will further expand the scope of molecular studies and are likely to allow, in the near
future, the affordable whole-genome sequencing of individual animals. Predictably, this will open
new avenues of research that lead to new insights into diversity and the estimation of conservation
values. Most notably, dense genetic maps allow the demarcation of footprints or signatures of
selection, while the growing amount of knowledge on genotype-phenotype relationships will also
reveal novel aspects of functional diversity. Clearly, this will require new software and hardware
for extracting and storing meaningful information from the huge amount of DNA sequence data.
associated with traits of interest are relevant for breed utilization and conservation for future
production needs.
Conservation Genetics for Selection of Breeds
Large numbers of indigenous breeds are in danger of extinction and hence need conservation, but
the available resources are limited. Making a decision about which breeds should be targeted for
conservation is challenging. A conservation strategy for animal genetic resources has to be directed
towards maintenance of maximum genetic diversity in the global gene pool (Maijala and Kolstad
1992, Barker 1994) while maintaining within-breed diversity to reduce inbreeding and preserve
genetically differentiated groups. Diversity is generally measured as 1 - f [f = average kinship or
coancestry in a subpopulation] or 1 - F [F = average inbreeding in a subpopulation]. In order to
manage genetic diversity it is best to minimize the average kinship in a population.
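The measure 1 - f can be illustrated with a small sketch that computes the average kinship of a set of breeds (or animals) from a kinship matrix, optionally weighted by their contributions to the conserved gene pool. The weighted form (the sum of c_i K_ij c_j over all pairs) is the quantity minimised in kinship-based approaches; the function names and the example values are hypothetical.

```python
def mean_kinship(kinship, contributions=None):
    """Average kinship sum_i sum_j c_i * K_ij * c_j (equal contributions by default)."""
    n = len(kinship)
    if contributions is None:
        contributions = [1.0 / n] * n
    return sum(contributions[i] * kinship[i][j] * contributions[j]
               for i in range(n) for j in range(n))

def genetic_diversity(kinship, contributions=None):
    """Diversity measured as 1 - f, where f is the average kinship."""
    return 1.0 - mean_kinship(kinship, contributions)
```

With a within-breed kinship of 0.5 and 0.3 for two breeds and a between-breed kinship of 0.1, equal contributions give an average kinship of 0.25 and a diversity of 0.75, which is higher than conserving either breed alone.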
Maximization of genetic diversity for the next generation is achieved by minimizing the average
relatedness of the parents. The principle of minimizing relatedness not only applies to the choice of
parents for producing the next generation in breeding programs, but also to the choice of candidate
breeds for conservation. Eding and Meuwissen (2001) worked out the principles to estimate
average relatedness between different breeds based on microsatellite markers and to determine the
optimal contributions of different breeds to a gene pool. Eding et al. (2002) also quantified the
contribution of each breed to the maximum amount of genetic diversity in order to identify important
breeds for the conservation of genetic diversity in Northern European cattle breeds. In contrast to
the kinship approach, Weitzman's method for selecting breeds for conservation gives higher priority
to inbred populations. However, both methods rank the breeds according to their diversity content.
Piyasatian and Kinghorn (2003) developed a method to balance genetic diversity, genetic merit and
population viability in the establishment of conservation programmes. The method gives
appropriate balance to the three issues: diversity, merit and viability. To sum up, conservation
programmes should be based on wider information; in particular it should be based on a relatively
large set of genetic marker loci by targeting good coverage of the genome. Further, other indicators
of genetic variability and genetic uniqueness should also be included. There is a need to complete
the work of detailed genetic characterization and select the breeds for conservation by pooling the
marker data with available information on degree of endangerment, traits of economic and
ecological value, specific adaptive features, presence of unique genes/phenotypes and cultural
and historical value. The decision for conservation of breeds can be taken on the basis of the
independent selection principle i.e. a breed can be conserved if it reaches a maximum value for at
least one of the above mentioned criteria.
Conclusion
Genetic variability is a major consideration in defining any livestock breed and in preserving the
maximum amount of genetic diversity. Generally, by genetic characterization, we assess the genetic
constitution of a breed or population of a species. It assesses the genetic uniformity, admixture or
subdivisions, inbreeding, or introgression in the population. It is also helpful in providing insights
into breed formation, informing about closest wild ancestral species and localization of the site of
domestication. Phylogenetic relationships of populations based on genetic analysis unravel the
evolutionary history of the breeds or populations. Therefore, it is important to characterize different
breeds so that we can know how unique or different a breed is from other native populations. The
genetic characterization is a further step to answer questions on taxonomy, evolution,
domestication processes, management of genetic resources and setting conservation plans for their
effective utilization. Through this, we can prioritize the breeds for conservation using molecular
data and monitor their status in the defined geographical region.
References
Acharya R M and Bhat P N. 1984. Livestock and poultry genetic resources in India. IVRI Research
Bulletin No. 1. Indian Veterinary Research Institute, Izatnagar.
Barker J S F. 1994. Animal breeding and conservation genetics. In Loeschcke V., Tomiuk J. and Jain S.K. Eds.,
Conservation Genetics, Birkhauser Verlag, Basel, Switzerland 381.
Eding H, Crooijmans R P M A, Groenen M A M and Meuwissen T H E. 2002. Assessing the contribution of
breeds to genetic diversity in conservation schemes. Genetics Selection Evolution 34: 613.
Eding J H and Meuwissen T H E. 2001. Marker based estimates of between and within population kinships
for the conservation of genetic diversity. Journal of Animal Breeding and Genetics 118: 141.
FAO. 2007. The State of the World's Animal Genetic Resources for Food and Agriculture, edited by Barbara
Rischkowsky and Dafydd Pilling. Rome.
Hall S J and Bradley G. 1995. Conserving livestock breed diversity. Tree 10: 267.
Maijala K and Kolstad N. 1992. Gene banks for livestock conservation. In Sandlund O.T., Hindar K. and
Brown A.H.D. Eds., Conservation of Biodiversity for Sustainable Development. Scandinavian University
Press, Oslo. pp. 230.
McKay S D, Schnabel R D, Murdoch B M, Matukumalli L K, Aerts J, Coppieters W, Crews D, Neto E D, Gill C
A, Gao C, Mannen H, Wang Z, Van Tassell C P, Williams J L, Taylor J F and Moore S S. 2008. An assessment
of population structure in eight breeds of cattle using a whole genome SNP panel. BMC Genetics 9: 37.
Muir W M, Wong G K, Zhang Y, Wang J, Groenen M A M, Crooijmans R P M A, HendrikJan M, Zhang H,
Okimoto R, Vereijken A, Jungerius A, Albers G A A, Lawley C T, Delany M E, MacEachern S and
Cheng H H. 2008. Genome-wide assessment of worldwide chicken SNP genetic diversity indicates
significant absence of rare alleles in commercial breeds. Proceedings of the National Academy of Sciences
USA 105: 17312.
Nivsarkar A E, Vij P K and Tantia M S. 2000. Animal Genetic Resources of India: Cattle and Buffalo. pp. 50-54
and 135-139. Directorate of Information and Publications of Agriculture, Indian Council of Agricultural
Research, New Delhi.
Piyasatian N and Kinghorn B P. 2003. Balancing genetic diversity, genetic merit and population viability in
conservation programmes. Journal of Animal Breeding and Genetics 120: 137.
8
Microsatellite Markers for Genetic Diversity Analyses of Farm Animals
Reena Arora
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
Genetic diversity plays a very important role in the survival and adaptability of a species. It is
required to meet current production needs in various environments, to allow sustained genetic
improvement, and to facilitate rapid adaptation to changing breeding objectives (Notter, 1999). A
major drawback in the formulation and implementation of conservation, breeding and management
policies for Indian livestock breeds is the lack of information regarding their current genetic
status. Over the past decade microsatellite markers have proven to be useful in genetic diversity
studies in several livestock species (Acosta et al., 2013; Arora et al., 2011; Kumar et al., 2006).
The awareness of the
importance of this diversity at the phenotypic level has led to the assessment of diversity at the
genetic level as well. Genetic characterization enables the prioritization of breeds for conservation.
The amount of genetic divergence between populations is regarded as a major criterion for deciding
their uniqueness and therefore prioritizing their conservation (Eding et al., 2002).
Microsatellite markers have been widely used across the world to measure genetic diversity, gene
flow, migration and effective population size in livestock breeds (Kantanen et al., 2000; Peter et
al., 2007; Cinkulov et al., 2008). A plethora of information has been generated on the genetic
characterization of Indian livestock during the last decade, using neutral microsatellite markers.
Coancestry and kinship between breeds have also been determined through the use of microsatellite
markers. Past genetic bottlenecks have been detected in several livestock breeds using
microsatellites. These markers are also being used for assigning individuals to the population of
origin as well as for parentage verification. They are the best suited markers for differentiating
closely related breeds.
Microsatellites- Markers of choice
In spite of the growing competition from new genotyping and sequencing techniques, the
microsatellite markers are still regarded as the most powerful DNA tools for genetic analysis owing
to their several unique characteristics and are globally being exploited to establish genetic profiles
of animal genetic resources. Since their discovery, microsatellites have been used in mapping
programmes and by population biologists for studies of population genetic structure and kinship
investigations.
Microsatellites have been recommended by FAO as first priority molecular tools for the
Measurement of Domestic Animal Diversity (MoDAD). The term microsatellite was introduced by
Litt and Luty (1989) to characterize simple sequence motifs, one to six nucleotides in length
(mono-, di-, tri-, tetra-, penta- and hexanucleotide tandem repeats), repeated in tandem. For example,
mononucleotide: AAAAAAAAAAA would be referred to as (A)11
dinucleotide: GTGTGTGTGTGT would be referred to as (GT)6
trinucleotide: CTGCTGCTGCTG would be referred to as (CTG)4
tetranucleotide: ACTCACTCACTCACTC would be referred to as (ACTC)4
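The repeat notation used in these examples can be derived mechanically. The sketch below finds the shortest motif of one to six nucleotides that, repeated at least twice, reproduces a perfect tandem repeat; the function name is hypothetical, and interrupted or compound repeats are not handled.

```python
def repeat_notation(sequence, max_unit=6):
    """Return notation such as '(GT)6' for a perfect tandem repeat, else None."""
    for unit_length in range(1, max_unit + 1):
        if len(sequence) % unit_length:
            continue                                  # motif must fit a whole number of times
        copies = len(sequence) // unit_length
        unit = sequence[:unit_length]
        if copies >= 2 and unit * copies == sequence:
            return f"({unit}){copies}"
    return None

for example in ("AAAAAAAAAAA", "GTGTGTGTGTGT", "CTGCTGCTGCTG", "ACTCACTCACTCACTC"):
    print(example, "->", repeat_notation(example))
```

Because shorter motifs are tried first, a sequence such as GTGTGTGTGTGT is reported as (GT)6 rather than (GTGT)3.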
Microsatellites are also known as simple sequence repeats (SSR), short tandem repeats (STR) and
sequence tagged microsatellite repeats (STMR). They occur at a frequency of about one SSR per 10 kb
of DNA, giving a total of 50,000 - 100,000 in the mammalian genome. They are found in a
wide variety of eukaryotes, including plants, and occur very frequently and randomly
in most eukaryotic DNA. Human genomic DNA contains on average one microsatellite every
6 kb (Beckmann and Weber, 1992).
Major advantages of these highly polymorphic microsatellites are their locus specificity,
abundance and random distribution over the genome, co-dominant inheritance, ease and speed of
application, and suitability for automated analysis. An advisory group of the International
Society for Animal Genetics (ISAG), in collaboration with FAO, has established, for each species of
interest, a set of microsatellite markers to be used as the standard set for the calculation of genetic
distances. Adherence to such recommendations allows for reasonable comparison of parallel or
overlapping studies and helps combine results in meta-analyses. To attain a certain precision for
different levels of resolution or discrimination among breeds, it is recommended to sample at least
25 animals per breed (mainly blood samples; hair or tissue may also be taken) and to investigate 25
microsatellite loci with 4-10 alleles per locus. The primer sequences and map position of each of
these markers can be obtained from the Domestic Animal Diversity Information System
(DAD-IS-MoDAD) and are also available at http://dad.fao.org/dad-is/data/molecula/index.html.
Isolation of microsatellite markers
Tandem repeat sequences (microsatellites) are first detected from the entire genome and their
unique flanking sequences are used to develop primers for amplification of the specific
microsatellites by PCR. Broadly, two strategies are used for the isolation of microsatellite markers.
(A) Cosmid derived microsatellite markers
In this strategy, the genomic DNA, after digestion with restriction enzymes, is cloned into suitable
vectors, mostly cosmids, thus forming a cosmid genomic library. The cosmids are then screened
with a labelled (CA)n or (GT)n polynucleotide probe. The clones that hybridize to the probes are
detected by autoradiography. The positive clones are isolated and the insert (microsatellite) which
they harbour is sequenced and characterized. Appropriate primers are designed from the flanking
regions.
(B) Microdissected chromosome derived microsatellite markers
In this methodology, a chromosome spread is obtained from a blood culture and the chromosome
of interest is identified under a microscope. This chromosome is dissected using a
micromanipulator. The microdissected chromosomal fragments are then used to construct a genomic
DNA library, which is screened with radiolabelled (CA)n or (GT)n probes. Positive clones are
isolated and subjected to PCR amplification. The PCR products are sequenced and the sequences
checked for uniqueness to develop PCR primers. A modification of this method involves
amplification of the microdissected chromosomal fragments by PCR using degenerate
oligonucleotide primers. Biotinylated (CA)n probes are added to the amplified products. After
denaturation and annealing, the annealed DNA is added to streptavidin paramagnetic particles and
incubated to capture DNA fragments hybridized to the biotinylated (CA)n probes. The bound DNA is
eluted and amplified using appropriate primers. The amplified products are purified and
sequenced to be used as markers.
Evolution of microsatellites
It is believed that errors occur when DNA is being replicated and that extra sets of these
repeated sequences are added to the strand. Although a clear understanding of the origin and
evolution of microsatellites is still not available, the number of repeats usually increases or
decreases by a single repeat unit, and sometimes by more. Simple repeats are considered to be
generated mostly by slipped strand mispairing (Moxon and Wills, 1999) or by insertions or
substitutions (Zhu et al. 2000).
Slipped strand mispairing
In this process, the number of microsatellite repeats increases or decreases during DNA replication.
An increase in the number of microsatellite repeats occurs when slippage occurs on the newly
synthesized strand in its binding to the template strand and the DNA polymerase adds the
nucleotides to fill in the gap, thereby lengthening the strand by one repeat. A decrease in the
repeat number occurs when the old or template strand slips, resulting in the repair enzymes deleting
a repeat. DNA polymerase has a very high rate of slippage on templates containing simple repeats in
vivo, but most of these errors are corrected by cellular mismatch repair systems. The instability
of simple repeats observed in some human diseases may be a consequence of either an increased
rate of DNA polymerase slippage or a decreased efficiency of mismatch repair (Strand et al. 1993).
Insertions and substitutions
Slippage of DNA polymerase depends on mispairing of tandem repeats during DNA replication, so
it may not occur when there are few tandem repeats. Studies of slippage mutations show that they
are more common in loci with longer repeats. Loci with fewer than five repeats are rarely
polymorphic. If they incur a few mutations which increase the number of repeats, their
polymorphism levels increase. For slippage to occur on longer repeats, some mechanism other than
slippage must act on the shorter repeats from which the longer repeats evolved. Microsatellite
sequences are exceptionally vulnerable to spontaneous insertion or deletion mutations, and
non-triplet microsatellites located in coding sequences are expected to introduce frameshift
mutations at high frequency. Substitutions are much more common than insertions and they are
the dominant source of new two-repeat loci. Microsatellites have been estimated to mutate at the
rate of 10⁻³ to 10⁻⁵ mutations per gamete.
Theoretical Models of Microsatellite Mutations
Theoretical mutation models have been derived to explain the evolutionary processes of
microsatellites, and genetic distances and population differentiation are estimated on the basis of
these models. The Infinite Allele Model (IAM) was proposed by Kimura and Crow (1964). According to
this model, a mutation can involve any number of tandem repeats and always results in a new allele
state not previously existing in the population. However, this model does not conform to the slipped
strand mispairing mechanism responsible for microsatellite length variation. This mechanism leads to
small changes in the repeat numbers, and alleles may mutate towards allele states that are already
present in the population. In order to account for these discrepancies in the mutational processes,
the Step-wise Mutation Model (SMM) was introduced in the 1970s. The model assumes that the entire
sequence of allelic states can be expressed as integers and that a mutation results in a change of
one repeat unit, either by insertion or deletion (Kimura and Ohta, 1978). In addition to this model,
Di Rienzo et al. (1994) described the Two Phase Model (TPM), where a limited proportion of mutations
involve several repeats.
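The difference between the step-wise and two-phase models can be sketched by simulating a single mutation of an allele expressed as a repeat count. The proportion of single-step mutations (0.9), the uniform choice of multi-step sizes and the function names are assumptions made for this illustration; published implementations of the TPM typically draw multi-step sizes from a geometric distribution.

```python
import random

def mutate_smm(repeats, rng):
    """Step-wise model: gain or lose exactly one repeat unit (never below one repeat)."""
    return max(1, repeats + rng.choice((-1, 1)))

def mutate_tpm(repeats, rng, p_single=0.9, max_step=5):
    """Two-phase model: mostly single steps, occasionally a change of several repeats."""
    step = 1 if rng.random() < p_single else rng.randint(2, max_step)
    return max(1, repeats + rng.choice((-1, 1)) * step)

rng = random.Random(2015)                      # fixed seed so the run is repeatable
allele = 12                                    # e.g. a (GT)12 allele
print([mutate_smm(allele, rng) for _ in range(5)])
print([mutate_tpm(allele, rng) for _ in range(5)])
```

Under the SMM every mutant of a 12-repeat allele has 11 or 13 repeats, so mutation can recreate allele states already present in the population, which is the source of the homoplasy that the IAM ignores.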
Limitations
Null Alleles
Failure of amplification of some alleles due to mutations in the binding regions results in reduction
or loss of PCR products. These are termed as null alleles and may lead to serious underestimation
of heterozygosity. In a heterozygote of two different alleles, if one allele fails to amplify due to
primer annealing difficulties then the phenotype will appear as a single banded homozygote. This
problem may be overcome by designing new primers but it is a very tedious task.
Slippage
This problem is due to the activity of the Taq polymerase used in the PCR. During PCR
amplification, the thermostable polymerase tends to slip, leading to the production of differently
sized products. These products are less intense and are also referred to as shadow bands. Further,
the Taq polymerase has a tendency to add an additional adenine (A) nucleotide at the 3' end of the
amplified PCR products. This can also lead to difficulties in scoring bands.
Homoplasy
Homoplasy can be defined as the co-occurrence of alleles that are identical in state but not
identical by descent. If two alleles are inherited without any mutation from the same ancestral
allele, they are identical by descent. However, two alleles may have the same length, and even the
same sequence, without having been inherited from the same ancestral allele. Such alleles are
identical in state only (Jarne and Lagoda, 1996).
Applications of Microsatellite Markers
Biodiversity Analysis:
By analyzing the microsatellite profiles of each individual across different loci, inferences can be
made about the overall magnitude of genetic diversity within breeds. The priority breeds for
conservation should be the ones with the largest within breed diversity. Microsatellites are most
suitable to determine the relationships, expressed as genetic distances among breeds, possible
levels of inbreeding in each breed, gene flow in livestock populations, most diverse and distinctive
i.e. genetically unique breeds/populations for higher priority in conservation programmes and
relative contribution of each breed to the total (species) genetic diversity. These markers have been
successfully used for differentiation of closely related breeds and assignment of individuals to
specific breeds.
Breed demarcation and phylogenetic studies:
Microsatellite markers have been successfully used to determine the genetic variation between
breeds. Characterization of breeds is necessary for the development of conservation programmes,
to determine which breeds should be conserved. Pihkanen et al. (1996) used microsatellite markers
to estimate dog breed differentiation. Microsatellites have been successfully used to assess the
genetic variation between various cattle breeds (Moazami-Goudarzi et al., 1997; Martin-Burriel et al.,
1999; Schmid et al., 1999; Kantanen et al., 2000). The relationship among breeds of species other than
cattle has also been estimated, viz., goat (Saitbekova et al., 1999), horse (Bjornstad et al., 2000) and
donkey (Jordana et al., 2001). Microsatellite loci have been successfully used to reconstruct
phylogenetic relationships among populations. Ritz et al. (2000) determined the phylogenetic
relationships in the tribe Bovini using 20 microsatellite markers. Takezaki and Nei (1996) have
suggested that microsatellite DNA is very useful for clarifying the evolutionary relationship of
closely related populations.
Parentage testing:
Microsatellite typing can be used as a tool for identity or paternity testing by detection of
hypervariable sequences. Identity testing and parentage determination are useful in artificial
insemination and progeny testing programmes and also in paternity related disputes. DNA
analysis allows a far greater accuracy of parent identification through comparison of microsatellite
sequences of an individual and its candidate parents. A DNA-based technique can be used to
identify parentage in situations with multiple sire matings. In addition, these molecular markers
also serve as a useful tool for animal identification, particularly for verification of the semen used
for artificial insemination. ISAG has recommended panels of microsatellite markers for parentage
verification in horse, dog, cattle, sheep, goat and pig.
(www.isag.us/Docs/consignmentforms/02_PVpanels_LPCGH.doc).
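The comparison of an individual with a candidate parent can be sketched as a simple exclusion check: at every locus typed in both animals, the offspring must share at least one allele with a true parent. The locus names, allele sizes and function names below are hypothetical, and the tolerance for mismatches (to allow for mutation, null alleles or typing errors) is an assumed parameter rather than a recommended value.

```python
def mismatching_loci(offspring, candidate):
    """Loci typed in both animals at which they share no allele."""
    return [locus for locus, alleles in offspring.items()
            if locus in candidate and not set(alleles) & set(candidate[locus])]

def is_excluded(offspring, candidate, max_mismatches=0):
    """Exclude the candidate parent when mismatches exceed the tolerated number."""
    return len(mismatching_loci(offspring, candidate)) > max_mismatches

calf = {"L1": (120, 124), "L2": (98, 102), "L3": (150, 150)}
sire_1 = {"L1": (124, 128), "L2": (102, 104), "L3": (150, 154)}
sire_2 = {"L1": (116, 118), "L2": (98, 98), "L3": (152, 154)}
print(is_excluded(calf, sire_1), is_excluded(calf, sire_2))
```

In this toy example the first candidate sire shares an allele with the calf at every locus and cannot be excluded, while the second mismatches at two loci and is excluded.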
Population Bottleneck:
A population bottleneck is a drastic reduction in the size of a population that may be caused by
natural calamities, habitat destruction or endemic disease. The decrease in population number
directly reduces the genetic diversity. When populations are under strong natural or artificial
selection, only a subset of individuals in the population will reproduce; therefore relatively few
individuals contribute alleles to subsequent generations. Alleles for gene regions that are not
under selection are present in the post-selection population as a random subset of the original
allelic diversity. The probability of an allele being present in subsequent generations
is equivalent to its frequency in the original population; therefore high frequency alleles have a
greater probability of being present in the post-selection population than low frequency alleles
(Luikart et al., 1998a,b). If selection pressure lasts for many generations, rare alleles will be lost
simply by chance, resulting in a post-selection population with fewer alleles and lower
heterozygosity than the original population. Unfortunately, it is often difficult to identify losses of
variability because levels of genetic variability prior to a population decline are generally unknown
(Spencer et al., 2000). A number of statistical methods now make it possible to investigate a
population's history without the need for information on past population sizes (Spencer et al., 2000).
These tests typically quantify deviations from expected patterns in allele sizes, allele numbers,
heterozygosity levels, or allele distributions, often using microsatellite data, as these molecular
markers are important modern tools for estimating the level of genetic diversity in endangered
populations (O'Brien 1994).
Identification of disease carrier:
Many incurable diseases result from defects in genomes. DNA polymorphism occurring within a
gene helps to understand the molecular mechanism and genetic control of several genetic and
metabolic disorders and allows the identification of heterozygous carrier animals. Identification of
carrier animals of weaver disease (progressive degenerative myeloencephalopathy) in cattle has
been accomplished using the TGLA116 microsatellite marker. Georges et al. (1993) performed an
extensive linkage study in a bovine pedigree segregating for the weaver condition and identified a
microsatellite locus closely linked to the weaver gene; by extension, the weaver locus was
assigned to bovine synteny group 13. Microsatellite TGLA116 can be used to identify weaver
carriers and to select against this genetic defect.
Mapping of QTL:
One of the most important applications of microsatellites is the mapping of QTL by linkage. Such
mapping information, if available for genes of economic importance, can be used in breeding
programmes, either for within-breed manipulations such as marker assisted selection (MAS) of young
sires or for between-breed introgression programmes. Microsatellites have been adopted widely for
use in linkage mapping studies of farm animals, to the point that they are now the favoured
polymorphic marker for this purpose. Microsatellite marker D21S4 has shown significant
association with effects on milk and protein yields in cattle. The presence of QTL for milk
production on five chromosomes (namely chromosomes 1, 6, 9, 10 and 20) has also been
demonstrated in 14 US Holstein half-sib families using 159 microsatellites. Significant association of
microsatellite markers with somatic cell score (SCS, an indicator of susceptibility to mastitis),
productive herd life and milk production traits has also been established. Potential QTL for SCS, fat
yield, fat percentage and protein percentage have also been identified using microsatellites (Ron et
al., 1994; Ashwell et al., 1997). Characterization of QTL for economically important traits using
microsatellite markers will help in formulating more efficient breeding programmes using MAS.
The map would also help in the identification, isolation and manipulation of animals with a
predetermined phenotype by modifying the candidate genes.
Conclusion
SNP markers are gradually replacing microsatellite markers for diversity analysis within species.
However, SNPs are not free of limitations such as ascertainment bias (Schlotterer 2004). In addition,
existing genetic programs and computer applications are limited in their ability to process
the huge amounts of data generated in genome-wide SNP studies (Decker et al., 2009). Further,
diversity analyses using SNPs/microarrays involve high costs. Therefore, despite some limitations
of sampling methods, number of markers used and type of analyses, microsatellite based studies
remain viable for analysis of biodiversity, potential conservation and sustainable utilization of
livestock genetic resources particularly the indigenous breeds. Although the information generated
from microsatellite data facilitates the outlining of genetic management and conservation programs
for livestock breeds/populations, additional information on population trends, economic
importance and specific adaptive features needs to be taken into consideration.
References
Acosta A.C, Uffo O, Sanz A, Ronda R, Osta R, Rodellar C, Martin-Burriel I and Zaragoza P. 2013. Genetic
diversity and differentiation of five Cuban cattle breeds using 30 microsatellite loci. Journal of Animal
Breeding and Genetics. 130: 79.
Arora R, Bhatia S, Mishra B. P. and Joshi B.K. 2011. Population structure in Indian sheep ascertained using
microsatellite information. Anim. Genet. 42: 242.
Ashwell M.S, Rexroad Jr C. E, Miller R.H, VanRaden P.M and Da Y. 1997. Detection of loci affecting milk
production and health traits in an elite US Holstein population using microsatellite markers. Animal
Genetics. 28: 216.
Beckmann J.S. and Weber J.L. 1992. Survey of human and rat microsatellites. Genomics 12, 627-631.
Cinkulov M, Popovski Z, Porcu K, Tanaskovska B, Hodzic A, Bytyqi H, Mehmeti H, Margeta V, Djedovic R,
Hoda A, Trailovic R, Brka M, Markovic B, Vazic B, Vegara M, Olsaker I. and Kantanen J. (2008). Genetic
diversity and structure of the West Balkan Pramenka sheep types as revealed by microsatellite and
mitochondrial DNA analysis. Journal of Animal Breeding and Genetics. 125, 417-426.
Decker J.E, Pires J.C, Conant G.C. et al. 2009. Resolving the evolution of extant and extinct ruminants with
high-throughput phylogenomics. Proc. Natl. Acad. Sci. USA. 106: 18644.
DiRienzo A, Peterson A.C, Garza J.C, Valdes A.M, Slatkin M. and Freimer N.B. 1994. Mutational process of
simple sequence repeat loci in human populations. Proc. Natl. Acad. Sci., USA. 91: 3166.
Eding H, Crooijmans R.P.M.A, Groenen M.A.M. and Meuwissen T.H.E. (2002) Assessing the contribution of
breeds to genetic diversity in conservation schemes. Genetics Selection and Evolution. 34, 613-633.
Georges M, Dietz A.B, Mishra A, Nielsen D, Sargeant L.S, Sorensen A, Steele M.R, Zhao X, Leipold H,
Womack J.E and Lathrop M. 1993. Microsatellite mapping of the gene causing weaver disease in cattle
will allow the study of an associated quantitative trait locus. Proceedings National Academy of
Sciences, USA. 90: 1058.
Jarne P. and Lagoda P.J.L. 1996. Microsatellites from molecules to populations and back. TREE, 11, 424-429.
Kantanen J, Olsaker I. and Holm L.E. 2000. Genetic diversity and population structure of 20 North European
cattle breeds. Journal of Heredity. 91: 446.
Kimura M. and Crow J. F. (1964). The number of alleles that can be maintained in a finite population.
Genetics. 49, 725-738.
Kimura M. and Ohta T. 1978. Stepwise mutation model and distribution of allelic frequencies in a finite
population. Proc. Natl. Acad. Sci., USA. 75: 2868.
Kumar S, Gupta J, Kumar N, Dikshit K, Navani N, Jain P and Nagarajan M. 2006. Genetic variation and
relationships among eight Indian riverine buffalo breeds. Molecular Ecology. 15: 593.
Litt M and Luty J.A. 1989. A hypervariable microsatellite revealed by in-vitro amplification of a dinucleotide
repeat within the cardiac muscle actin gene. American Journal of Human Genetics. 44: 397.
Luikart G, Allendorf F.W, Cornuet J. M, and Sherwin W.B. 1998a. Distortion of allele frequency distributions
provides a test for recent population bottleneck. J.Hered. 89: 238.
Luikart G, Sherwin W.B, Steele B.M. and Allendorf F.W. 1998b. Usefulness of molecular markers for detecting
population bottlenecks via monitoring genetic change. Mol. Ecol. 7: 963.
Moxon E.R. and Wills C. 1999. DNA microsatellites: agents of evolution? Sci. Am. 94.
Notter D.R. 1999. The importance of genetic diversity in livestock populations of the future. J Anim Sci. 77: 61.
O'Brien, S.J. 1994. A role for molecular genetics in biological conservation. Proc. Natl. Acad. Sci. USA. 91: 5748.
Peter C, Bruford M, Perez T, Dalamitra S, Hewitt G, Erhardt G. and the ECONOGENE Consortium. (2007)
Genetic diversity and subdivision of 57 European and Middle-Eastern sheep breeds. Animal Genetics.
38, 37-44.
Ron M, Band M, Yanai A and Weller J.I. 1994. Mapping quantitative trait loci with DNA microsatellites in a
commercial dairy cattle population. Animal Genetics. 25: 259.
Schlotterer C. 2004. The evolution of molecular markers just a matter of fashion? Nature Reviews Genetics.
5, 639.
Spencer C.C, Neigel J.E. and Leberg P.L. 2000. Experimental evaluation of the usefulness of microsatellite
DNA for detecting bottlenecks. Mol. Ecol. 9: 1517.
Strand M, Prolla T.A, Liskay R.M. and Petes T.D. 1993. Destabilization of tracts of simple repetitive DNA in
yeast by mutations affecting DNA mismatch repair. Nature. 365, 274-276
Zhu Y, Strassmann J.E. and Queller D.C. 2000. Insertions, substitutions and the origin of microsatellites. Genet
Res. 76: 227.
9
Mitochondrial DNA as a Marker for Genetic Diversity and Evolution in
Farm AnGR
Monika Sodhi, Amit Kishore and Manishi Mukesh
ICAR- National Bureau of Animal Genetic Resources, Karnal, Haryana
________________________________________________________________________________________
Livestock breeds have been formed through human and natural selection since the beginning of
domestication thousands of years ago so as to best fit the environmental condition and human
needs. Detailing the evolutionary and demographic history of domesticated animals has always
been a focus of research. The genetic diversity has been exploited in livestock species to identify
new traits developed in response to changes in environment, diseases or market conditions
(Erhardt and Weimann, 2007). Also, the evolutionary potential of a species depends mainly on the
genetic variation of its populations, which is the consequence of a balance between evolutionary
and demographic processes generating either heterogeneity or homogeneity among local
populations. Understanding the evolutionary relationships among livestock breeds can reveal the
origin of animal husbandry and the distinction between wild and domesticated forms of a species, and can
help elucidate the events surrounding bovine prehistory inferred from archaeological and
anthropological data (Loftus et al., 1994a).
Recent developments in molecular genetics have provided new powerful tools, called molecular
markers, to assess the evolutionary and demographic history of livestock species, domestication
events and geographic distribution of their diversity (Hanotte and Jianlin, 2006). DNA based
marker methods are commonly used in ecological, evolutionary, and genetic approaches to analyze
efficiently genetic structure in both animal and plant species (Tarnita et al., 2009). These markers
have helped in identification of the wild ancestors of modern livestock and the nature of livestock
expansion in past millennia. Such information tells us about history and the way in which
extraordinary biological diversity has been shaped in a relatively short period of time. With the
development of molecular technologies, DNA-based polymorphisms became the markers of choice
for molecular-based surveys of genetic variation (Hanotte and Jianlin, 2006). Different genetic
markers provide different levels of genetic diversity information, among which mitochondrial DNA
(mtDNA) sequences are the markers of choice for significant insights into the domestication and
past migration history of livestock species. Use of mtDNA has broadened the perspective on the
origin and evolution of domesticated cattle (Maji et al., 2009). Further, one of the persistent
challenges in the analysis of population genetic data is to account for the spatial arrangement
(nonrandom distribution of genetic variation among individuals within populations) of samples
and populations. mtDNA data have been extensively used to understand the spatial distribution of
genetic lineages within species, allowing identification of the historical factors with the greatest effect on the
spatial patterns of lineages. mtDNA has been used for the identification of maternal lineages
(Erhardt and Weimann, 2007) as well as to test hypotheses related to the past genetic history and
evolution of different species (Hebsgaard et al. 2007). mtDNA can also tell us about the recent
demographic processes affecting a population, for example whether a population has undergone a
recent demographic expansion, or has a more complex history. The recognition of mitochondrial
DNA molecule as a genetic marker in population and evolutionary biology derives in part from the
relative ease with which clearly homologous sequences can be isolated and compared. Simple
sequence organization, maternal inheritance and absence of recombination make mtDNA an ideal
marker for tracing maternal genealogies.
Table 1. Size of mitochondrial genome and the control region in livestock animals

Species   Organism          Size (bp)   GenBank Acc.   First report
Cattle    B. taurus         16,338      V00654         Anderson et al., 1982
          B. indicus        16,339      NC_005971      Hiendleder et al., 2008
Buffalo   Bubalus bubalis   16,355      AY488491       Parma et al., 2004
Sheep     Ovis aries        16,616
Goat      Capra hircus      16,640
Pig       Sus scrofa        16,613
Horse     Equus caballus    16,660

Further entries in the table: GenBank Acc. AF010406, AF010407; AF034253; X79547. Control region size (bp): 934*, 1176, 960*.
observed upon electron microscopy, which is termed the displacement loop or D-loop. The peripheral D-loop
domains, which contain the main regulatory elements, have evolved rapidly in a species-specific manner,
generating heterogeneity in length and base composition. Structurally, 3 domains are present in
the D-loop, viz., the left and right peripheral domains and the central domain. Because of the peculiar evolution
of the left and right domains, they cannot be used to estimate genetic distances between
mammalian species correctly. The central domain, on the other hand, is highly conserved during evolution
and behaves as a good molecular clock, giving reliable estimates of the times of divergence between
closely and distantly related species. D-loop analysis has been approached through
polymerase chain reaction (PCR) based techniques, viz., PCR-RFLP (restriction fragment length
polymorphism) and PCR-SSCP (single strand conformation polymorphism), or by direct sequencing
of the D-loop. Earlier reports were based on PCR-RFLP, but at present direct sequencing of
the mtDNA/D-loop is used in most studies.
Restriction mapping of mitochondrial genome for population structuring
Restriction maps, i.e. physical maps of the relative positions of the cleavage sites of one or more
restriction endonucleases in mtDNA, have been established for most livestock species
(Hecht et al., 1990). Alignment of the physical map with specified genes encoded by mtDNA is also
available for several species. Anderson et al. (1982) presented a complete mitochondrial gene map
of bovine mtDNA derived from the complete sequence of this genome. Several reports are available on
the physical map of mtDNA in livestock species, viz. cattle (Loftus et al., 1994a), sheep and
riverine buffalo (Mishra et al., 2009). Structural analysis of mtDNA by restriction endonuclease
digestion and agarose gel electrophoresis has proven useful in assessing genetic relatedness in
systematic and population genetic studies. Various workers have used mtDNA-RFLP extensively
for intra- and inter-species comparisons. Polymorphism data have been used successfully to
characterize population/breed structure and have also helped in studying the evolution of
a species.
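The idea behind restriction mapping and mtDNA-RFLP can be sketched in silico: locate each recognition site of an enzyme in a sequence and derive the fragment lengths that would be seen on a gel. The sketch below is only an illustration under stated assumptions: the function names and the 20-bp toy sequences are hypothetical, the search covers one strand only (sufficient for palindromic sites such as the HindIII site AAGCTT, cut after the first A), and sites spanning the origin of the circular molecule are ignored.

```python
def cut_positions(sequence, site, cut_offset):
    """Return 0-based cut positions for every occurrence of a recognition site."""
    seq, site = sequence.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)  # allow overlapping occurrences
    return positions


def fragment_lengths(sequence, site, cut_offset, circular=True):
    """Fragment lengths (largest first) after complete digestion with one enzyme."""
    cuts = cut_positions(sequence, site, cut_offset)
    n = len(sequence)
    if not cuts:
        return [n]  # uncut molecule
    if circular:
        # k cuts in a circular molecule such as mtDNA give k fragments
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(n - cuts[-1] + cuts[0])
    else:
        bounds = [0] + cuts + [n]
        lengths = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(lengths, reverse=True)


# Hypothetical toy sequences: the variant has lost the second HindIII site
reference = "CCAAGCTTGGGGAAGCTTCC"
variant = "CCAAGCTTGGGGAAGCATCC"
print(fragment_lengths(reference, "AAGCTT", 1))  # two fragments
print(fragment_lengths(variant, "AAGCTT", 1))    # one linearized fragment
```

Two animals whose mtDNA yields different fragment patterns with the same enzyme differ at one or more restriction sites, which is the basis on which mitotypes are distinguished in RFLP studies.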
Watanabe et al. (1985) analysed mtDNA restriction patterns from cattle and pigs of Asian and
European descent and observed a clear distinction between the two groups. Bhat et al. (1990) studied
mtDNA polymorphism in Holstein (Bos taurus) and Haryana (Bos indicus) cattle breeds using 13
restriction endonucleases, including 6 enzymes, viz., AvaII, BamHI, BglII, HindIII, HpaI and PstI. The two
Holsteins differed at 6 sites, whereas the Haryana breed did not show any site polymorphism. The
authors observed different mitotypes in Holstein, as reported by Anderson et al. (1982) and
Watanabe et al. (1985). The absence of polymorphic sites in the Haryana breed can be
understood in the light of the history of this breed, where gene migration has been very limited. On
the basis of the existence of polymorphism in the Holstein breed, which is known to be genetically
diverse, and the absence of polymorphism in zebu cattle, it was suggested that mtDNA polymorphism might
characterize the two breeds of cattle. Loftus et al. (1994a) analysed mtDNA from 13 different cattle
breeds of European, African and Asian origin to determine the phylogenetic
relationships and level of variation among breeds. The presence of 26 different mitotypes described
by 20 polymorphisms indicated two major lineages, the Afro-European and Asian types. None of the
mitotypes found in the Asian lineage was detectable in the Afro-European lineage or vice versa.
Further, the grouping of all African indicine populations within the clade containing all Bos taurus
lineages pointed towards hybrid origins of the humped cattle of the African continent and their
distinction from Bos indicus populations.
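Once each animal has been assigned a mitotype, within-group variation and sharing between groups can be summarized very simply. The following is a minimal sketch, assuming the commonly used haplotype (gene) diversity estimator h = n/(n-1) x (1 - sum of squared mitotype frequencies); the mitotype labels and sample compositions are hypothetical and are not data from the studies cited above.

```python
from collections import Counter


def haplotype_diversity(mitotypes):
    """Haplotype diversity h = n/(n-1) * (1 - sum(p_i^2)) for a sample of mitotypes."""
    n = len(mitotypes)
    if n < 2:
        raise ValueError("at least two animals are needed")
    counts = Counter(mitotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical mitotype assignments for two groups of breeds
afro_european = ["M1", "M1", "M2", "M3"]
asian = ["M7", "M7", "M8"]

h_afro = haplotype_diversity(afro_european)
h_asian = haplotype_diversity(asian)
shared = set(afro_european) & set(asian)  # mitotypes found in both groups

print(f"h (Afro-European) = {h_afro:.3f}")
print(f"h (Asian) = {h_asian:.3f}")
print(f"shared mitotypes: {sorted(shared)}")
```

An empty set of shared mitotypes corresponds to the situation described above, in which no mitotype of one lineage is detected in the other.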
Origin and phylogeography of cattle based on mitochondrial DNA
Livestock breeds have been developed through centuries of natural and human selection to fit
different environmental conditions and human needs. The domestication of cattle was an important
step in human history, leading to modification of the diet and socio-economic structure of several
populations (Beja-Pereira et al., 2006). The process of cattle domestication (B. taurus and B. indicus)
probably started approximately 11,000 years ago, and the breeds and strains have been
morphologically differentiated primarily by the absence (Bos taurus) or presence (Bos indicus) of a
hump. The domesticated cattle breeds are derived from the wild aurochs (B. primigenius) as their
ancestor, and the origin of domestication is believed to be in Southwest Asia (Anatolia/the Fertile
Crescent and its eastern margin, towards the Indus valley region). The wild aurochs (B. primigenius)
had 3 distinct subspecies: European taurine cattle have B. p. primigenius of the Near and Middle
East as their progenitor, African taurine cattle have B. p. opisthonomus of northern Africa, and Asian zebuine cattle
have B. p. nomadicus of the northern Indian subcontinent (Table 2). On the basis of
mtDNA haplotypic diversity (haplogroups), humpless B. taurus has been reported to have diverged
from humped B. indicus ~1.7-2.0 million years ago (Bradley et al., 1996; Hiendleder et al., 2008;
Loftus et al., 1994a; Loftus et al., 1994b). The clear genetic distinctness between taurine (B. taurus)
and zebu (B. indicus) cattle breeds points to two distinct lineages, indicating two major sites of
domestication (Fig 2a), one in the Indian subcontinent and the other in the Near East, where the zebu and
taurine breeds would have emerged independently from their respective aurochs
groups. It has been hypothesized that all extant European breeds descended from
cattle domesticated in the Near East that subsequently spread during the diffusion of herding and
farming lifestyles (Beja-Pereira et al., 2006).
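Divergence estimates such as the one quoted above rest on molecular-clock reasoning. As a minimal sketch, assume that d is the mean number of substitutions per site separating two lineages and that mu is the substitution rate per site per year along each lineage. Both symbols and the numerical values below are illustrative assumptions, not figures taken from the cited studies.

```latex
T = \frac{d}{2\mu}
\qquad \text{e.g.} \qquad
T = \frac{0.04}{2 \times 1.0 \times 10^{-8}\ \text{yr}^{-1}} = 2.0 \times 10^{6}\ \text{yr}
```

The factor 2 appears because substitutions accumulate independently along both lineages after the split, so the estimate is only as reliable as the assumed rate.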
Table 2. Origin and domestication of modern cattle breeds

Domestic species     Wild ancestor: aurochs       MtDNA    Domestication       Time B.P.   Location
                     (3 subsps.) (extinct)        clades   events (at least)
Bos taurus taurus    B. primigenius primigenius   4        1                   ~8000       Near and Middle East (West Asia)
                     B. p. opisthonomus           2        1                   ~9500       Northeast Africa
Bos taurus indicus   B. p. nomadicus              2        1                   ~7000       Northern Indian subcontinent
Further, information based on mtDNA, microsatellite DNA and Y-chromosome DNA sequence
variation revealed distinctness between Indian and African zebu cattle (Anderung et al., 2007;
Bradley et al., 1998) (Figure 3). The African cattle breeds are taurine in origin, with independent
domestication events on the African continent. It is estimated that around 700 AD, male-mediated
introgression and the wide spread of zebu cattle began, resulting in admixed
populations. The proportion of zebu alleles in these admixed African populations decreases from East to West
Africa and follows a steep north-south gradient in West Africa. Thus African zebu seem to be
hybrids, with the majority of their genome derived from Bos indicus introgression but with maternally
inherited mtDNA variation that is representative of the original Bos taurus domesticates of that
continent. Bovine mtDNA sequences follow the well-established taxonomic distinction between the
two domestic cattle forms, namely the humpless variety of Europe, the Middle East and West
Africa (B. taurus or taurine) and the humped cattle of South Asia (B. indicus or zebu) (Magee et al.,
2007).
Distribution of mitochondrial DNA haplogroups in cattle breeds
The variation in the mtDNA D-loop region has been extensively studied to determine the ancestral
haplogroups. Based on mitochondrial D-loop nucleotide sequence diversity from different
geographical locations, 5 maternal lineages with major taurine haplogroups (T and T1 to T4) in Bos
taurus and 2 major indicine haplogroups (I1 and I2) in Bos indicus (Chen et al., 2010; Jia et al.,
2010) have been defined and used as nomenclature for mtDNA haplotypes (Pellecchia et al., 2007).
The haplogroup T is defined by a transition at position 16,255 from the Anderson sequence
Figure 2: Phylogenetic complexities in modern day (a) cattle, (b) horses, (c) sheep and (d) goat (Bruford et al., 2003)
For Indian sheep breeds, the study conducted by Pardeshi et al. (2007) suggested a common
origin for the Deccani, Bannur and Garole breeds. Arora et al. (2013) analyzed mtDNA diversity among
19 Indian sheep breeds from 3 different agroecological regions. The lineage analysis showed that most
animals carried the Type A haplotype, in accordance with their Asian origin, while the Type B
haplotype was observed in only a few animals (Chokla, Jaisalmeri, Kheri, Marwari and Nali) from the
Northwestern region, which might have resulted from crossbreeding with European breeds for
improvement of wool quality. Based on mitochondrial DNA sequence data, the modern horse (Equus
caballus) has been grouped into ~17 mtDNA phylogenetic lineages, indicating complex and
numerous domestication steps (Figure 2b). In pigs, mtDNA evidence supported multiple centres of
domestication (Larson et al., 2005). Summing up, the mtDNA marker has been very useful in tracing the
ancestry, time of divergence and domestication events of various livestock species.
Challenges and future prospects
The characterization of genetic diversity within and between breeds, and the identification of the
geographical pattern of variation, will allow region-specific conservation measures to be put in
place. The identification of ancestral livestock populations could be important for sustainable
utilization and preservation of animal genetic resources and to meet the needs and aspirations of
future generations. The use of mtDNA in diversity studies might be challenged by genetic
improvement programmes using exotic germplasm, which lead to genetic erosion in native breeds. In
such circumstances conservationists need to be cautious in selecting the sample of animals for
investigation. Thus, in areas where there is gene flow from introduced breeds and the population
size of purebreds is reduced, samples should be taken from areas that are relatively isolated by
geography, where the chance of finding authentic genetic components remains high. Secondly, samples from
areas where both males and females are bred within their own herd, which reduces genetic exchange
with the outside gene pool, are ideal for analysis of diversity.
Although mtDNA has been studied widely and is the part of the genome most resistant to introgression, its use
to infer the evolutionary and demographic past of both populations and species has been
questioned in recent years. It has also been held that, being small in size and an extra-nuclear
genetic marker, mtDNA may not always be sufficient to answer all the questions related to the genetic
diversity of the species rather than the organelle and that, being maternally inherited, it does not detect
male-mediated gene flow, which has a significant influence on the evolution of species in
modern times (Bruford et al., 2003). Thus, for a holistic picture of diversity and of
historic population events, the mtDNA approach needs to be supplemented with information
from other neutral markers such as autosomal microsatellite markers and Y-chromosome DNA
variations. Also, the combination of mtDNA and microsatellites will avoid inheritance bias, since
they relay information on maternally inherited and codominantly inherited regions, respectively.
References
Achilli, A., Olivieri, A., Pellecchia, M., Uboldi, C., Colli, L., Al-Zahery, N., Accetturo, M., Pala, M., Hooshiar
Kashani, B., Perego, U.A., Battaglia, V., Fornarino, S., Kalamati, J., et al., 2008. Mitochondrial genomes of
extinct aurochs survive in domestic cattle. Curr Biol 18: R157.
Anderson, S., De Bruijn, M., Coulson, A., Eperon, I., Sanger, F., Young, I., 1982. Complete sequence of bovine
mitochondrial DNA conserved features of the mammalian mitochondrial genome. J Mol Biol 156, 683-717.
Arora, R., Yadav, H.S., Mishra, B.P., 2013. Mitochondrial DNA diversity in Indian sheep. Livestock Science
153: 50.
Beja-Pereira, A., Caramelli, D., Lalueza-Fox, C., Vernesi, C., Ferrand, N., Casoli, A., et al., 2006. The origin of
European cattle: Evidence from modern and ancient DNA. Proceedings of the National Academy of
Sciences 103: 8113.
Berggren, K., Ellegren, H., Hewitt, G., Seddon, J., 2005. Understanding the phylogeographic patterns of
European hedgehogs, Erinaceus concolor and E. europaeus using the MHC. Heredity 95: 84.
Bhat, P., Mishra, B., Bhat, P., 1990. Polymorphism of mitochondrial DNA (mtDNA) in cattle and buffaloes.
Biochemical genetics 28: 311.
Bradley, D.G., MacHugh, D.E., Cunningham, P., Loftus, R.T., 1996. Mitochondrial diversity and the origins of
African and European cattle. Proc Natl Acad Sci U S A 93: 5131.
Bruford, M.W., Bradley, D.G., Luikart, G., 2003. DNA markers reveal the complexity of livestock
domestication. Nat Rev Genet 4: 900.
Carvajal-Carmona, L.G., Bermudez, N., Olivera-Angel, M., Estrada, L., Ossa, J., Bedoya, G., Ruiz-Linares, A.,
2003. Abundant mtDNA diversity and ancestral admixture in Colombian criollo cattle (Bos taurus).
Genetics 165: 1457.
Chen, S., Lin, B.Z., Baig, M., Mitra, B., Lopes, R.J., Santos, A.M., Magee, D.A., Azevedo, M., Tarroso, P.,
Sasazaki, S., Ostrowski, S., Mahgoub, O., Chaudhuri, T.K., Zhang, Y.P., Costa, V., Royo, L.J., Goyache, F.,
Luikart, G., Boivin, N., Fuller, D.Q., Mannen, H., Bradley, D.G., Beja-Pereira, A., 2010. Zebu cattle are an
exclusive legacy of the South Asia neolithic. Mol Biol Evol 27: 1.
Cozzi, M.C., Strillacci, M.G., Valiati, P., Bighignoli, B., Cancedda, M., Zanotti, M., 2004. Mitochondrial D-loop
sequence variation among Italian horse breeds. Genet Sel Evol 36: 663.
Edwards, C.J., MacHugh, D.E., Dobney, K.M., Martin, L., Russell, N., Horwitz, L.K., McIntosh, S.K.,
MacDonald, K.C., Helmer, D., Tresset, A., Vigne, J.D., Bradley, D.G., 2004. Ancient DNA analysis of 101
cattle remains: limits and prospects. J Archaeol Sci 31: 695.
Erhardt, G., Weimann, C., 2007. Use of molecular markers for evaluation of genetic diversity and in animal
production. Archivos Latinoamericanos de Produccion Animal 15: 63.
Galtier, N., Nabholz, B., Glémin, S., Hurst, G., 2009. Mitochondrial DNA as a marker of molecular diversity: a
reappraisal. Mol Ecol 18: 4541.
Hanotte, O., Jianlin, H., 2006. Genetic characterization of livestock populations and its use in conservation
decision-making. The role of biotechnology in exploring and protecting agricultural genetic resources.
FAO, Rome, Italy, 89.
Hecht, W., Geldermann, H., Ellendorff, F., 1990. Studies on mitochondrial DNA in farm animals. Genome
analysis in domestic animals., 259.
Hiendleder, S., Lewalski, H., Janke, A., 2008. Complete mitochondrial genomes of Bos taurus and Bos indicus
provide new insights into intra-species variation, taxonomy and domestication. Cytogenetic and Genome
Research 120: 150.
Hiendleder, S., Lewalski, H., Wassmuth, R., Janke, A., 1998. The complete mitochondrial DNA sequence of
the domestic sheep (Ovis aries) and comparison with the other major ovine haplotype. J Mol Evol 47: 441.
Jia, S., Chen, H., Zhang, G., Wang, Z., Lei, C., Yao, R., Han, X., 2007. Genetic variation of mitochondrial D-loop
region and evolution analysis in some Chinese cattle breeds. Journal of genetics and genomics = Yi
chuan xue bao 34, 510-518.
Jia, S.G., Zhou, Y., Lei, C.Z., Yao, R., Zhang, Z.Y., Fang, X.T., Chen, H., 2010. A new insight into cattle's
maternal origin in six Asian countries. J Genet Genomics 37, 173-180.
Joshi, M.B., Rout, P.K., Mandal, A.K., Tyler-Smith, C., Singh, L., Thangaraj, K., 2004. Phylogeography and
origin of Indian domestic goats. Mol Biol Evol 21, 454-462.
Kierstein, G., Vallinoto, M., Silva, A., Schneider, M.P., Iannuzzi, L., Brenig, B., 2004. Analysis of mitochondrial
D-loop region casts new light on domestic water buffalo (Bubalus bubalis) phylogeny. Mol Phylogenet
Evol 30: 308.
Kim, K.-I., Lee, J.-H., Lee, S.-S., Yang, Y.-H., 2003. Phylogenetic relationships of northeast Asian cattle to other
cattle populations determined using mitochondrial DNA D-loop sequence polymorphism. Biochemical
genetics 41: 91.
Kumar, S., Nagarajan, M., Sandhu, J.S., Kumar, N., Behl, V., 2007a. Phylogeography and domestication of
Indian river buffalo. BMC Evol Biol 7: 186.
Kumar, S., Nagarajan, M., Sandhu, J.S., Kumar, N., Behl, V., Nishanth, G., 2007b. Mitochondrial DNA
analyses of Indian water buffalo support a distinct genetic origin of river and swamp buffalo. Animal
Genetics 38: 227.
Lai, S.J., Liu, Y.P., Liu, Y.X., Li, X.W., Yao, Y.G., 2006. Genetic diversity and origin of Chinese cattle revealed
by mtDNA D-loop sequence variation. Mol Phylogenet Evol 38: 146.
Larson, G., Dobney, K., Albarella, U., Fang, M., Matisoo-Smith, E., Robins, J., Lowden, S., Finlayson, H.,
Brand, T., Willerslev, E., Rowley-Conwy, P., Andersson, L., Cooper, A., 2005. Worldwide phylogeography
of wild boar reveals multiple centers of pig domestication. Science 307: 1618.
Lavrov, D.V., 2007. Key transitions in animal evolution: a mitochondrial DNA perspective. Integr Comp Biol
47, 734-743.
Lin, C.S., Sun, Y.L., Liu, C.Y., Yang, P.C., Chang, L.C., Cheng, I.C., Mao, S.J.T., Huang, M.C., 1999. Complete
nucleotide sequence of pig (Sus scrofa) mitochondrial genome and dating evolutionary divergence within
Artiodactyla. Gene 236: 107.
Loftus, R.T., MacHugh, D.E., Bradley, D.G., Sharp, P.M., Cunningham, P., 1994a. Evidence for two
independent domestications of cattle. Proc Natl Acad Sci U S A 91: 2757.
Loftus, R.T., MacHugh, D.E., Ngere, L.O., Balain, D.S., Badi, A.M., Bradley, D.G., Cunningham, E.P., 1994b.
Mitochondrial genetic variation in European, African and Indian cattle populations. Animal Genetics 25:
265.
MacHugh, D.E., Bradley, D.G., 2001. Livestock genetic origins: goats buck the trend. Proceedings of the
National Academy of Sciences 98: 5382.
Magee, D.A., Mannen, H., Bradley, D.G., 2007. Duality in Bos indicus mtDNA diversity: Support for
geographical complexity in zebu domestication. Vertebr Paleobiol Pa, 385.
Magee, D.A., Meghen, C., Harrison, S., Troy, C.S., Cymbron, T., Gaillard, C., Morrow, A., Maillard, J.C.,
Bradley, D.G., 2002. A partial African ancestry for the creole cattle populations of the Caribbean. J Hered
93, 429-432.
Maji, S., Krithika, S., Vasulu, T.S., 2009. Phylogeographic distribution of mitochondrial DNA
macrohaplogroup M in India. J Genet 88: 127.
Mannen, H., Kohno, M., Nagata, Y., Tsuji, S., Bradley, D.G., Yeo, J.S., Nyamsamba, D., Zagdsuren, Y.,
Yokohama, M., Nomura, K., Amano, T., 2004. Independent mitochondrial origin and historical genetic
differentiation in North Eastern Asian cattle. Mol Phylogenet Evol 32: 539.
Mirol, P.M., Giovambattista, G., Liron, J.P., Dulout, F.N., 2003. African and European mitochondrial
haplotypes in South American Creole cattle. Heredity 91, 248-254.
Mishra, B., Kataria, R., Bulandi, S., Prakash, B., Kathiravan, P., Mukesh, M., Sadana, D., 2009. Riverine status
and genetic structure of Chilika buffalo of eastern India as inferred from cytogenetic and molecular
marker-based analysis. J Anim Breed Genet 126, 69-79.
Parma, P., Erra-Pujada, M., Feligini, M., Greppi, G., Enne, G., 2004. Water buffalo (Bubalus bubalis): complete
nucleotide mitochondrial genome sequence. DNA Seq 15: 369.
Pellecchia, M., Negrini, R., Colli, L., Patrini, M., Milanesi, E., Achilli, A., Bertorelle, G., Cavalli-Sforza, L.L.,
Piazza, A., Torroni, A., Ajmone-Marsan, P., 2007. The mystery of Etruscan origins: novel clues from Bos
taurus mitochondrial DNA. P R Soc B 274: 1175.
Pietro, P., Maria, F., GianFranco, G., Giuseppe, E., 2003. The complete nucleotide sequence of goat (Capra
hircus) mitochondrial genome: goat mitochondrial genome. Mitochondrial DNA 14: 199.
Steinborn, R., Schinogl, P., Wells, D.N., Bergthaler, A., Muller, M., Brem, G., 2002. Coexistence of Bos taurus
and B. indicus mitochondrial DNAs in nuclear transfer-derived somatic cattle clones. Genetics 162: 823.
Troy, C.S., MacHugh, D.E., Bailey, J.F., Magee, D.A., Loftus, R.T., Cunningham, P., Chamberlain, A.T., Sykes,
B.C., Bradley, D.G., 2001. Genetic evidence for Near-Eastern origins of European cattle. Nature 410: 1088.
Xu, X., Arnason, U., 1994. The complete mitochondrial DNA sequence of the horse, Equus caballus: extensive
heteroplasmy of the control region. Gene 148: 357.
Tarnita C. E., Antal T., Ohtsuki H., Nowak M. A. 2009. Evolutionary dynamics in set structured populations.
Proceedings of the National Academy of Sciences 21: 8601.
10
Y- Chromosome Based Genetic Diversity in Farm Animal Genetic
Resources with Special Reference to Bovine
Indrajit Ganguly, Monika Sodhi, Suchit Kumar, Sanjeev Singh and K N Raja
ICAR- National Bureau of Animal Genetic Resources, Karnal, Haryana
________________________________________________________________________________________
Studies on the Y chromosome are of particular interest in livestock species because, in common
breeding strategies, only a few males contribute genetically to the next generation (Lindgren et al.,
2004). The mammalian Y chromosome is a gene-poor, male-specific chromosome in species with
male heterogamety, as in humans. It often determines sex in a dominant fashion and is inherited
clonally from father to son, so it is never present in females. The Y chromosome may thus complement
studies that use mitochondrial DNA for inferring sex-specific population genetic
processes. The mammalian Y chromosome has two components, a pseudo-autosomal region, which
frequently recombines with the X chromosome, and a male-specific region (MSY). In many higher
organisms (with an X-Y sex determining system), it is the only chromosome with truly haploid
characteristics, wherein no genetic material is exchanged with a homologue through recombination,
making all sites linked to each other. This non-recombining region is called the male-specific region
of the Y chromosome, the MSY, and comprises 95% of the length of the Y chromosome in humans.
The remaining 5%, which is genetically similar to the X, makes up the pseudo-autosomal region (PAR) at the
telomeric ends of the Y chromosome and recombines with the X chromosome during meiosis.
Markers on the MSY, which is paternally inherited in a haploid way, have been used for studying
the origin of species, range expansion, admixture of populations, and migration in animals (Pidancier et al., 2006). Molecular variation in the Y chromosome provides information about genetic
diversity, since it reveals the pattern of distribution of paternal lineages. For instance, it may
indicate stock upgrading, which is often performed by using sires from breeds with the desired
properties.
Origins and evolution of Y-chromosome
The evolutionary ancestor of the sex chromosomes was a pair of matched, autosomal chromosomes
that acquired sex-determining genes on one member of the pair. This occurred about 300 million
years ago in a reptile-like ancestor. Over time additional genes with male-specific functions
accumulated in this same chromosome, called proto-Y, which then lost its ability to recombine with
its counterpart chromosome, called proto-X. There are four regions of the proto-X chromosome,
which appear to have been involved in four different steps, resulting in the loss of recombination
with proto-Y. Each of the four regions accumulated mutations in those non-recombining regions of
proto-Y at four different times in evolution. Each time recombination was lost there was
degradation and loss of the non-recombining region. Over time this chromosome evolved into Y,
losing most of its genetic information as a result of the degradation of the non-recombining regions
of the chromosome. Its partner chromosome evolved into the X chromosome. The degeneration of
the Y was offset at various times by additions of autosomal genes to this chromosome (as well as to
X), leading to a pattern of loss and gain of genetic material over a period of about 170 million years.
Taking the human Y chromosome as an example, the degeneration seems to have occurred
in four discrete episodes, beginning about 300 million years ago when a reptile-like ancestor
acquired the SRY gene on one of its autosomal chromosomes. Each of the four episodes involved a
failure of recombination between the X and the Y chromosomes, resulting in the subsequent
decay of some genes in the non-recombining region. Around 166 million years ago, a huge chunk of
the Y chromosome in one of our mammalian ancestors was turned upside down and reinserted.
The change was so extreme that the Y chromosome no longer matched the X, and it became
impossible for the two to swap genes. The Y chromosome began collecting mutations and losing
genes, ultimately taking on its characteristic Y shape as a result. In humans, it now carries a mere 19
of the 800 genes it originally shared with the X. Given that rate of loss, some geneticists have
predicted that the chromosome will lose its final gene in 4.6 million years. Recently, Jennifer
Hughes and her colleagues at the Whitehead Institute for Biomedical Research in Cambridge,
Massachusetts, sequenced the Y chromosome of the rhesus macaque, a primate that diverged from
humans around 25 million years ago. They found that the monkey's Y chromosome contains 20
genes that match its X chromosome, and 19 of them are the same as human Y genes. This suggests
that the human Y chromosome has lost only one gene since humans and macaques last shared a
common ancestor (Nature, DOI: 10.1038/nature10843). These empirical data suggest that the Y
chromosome has held steady over the last 25 million years and that the 19 surviving genes probably
have vital biological functions.
Characteristic features of the mammalian Y chromosome
The mammalian Y is the smallest chromosome of the genome, comprising < 3% of the haploid
genome (Krausz and Degl'Innocenti 2006). It is usually a metacentric or acrocentric chromosome
with a short arm (Yp) and a long arm (Yq). A small region (5% of the Y) located in the distal part
of either Yp or Yq mediates X and Y segregation and is known as the pseudoautosomal region
(PAR), where the X and Y chromosomes pair and recombine during meiosis. The rest of the Y (95%)
contains male-specific sequences (MSY) that do not recombine with the X during
meiosis (Rice 1996). Several special features set the MSY apart from the rest of the genome: absence of
homologous recombination, male-limited transmission, abundance of Y-specific repetitive
sequences with unique genomic structures (i.e. massive palindromes, or palindrome-like
sequences), a tendency of MSY genes to degenerate during evolution, acquisition of autosomal genes,
and accumulation and functional clustering of testis genes for maleness and reproduction (Lahn and
Page 1997, Tilford et al. 2001, Rozen et al. 2003, Gvozdev et al. 2005, Liu 2010). Investigating Y
chromosomes is challenging because the absence of recombination between the X and Y makes
classical linkage mapping of the MSY
virtually impossible, and the complexity of the repetitive sequences makes sequencing extremely
difficult (Liu and Ponce de León 2007). This explains why the Y was excluded from most
mammalian genome sequencing projects. Most of today's knowledge regarding the mammalian Y
chromosome is based on the three sequenced primate (human, chimpanzee and rhesus macaque) Y
chromosomes (Skaletsky et al. 2003, Hughes et al. 2010, Hughes and Rozen 2012, Hughes et al.
2012) and the partially sequenced mouse (Alföldi 2008) and bovine Y chromosomes (Chang et al.
2013b).
The Bos taurus Y chromosome (BTAY) is ~ 51 Mb in size and is the smallest chromosome in the
genome (Liu and Ponce de León 2007). The PAR is ~ 6 Mb (Das et al. 2009), and the MSY is ~ 45 Mb.
Cytogenetically, the size and morphology of the Y chromosome differ among bovid lineages (Di
Meo et al. 2005). BTAY is submetacentric, while the zebu (Bos indicus, BIN) and river buffalo
(Bubalus bubalis, BBU) Y chromosomes are acrocentric (Kieffer and Cartwright 1968). This
morphological difference is the consequence of Y chromosomal rearrangements through either
centromeric transposition or pericentric inversion, as revealed by comparative fluorescent in situ
hybridization (FISH) (Di Meo et al. 2005). By using Y-linked repetitive sequences as FISH painting
probes, Di Meo and coworkers found that the Y chromosome in different bovid lineages has
undergone genomic rearrangements and accumulated various classes of repetitive sequences
during bovid evolution (Di Meo et al. 2005). The bovine Y is being sequenced
(http://www.ncbi.nlm.nih.gov/bioproject/20275), and a draft sequence assembly of ~ 43.3 Mb is
available (GenBank acc. no. CM001061.2).
Effective Population Size: In sexual populations, half of the autosomal alleles are derived from
females and half from males. The number of chromosome variants maintained in the population
depends, among other things, on the effective population size (Ne) of each chromosome. In an ideal
population with equal numbers of breeding males and females, each mating pair carries one Y, three X
and four copies of every autosome, so the relationship between Y, X and autosomes is Ne : 3Ne : 4Ne,
suggesting an expected 1:3:4 relationship in diversity between the different chromosomes.
Mating systems: Different mating systems can cause differences in effective population size. Skewed
mating systems, for example in polygynous species where one male mates with many females, will
affect the relative difference in effective population size between chromosomes. For example, the
Y:X:autosome relationship when one male mates with two females will be 1:5:6. If the ratio of
females to males is increased to ten, which is common in some species (McComb and Clutton-Brock
1994; Roed et al. 2002), the relationship will be 1:21:22.
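The ratios quoted above can be reproduced by simply counting chromosome copies among the breeding animals. The short Python sketch below does this; the function name is illustrative only, and the calculation is a plain copy count (one Y per male, one X per male and two per female, two autosome copies per animal), not a formal effective-size model that accounts for variance in reproductive success.

```python
from fractions import Fraction


def chromosome_copy_ratio(males, females):
    """Return the Y : X : autosome copy ratio, scaled so that Y = 1.

    Assumes simple copy counting among breeding animals:
    one Y per male, one X per male plus two per female,
    and two copies of each autosome per animal.
    """
    if males <= 0 or females < 0:
        raise ValueError("need at least one male and a non-negative number of females")
    y_copies = males
    x_copies = males + 2 * females
    autosome_copies = 2 * (males + females)
    return (
        Fraction(y_copies, y_copies),
        Fraction(x_copies, y_copies),
        Fraction(autosome_copies, y_copies),
    )


# Mating ratios mentioned in the text: 1:1, 1:2 and 1:10 (males : females)
for n_males, n_females in [(1, 1), (1, 2), (1, 10)]:
    y, x, a = chromosome_copy_ratio(n_males, n_females)
    print(f"{n_males} male : {n_females} female(s) -> Y:X:A = {y}:{x}:{a}")
```

As the female-to-male ratio grows, the Y share of the chromosome pool shrinks fastest, which is why polygynous breeding schemes depress Y-chromosome diversity more than X or autosomal diversity.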
Selection: New mutations can be neutral, advantageous or disadvantageous. The probability of
fixation or elimination of the mutation in the population depends on the relative fitness of the new
phenotype. Exceptions occur for balancing selection, overdominance (where heterozygotes are
favored) and in limited populations. Negative selection will tend to eliminate disadvantageous
mutants or genotypes from the population and is the prevailing type of selection since the majority
of non-neutral mutations are deleterious or slightly deleterious. Positive selection increases the
probability for an advantageous mutation to become fixed in the population (Li 1997). However,
the chance of losing a new advantageous mutation from the population by random genetic drift
(change in allele frequency due to chance) can still be high (Hartl and Clark 1997). Selection at a
locus will also affect linked sites. In the absence of recombination, selection will tend to reduce
genetic variability at linked sites to the same extent as at the locus under selection. With
recombination, the effect becomes gradually smaller as the rate of recombination between the
selected locus and linked sites increases. In line with this thinking, levels of neutral variability have
been shown to correlate with recombination rate in humans (Nachman 2001), mice (Nachman
1997), plants (Stephan and Langley 1998) and fruit flies (Begun and Aquadro 1992). The Y
chromosome, which lacks recombination (except in the PAR), should be expected to have reduced
variation as compared to recombining chromosomes. Selective sweeps and background selection
may have severe effects on the MSY, where all sites are linked, compared to other genomic regions.
Selective Sweeps: A selective sweep, or hitchhiking effect, is a consequence of positive selection in
which a favorable allele is driven through the population to fixation together with its linked loci. This
reduces the variation linked to the selected site and decreases the diversity in the population (Rice
1987). The impact of a selective sweep depends on the recombination rate and the selection coefficient:
the lower the recombination rate and/or the higher the selection coefficient, the larger the
genomic region affected by the sweep. In the Y chromosome, where 95% of the sites are linked, an
advantageous gene regulating a male-specific trait, such as one involved in spermatogenesis, may
sweep through the population and eliminate all variation in the MSY (Roldan and Gomendio 1999;
Wyckoff et al. 2000). Selective sweeps can thus leave Y chromosomes fixed within a species but
different between species, while mutations only slowly produce new variants in a population.
In contrast to the neutralist prediction of a positive correlation between intraspecific variation and
interspecific divergence, positive selection can uncouple levels of polymorphism and
divergence (Li 1997).
Background selection: Background selection is an effect of negative selection where deleterious
mutations will be eliminated from the population together with their linked loci. This process, as
with selective sweeps, will reduce variation in the region around the selected site. Similar to sweeps
the impact of background selection depends on the recombination rate and the selection coefficient.
Background selection is not thought to alter allele frequencies to the same extent as selective
sweeps, which indicates that the two types of selection can be distinguished from each other
(Charlesworth et al. 1995). In a non-recombining region, mildly deleterious as well as weakly
advantageous alleles will survive linked to each other. In the absence of a strongly advantageous
mutation, a neutral or weakly selected mutation can only survive on a non-recombining
chromosome (like Y) if there is no strongly deleterious mutation, otherwise it will be eliminated
(Charlesworth, 1994).
Sex Specific Mutation Rates: Mutations are generated during DNA replication. The number of germ
cell divisions differs between spermatogenesis and oogenesis. In oogenesis, every mature oocyte has
gone through a total of 24 cell divisions irrespective of the age of the female. In spermatogenesis,
however, cell division is a continuous process, so the older the male, the more cell divisions his
sperm have gone through. For example, in a 20-year-old man every sperm has gone through about
150 cell divisions, and at the age of 40 about 610 cell divisions (Hurst and Ellegren 1998). The
male-to-female mutation rate ratio, m, depends mostly on the skewed number of cell divisions in the
germ lines of males and females. As m is generally larger than one, meaning that male germ cells
mutate more frequently than female germ cells, more mutations can be predicted in the Y chromosome
than in other chromosomes (Miyata et al. 1987). Estimates of m from X and Y comparisons
suggest that it co-varies with the mean age of reproduction: rodents (m=2) (Chang et al. 1994) <
felids (m=4) (Pecon Slattery and O'Brien 1998) < primates (m=6) (Chang et al. 1996).
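The consequence of a male-biased mutation rate for each chromosome class can be made concrete with the standard male-driven evolution argument: a Y chromosome is always carried by males, an X spends one third of its time in males and two thirds in females, and an autosome spends half its time in each sex. The Python sketch below applies that weighting; the ratio m of the text is called `alpha` here, and approximating it by the ratio of germ-cell divisions (150 versus 24 at age 20) is an assumption made purely for illustration.

```python
def relative_mutation_rates(alpha):
    """Mutation rates of Y, X and autosomes relative to the female germ-line rate.

    alpha is the male-to-female mutation rate ratio (m in the text).
    Y is always in males; X is in males 1/3 of the time; autosomes 1/2.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return {
        "Y": float(alpha),
        "X": (2 + alpha) / 3,
        "A": (1 + alpha) / 2,
    }


# Rough illustration only: divisions per sperm at age 20 over divisions per oocyte
alpha_from_divisions = 150 / 24

cases = [
    ("rodents (m=2)", 2),
    ("felids (m=4)", 4),
    ("primates (m=6)", 6),
    ("cell divisions at age 20", alpha_from_divisions),
]
for label, alpha in cases:
    rates = relative_mutation_rates(alpha)
    y_over_a = rates["Y"] / rates["A"]
    x_over_a = rates["X"] / rates["A"]
    print(f"{label}: Y/A = {y_over_a:.2f}, X/A = {x_over_a:.2f}")
```

Note that the division-based value, 150/24 = 6.25, is close to the primate estimate of m = 6 quoted in the text, and that the Y/autosome ratio can never exceed 2 however large m becomes.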
Other factors affecting Y diversity: Differences in migration between males and females can produce
variation in the patterns of genetic differentiation detected in maternally and paternally inherited
systems. In a patrilocal species, where females migrate more than males, this implies less
variation in the Y chromosome locally. From a global perspective, it leads to higher
differentiation in the Y chromosome than in the maternally inherited mitochondrial DNA (mtDNA)
(Seielstad et al. 1998). Spermatogenesis and sperm motility are energy-demanding processes;
therefore, the function of the mitochondria is vital for reproductive success. Deleterious mutations
in the mtDNA that affect energy production negatively would be expected to lead to impaired
reproduction and, consequently, a reduced effective population size among males. This will lower
the effective population size of Y chromosomes and reduce their diversity (Gemmell and Sin 2002).
Origin and domestication of cattle
Y chromosome analyses have long been used to study the process of domestication. The aurochs, or
wild ox (Bos primigenius), extinct since 1627, was once widespread throughout Europe, northern
Africa, and southern Asia during the Pleistocene and Holocene. Modern cattle were
probably domesticated from the wild aurochs in the Near East and Asia around 10,000 years
ago (Anderung et al., 2007; Freemann et al., 2006; Götherström et al., 2005). The estimated time of
divergence between Bos taurus and Bos indicus ranges from 117,000 to 275,000 years according to
mtDNA analyses (Bradley et al., 1996) and from 610,000 to 850,000 years according to microsatellite
data (MacHugh et al., 1997). A more recent estimate based on mtDNA data places the
divergence between B. taurus and B. indicus at about 1.7-2.0 million years ago (Hiendleder et al.,
2008). B. taurus and B. indicus cattle were domesticated
independently from aurochs in the Near East and the Indian subcontinent, respectively,
around 10,000 years ago (Beja-Pereira et al., 2006; Bradley et al., 1998). Subsequently, cattle
accompanied human migrations, which led to the dispersal of domestic cattle of taurine, indicine,
or mixed origin over Asia, Africa, Europe, and the New World (Ajmone-Marsan et al., 2010).
Bovine Y-chromosome variations
Cattle Y-chromosome studies are generally hampered by a shortage of powerful sources of information.
There are limited numbers of informative segregating sites and polymorphic Y-specific
microsatellites (Ginja et al., 2009; Götherström et al., 2005). The earliest analyses of the Y chromosome
explored its karyological features in different species and identified it as
metacentric/submetacentric in taurine (Bos taurus) and acrocentric in zebu/indicine (Bos indicus)
cattle (Potter and Upton, 1979; Halnan and Watson, 1982). Studies on cattle Y chromosomes
have mainly focused on the assessment of male-mediated migration patterns and
admixture between B. taurus and B. indicus (Hanotte et al., 2000; Anderung et al., 2007; Edwards et
al., 2007) or on differences in diversity among breeds (Ginja et al., 2009;
Kantanen et al., 2009). Y chromosome-specific markers have contributed substantially to the
understanding of the origin, relationships, and paternal inheritance of native breeds (Edwards et al.,
2000; Hellborg and Ellegren, 2004; Li et al., 2007). They are also preferred for
testing paternity, checking DNA samples for contamination (analysis of the male component in
male/female mixtures), and forensic casework (Jobling et al., 1997; Jobling, 2001).
Götherström et al. (2005) identified five polymorphic sites (Table 1) on the cattle Y chromosome,
allowing identification of three haplotypes, viz. Y1, Y2 and Y3 (Table 2), in contemporary cattle,
with Y1 being more frequent in north-western European B. taurus, Y2 being dominant in southern
European B. taurus and Anatolian cattle, and Y3 being exclusive to B. indicus. Recently, Li et al. (2013)
studied the Y chromosome genetic diversity and paternal origin of Chinese cattle, including 369
bulls from 17 Chinese native cattle breeds, 30 Holstein bulls and four bulls from Burma. In
total, the taurine Y1 and Y2 haplogroups and the indicine Y3 haplogroup were detected in 7 (1.9 %), 193
(52.3 %) and 169 (45.8 %) individuals of the 17 Chinese native breeds, respectively. Y2 was observed to
dominate in northern China (91.4 %) and Y3 in southern China (81.2 %), while Central China was an
admixture zone with Y2 predominating overall (72.0 %). The results also demonstrated that Chinese
cattle have two paternal origins, one from B. taurus (Y2) and the other from B. indicus (Y3). The Y1
haplogroup might have originated from beef cattle breeds imported from western countries.
Interestingly, the geographical distributions of the Y2 and Y3 haplogroup frequencies reveal a
pattern of male indicine introgression from south to north China, and male taurine introgression
from north to south China. SNP markers have also been used to identify genetic variation in both
the X and Y chromosomes of taurine and zebu cattle breeds in Africa and were reported to be useful in
determining zebu admixture in African cattle breeds (Anderung et al., 2007).
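The haplogroup percentages reported for the 17 Chinese native breeds follow directly from the counts (7, 193 and 169 out of 369 bulls). The Python sketch below recomputes them and, as an additional illustration not reported in the text, summarises the same counts with Nei's unbiased haplotype diversity, h = n/(n-1) x (1 - sum of squared frequencies). Function names are illustrative.

```python
# Haplogroup counts for the 17 Chinese native breeds as quoted in the text
counts = {"Y1": 7, "Y2": 193, "Y3": 169}


def haplogroup_frequencies(haplogroup_counts):
    """Return the relative frequency of each haplogroup."""
    n = sum(haplogroup_counts.values())
    if n == 0:
        raise ValueError("no individuals counted")
    return {name: c / n for name, c in haplogroup_counts.items()}


def haplotype_diversity(haplogroup_counts):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = sum(haplogroup_counts.values())
    if n < 2:
        raise ValueError("need at least two individuals")
    sum_squared = sum((c / n) ** 2 for c in haplogroup_counts.values())
    return n / (n - 1) * (1 - sum_squared)


freqs = haplogroup_frequencies(counts)
for name, p in freqs.items():
    print(f"{name}: {100 * p:.1f} %")
print(f"haplotype diversity h = {haplotype_diversity(counts):.3f}")
```

The same two functions can be applied per region or per breed once the corresponding counts are available.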
The combined investigation of Y-chromosome SNPs and microsatellite alleles, which are highly
conserved (i.e., ~10^-8 per site per generation in humans) and highly mutable (i.e., ~10^-3 per locus per
generation in humans), respectively (Hurles and Jobling, 2001), facilitates the assessment of
Y-haplotype diversity among species and the taxonomic origins of the genes. Y chromosome-specific
single nucleotide polymorphisms (SNPs) and microsatellite markers have therefore been combined and
used to investigate genetic diversity and origins in cattle (Bradley et al., 1994; Budowle et al.,
2005; Cai et al., 2006; Yang et al., 2011), dogs (Bannasch et al., 2005; Erdoğan et al., 2013), sheep
(Niemi et al., 2013), and human populations of different regions (Cinnioglu et al., 2004; Rootsi et al.,
2004). In Portuguese native cattle breeds, the combination of 5 SNPs, 1 indel, and 7 STRs identified
13 Y-chromosome-specific haplotypes (Ginja et al., 2009). The 13 Y-haplotypes
included 3 previously described patrilines (Y1, Y2, and Y3) and 10 new haplotypes within Bos
taurus. Native cattle contained most of the diversity with 7 haplotypes (H2Y1, H3Y1, H5Y1, H7Y2,
H8Y2, H10Y2, and H12Y2). H6Y2 and H11Y2 occurred at high frequency across breeds, including
the exotics, and thus had a common genetic signature (Ginja et al., 2009).
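To see why slowly mutating SNPs and fast-mutating microsatellites complement each other, it helps to compare how many mutations each marker type is expected to accumulate along a single paternal lineage. The Python sketch below uses the human rates of roughly 10^-8 per site and 10^-3 per locus per generation; the figure of 2,000 generations (about 10,000 years at an assumed 5 years per generation) is a hypothetical value chosen only for illustration, and applying human rates to cattle is likewise an assumption.

```python
SNP_RATE = 1e-8   # per site per generation (human figure)
STR_RATE = 1e-3   # per locus per generation (human figure)


def expected_mutations(rate, generations, n_targets=1):
    """Expected number of mutations along one lineage."""
    return rate * generations * n_targets


def prob_at_least_one(rate, generations, n_targets=1):
    """Probability that at least one mutation occurred along one lineage."""
    return 1 - (1 - rate) ** (generations * n_targets)


# Hypothetical time span: ~10,000 years at an assumed 5 years per generation
generations = 2000

for label, rate in [("single SNP site", SNP_RATE), ("single STR locus", STR_RATE)]:
    expected = expected_mutations(rate, generations)
    prob = prob_at_least_one(rate, generations)
    print(f"{label}: expected mutations = {expected:.2e}, P(>=1) = {prob:.4f}")
```

Under these assumptions a given SNP site almost never mutates twice, so it marks deep, stable lineages, whereas an STR locus is likely to have changed at least once and therefore resolves recent diversity within a lineage.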
The genetic diversity of the Y chromosome has been found to be lower than that of autosomal
chromosomes (Liu et al., 2003; Hellborg and Ellegren, 2004; Ginja et al., 2009). Relatively low levels
of Y-chromosome genetic diversity have been reported in several mammalian species, including
cattle (Hellborg and Ellegren 2004; Lindgren et al. 2004; Bannasch et al., 2005; Meadows et al., 2006;
Li et al., 2007). In the case of domestic animals, the effective Y chromosome contribution tends to be
reduced because breeding schemes commonly use a few selected males that produce a large
number of offspring (Hellborg and Ellegren 2004). For example, a demographic analysis of the
native Portuguese cattle breed Alentejana indicates that, from an original number of 671 founder
sires, only 24 Y chromosomes are currently represented, with an effective number of 2.73 males
(Carolino and Gama 2008). Despite these limitations, studies of male lineages contribute to a better
understanding of the origin of and relationships among domestic breeds (Edwards et al., 2000;
Lindgren et al., 2004; Anderung et al., 2005; Götherström et al., 2005; Li et al., 2007). Details of
Y chromosome-specific cattle STRs (primer sequences, Ta, fluorescence labels, etc.) are available
from previous studies (Bishop et al., 1994; Götherström et al., 2005; Vaiman et al., 1994; Kappes et al.,
1997; Liu et al., 2002) and are summarised in Table 3.
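The Alentejana example shows how unequal use of sires shrinks the effective number of paternal lineages far below the number still represented. One common way to express this is the inverse of the sum of squared lineage contributions. The Python sketch below uses that definition with made-up contributions; whether Carolino and Gama (2008) used exactly this estimator is not stated here, so both the formula choice and the numbers are assumptions for illustration.

```python
def effective_number(contributions):
    """Effective number of lineages: 1 / sum(p_i^2).

    contributions may be counts or proportions; they are normalised first.
    """
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    return 1.0 / sum((c / total) ** 2 for c in contributions)


# Hypothetical data: 24 surviving Y lineages, three of which dominate the herd
lineage_counts = [400, 300, 200] + [5] * 21

print(f"lineages represented: {len(lineage_counts)}")
print(f"effective number:     {effective_number(lineage_counts):.2f}")

# With perfectly equal use of all 24 lineages the two numbers coincide
print(f"equal contributions:  {effective_number([1] * 24):.2f}")
```

The more the contributions are dominated by a few sires, the closer the effective number falls towards the number of dominant lineages.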
Y chromosome study in other domestic animals
To date, few phylogenetic surveys involving the Y chromosome have been reported in
domestic species because of the lack of MSY variation. Indeed, very low levels of nucleotide diversity
have been reported within the MSY of horse (Lindgren et al., 2004), cattle (Hellborg and Ellegren, 2004),
and sheep (Meadows et al., 2006). In goats, recent studies based on mitochondrial DNA analyses
revealed a complete pattern of caprine domestication (Luikart et al., 2001; Naderi et al., 2007). In the
ECONOGENE project (http://econogene.eu/), sequence variation on the Y chromosome was
integrated with information from mitochondrial and autosomal DNA to study the genetic
diversity of several goat breeds.
Future perspective
In the post-genomic era, whole Y-chromosome sequencing holds the promise of stretching the
paternal phylogeny to its maximal resolution. The absence of recombination enables all Y-chromosome
sequences to be placed within a single phylogenetic tree, although a single-locus hierarchy
may oversimplify the demographic history of a particular breed or individual. In the Y chromosome, the
order in which sequence variants have accumulated since the most recent common ancestor is preserved.
Because of this molecular encapsulation of male demographic history, Y-chromosome phylogeny has
become one of the pillars of archaeogenetics. Although it is possible to infer phylogenetic
relationships from the STRs and SNPs identified to date, the number of haplogroups that
may be of interest for deriving Y chromosome-based phylogenies is expected to be much higher.
Also, reports based on Y-STR networks make it clear that it is still not possible to distinguish
several phylogenetic groups that may be relevant for applications of the Y chromosomal tree.
Identification of thousands of as yet unknown Y-SNPs through whole-genome studies is the best resource
for deducing Y chromosome-based diversity and phylogeography. Informative markers
derived from whole-genome sequences, followed by genotyping in larger panels, can be used to
precisely delineate patterns of restricted geographic and/or population specificity. With
advances in sequencing technology, a plethora of sequence information on the Y chromosome
will become available, leading to a rapidly growing number of new Y-SNPs and (sub-)haplogroups.
Whole-genome Y-SNP profiles will improve the resolution of the Y chromosomal
phylogenetic tree and help us to understand our animal genetic resources better.
Table 1. Polymorphic sites on the cattle Y-chromosome
Locus         Region      SNPs         GenBank #
DDX3Y-1       Intron 1    425 > C/T    AY928816
DDX3Y-7       Intron 7    123 > C/T    AY928819
UTY-19        Intron 19   423 > C/A    AY936543
ZFY-9         Intron 9    120 > C/T    AY928828
ZFY-10        Intron 10   665 > C/T    AF241271
ZFY_10indel   Intron 10   704 > -/GT   AF241271
Table 2. Alleles at the polymorphic Y-chromosome markers by origin
             Marker
Origin       ZFY-9   DDX3Y-1   ZFY-10   DDX3Y-7   UTY-19
B. taurus    C       C         C        C         C
B. taurus    C       C         C        C         A
B. indicus   T       T         T        T         A
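The allele table above can be used as a simple lookup for assigning a bull's paternal origin from its genotypes at the five SNP markers. The Python sketch below is one way to do that, assuming the three allele profiles are read correctly from the table (two B. taurus rows that differ only at UTY-19, and one B. indicus row); the function and label names are illustrative, and any profile not in the table is reported as unassigned rather than guessed.

```python
MARKERS = ("ZFY-9", "DDX3Y-1", "ZFY-10", "DDX3Y-7", "UTY-19")

# Allele profiles as read from the table above (an assumption of this sketch)
PROFILES = {
    ("C", "C", "C", "C", "C"): "B. taurus (UTY-19 C)",
    ("C", "C", "C", "C", "A"): "B. taurus (UTY-19 A)",
    ("T", "T", "T", "T", "A"): "B. indicus",
}


def classify(genotype):
    """Assign a paternal origin from a dict of marker -> allele.

    Returns 'unassigned' for allele combinations not present in the table.
    """
    missing = [m for m in MARKERS if m not in genotype]
    if missing:
        raise KeyError(f"missing markers: {', '.join(missing)}")
    key = tuple(str(genotype[m]).upper() for m in MARKERS)
    return PROFILES.get(key, "unassigned")


# Example genotypes (hypothetical animals)
bull_a = dict(zip(MARKERS, "CCCCA"))
bull_b = dict(zip(MARKERS, "TTTTA"))
bull_c = dict(zip(MARKERS, "CTCTA"))

for name, genotype in [("bull_a", bull_a), ("bull_b", bull_b), ("bull_c", bull_c)]:
    print(name, "->", classify(genotype))
```

Because the MSY does not recombine, a mixed profile such as that of `bull_c` would more likely point to a genotyping error or an undescribed haplotype than to admixture, which is why it is left unassigned.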
Table 3: Primer sequences, Ta, fluorescence label and references for Y chromosome STR loci
Locus name                  Ta (°C)   Label   Reference
BM861-F / BM861-R           58        Ned     Bishop et al. (1994)
DDX3Y1STR-F / DDX3Y1STR-R   58        Fam
INRA124-F / INRA124-R       58        Vic
INRA126-F / INRA126-R       58        Fam
INRA189-F / INRA189-R       55        Fam
UMN0103-F / UMN0103-R       58        Fam
UMN0307-F / UMN0307-R       58        Vic
UMN0504-F / UMN0504-R       58        Ned
UMN0920-F / UMN0920-R       55        Ned
UMN2001-F / UMN2001-R       55        Fam
UMN2303-F / UMN2303-R       58        Ned
UMN2404-F / UMN2404-R       58        Fam
UMN3008-F / UMN3008-R       58        Fam
References
Ajmone-Marsan P., Garcia J.F., Lenstra J.A. and the GLOBALDIV Consortium 2010. On the origin of cattle:
how aurochs became cattle and colonized the world. Evol Anthropol 19: 148.
Anderung C., Bouwman A., Persson P., Carretero J.M., Ortega A.I., Elburg R., Smith C., Arsuaga J.L., Ellegren
H., Götherström A. 2005. Prehistoric contacts over the Straits of Gibraltar indicated by genetic analysis
of Iberian Bronze Age cattle. Proc. Natl. Acad. Sci. USA 102: 8431.
Anderung C., Hellborg L., Seddon J., Hanotte O., Götherström A. 2007. Investigation of X- and Y-specific
single nucleotide polymorphisms in taurine (Bos taurus) and indicine (Bos indicus) cattle. Anim. Genet.
38: 595.
Bannasch D.L., Bannasch M.J., Ryun J.R., Famula T.R., Pedersen N.C. 2005. Y chromosome haplotype analysis
in purebred dogs. Mamm Genome 16: 273.
Beja-Pereira A., Caramelli D., Lalueza-Fox C., Vernesi C., Ferrand N., Casoli A., Goyache F., Royo L., Conti S.,
Lari M., Martini A., Ouragh L., Magid A., Atash A., Zsolnai A., Boscato P., Triantaphylidis C., Ploumi
K., Sineo L., Mallegni F., Taberlet P., Erhardt G., Sampietro L., Bertranpetit J., Barbujani G., Luikart G.,
Bertorelle G. 2006. The origin of European cattle: evidence from modern and ancient DNA. Proc. Natl.
Acad. Sci. USA 103: 8113.
Bradley D., Loftus R., Cunningham P., MacHugh D. 1998. Genetics and domestic cattle origins. Evol Anthropol
6: 79.
Bradley D.G., MacHugh D.E., Cunningham P. and Loftus R.T. 1996. Mitochondrial diversity and the origins
of African and European cattle. Proc. Natl. Acad. Sci. USA 93: 5131.
Bradley D.G., MacHugh D.E., Loftus R.T., Sow R.S., Hoste C.H. and Cunningham E.P. 1994. Zebu-taurine
variation in Y chromosomal DNA: a sensitive assay for genetic introgression in West African
trypanotolerant cattle populations. Anim Genet 25: 7.
Budowle B., Adamowicz M., Aranda X.G., Barna C., Chakraborty R., Cheswick D., Dafoe B., Eisenberg A.,
Frappier R., Gross A.M. et al. 2005. Twelve short tandem repeat loci Y chromosome haplotypes: genetic
analysis on populations residing in North America. Forensic Sci Int 150: 1.
Cai X., Chen H., Wang S., Xue K. and Lei C. 2006. Polymorphisms of two Y chromosome microsatellites in
Chinese cattle. Genet Sel Evol 38: 525.
Charlesworth B. 1994. The effect of background selection against deleterious mutations on weakly selected,
linked variants. Genet Res 63: 213.
Charlesworth D., Charlesworth B. and Morgan M.T. 1995. The pattern of neutral molecular variation under
the background selection model. Genetics 141: 1619.
Edwards C.J., Baird J.F. and MacHugh D.E. 2007. Taurine and zebu admixture in Near Eastern cattle: a
comparison of mitochondrial, autosomal and Y-chromosomal data. Anim. Genet. 38: 520.
Edwards C.J., Gaillard C., Bradley D.G. and MacHugh D.E. 2000. Y-specific microsatellite polymorphisms in a
range of bovid species. Anim Genet 31: 127.
Erdoğan M., Tepeli C., Brenig B., Akbulut M.D., Uğuz C., Savolainen P. and Özbeyaz C. 2013. Genetic
variability among native dog breeds in Turkey. Turk J Biol 37: 176.
Freemann A., Hoggart C., Hanotte O. and Bradley D.G. 2006. Assessing the relative ages of admixture in
the bovine hybrid zones of Africa and the Near East using X chromosome haplotype
mosaicism. Genetics 173: 1503-1510.
Gemmell N.J. and Sin F.Y. 2002. Mitochondrial mutations may drive Y chromosome evolution. Bioessays 24:
275.
Ginja C., Telo da Gama L. and Penedo M.C.T. 2009. Y chromosome haplotype analysis in Portuguese cattle
breeds using SNPs and STRs. J. Hered. 100: 148.
Götherström A., Anderung C., Hellborg L., Elburg R., Smith C., Bradley D.G. and Ellegren H. 2005. Cattle
domestication in the Near East was followed by hybridization with aurochs bulls in Europe. Proc Biol
Sci 272: 2345.
Halnan C.R.E. and Watson J.I. 1982. Y chromosome variants in cattle Bos taurus and Bos indicus. Ann Genet
Sel Anim 14: 1.
Hanotte O., Tawah C.L., Bradley D.G., Okomo M., Verjee Y., Ochieng J. and Rege J.E. 2000. Geographic
distribution and frequency of a taurine Bos taurus and an indicine Bos indicus Y specific allele amongst
sub-Saharan African cattle breeds. Mol Ecol 9: 387.
Hellborg L. and Ellegren H. 2004. Low levels of nucleotide diversity in mammalian Y chromosomes.
Mol Biol Evol 21: 158.
Hiendleder S., Lewalski H. and Janke A. 2008. Complete mitochondrial genomes of Bos taurus and Bos
indicus provide new insights into intra-species variation, taxonomy and domestication. Cytogenet
Genome Res 120(1-2): 150.
Hurst L.D. and Ellegren H. 1998. Sex biases in the mutation rate. Trends Genet 14: 446-452.
Jobling M.A. and Tyler-Smith C. 1995. Fathers and sons: the Y chromosome and human evolution. Trends
Genet 11: 449.
Jobling M.A., Pandya A. and Tyler-Smith C. 1997. The Y chromosome in forensic analysis and paternity
testing. Int J Legal Med 110: 118.
Jobling M.A. 2001. In the name of the father: surnames and genetics. Trends Genet 17: 353.
Kantanen J., Edwards C.J., Bradley D.G., Viinalass H., Thessler S., Ivanova Z. et al. 2009. Maternal and
paternal genealogy of Eurasian taurine cattle (Bos taurus). Heredity doi:10.1038/hdy.2009.68
Li M.H., Zerabruk M., Vangen O., Olsaker I. and Kantanen J. 2007. Reduced genetic structure of north
Ethiopian cattle revealed by Y-chromosome analysis. Heredity 98: 214.
Li R., Xie W.M., Chang Z.H., Wang S.Q., Dang R.H., Lan X.Y., Chen H. and Lei C.Z. 2013. Y chromosome
diversity and paternal origin of Chinese cattle.MolBiol Rep. 40 (12):6633-6. doi: 10.1007/s11033-0132777-y.
Lindgren G., Backstrom N., SwiLindgren G., Backstrom N., Swinburne J., Hellborg L., Einarsson A., Sandberg
K., Cothran G., Vila C., Binns M. and Ellegren H. 2004. Limited number of patrilines in horse
domestication.Nat Genet. 36:335.
Liu WS, Beattie CW, Ponce de Leon FA.(2003). Bovine Y chromosome microsatellite
polymorphisms.Cytogenet Genome Res. 102:5358.
Luikart, G., Gielly, L., Excoffier, L., Vigne, J.D.,Bouvet, J. and Taberlet, P. 2001. Multiple maternal origins and
weak phylogeographic structure in domesticgoats. Proc. Natl. Acad. Sci. U.S.A. 98:5927-5932.
MacHugh D.E., Shriver M.D., Loftus R.T., Cunningham P. and Bradley D.G. 1997. Microsatellite DNA
variation and the evolution, domestication andphylogeography of Taurine and Zebu cattle (Bos taurus
and Bos indicus). Genetics, 146:1071.
McCombK. and Clutton-Brock T. 1994. Is mate choice copying or aggregation responsible for skewed
distributions of females on leks? Proc R SocLond B BiolSci 255: 13.
Meadows J.R., Hanotte O., Drogemuller C., Calvo J., Godfrey R., Coltman D., Maddox J.F., Marzanov N.,
Kantanen J., Kijas J.W. 2006. Globally dispersed Y chromosomal haplotypes in wild and domestic
sheep. Anim Genet. 37: 444.
Mitchell R.J. and Hammer M.F. 1996. Human evolution and the Y chromosome. CurrOpin Genet Dev 6: 737.
Miyata T., Hayashida H., Kuma K., Mitsuyasu K. and Yasunaga T. 1987. Male-driven molecular evolution: a
model and nucleotide sequence analysis. Cold Spring HarbSymp Quant Biol 52: 863-867.
Naderi, S., Rezaei, H.R.,Taberlet, P., Zundel, S., Rafat, S.A., Naghash, H.R., El-Barody, M.A., Ertugrul, O.and
Pompanon, F., 2007.Large-scale mitochondrial DNA analysis of the domestic goat reveals six
haplogroups with high diversity.PLoS One 2, e1012.
Niemi M., Bluer A., Iso-Touru T., Nystrm V., Harjula J., Taavitsainen J.P., Stor J., Lidn K. and Kantanen J.
2013. Mitochondrial DNA and Y-chromosomal diversity in ancient populations of domestic sheep
(Ovisaries) in Finland: comparison with contemporary sheep breeds. Genet SelEvol 45: 2.
Pidancier, N., Jordan, S., Luikart, G. and Taberlet, P., 2006. Evolutionary history of the genus Capra
(Mammalia, Artiodactyla): discordance between mitochondrial DNA and Y-chromosomephylogenies.
Mol. Phylogenet. Evol.40:739-749.
Potter W.L., Upton PC 1979. Y chromosome morphology of cattle.Aust Vet J 55: 539.
Rice W.R. 1987. Genetic hitchhiking and the evolution of reduced genetic activity of the Y sex chromosome.
Genetics 116: 161.
Roed K.H., Holand O., Smith M.E., Gjostein H., Kumpula J. and Nieminen M. 2002. Reproductive success in
reindeer males in a herd with varying sex ratio. Mol Ecol 11: 1239.
Roldan E.R. and Gomendio M. 1999. The Y chromosome as a battle ground for sexual selection. Trends in
Ecology and Evolution 14: 58-62.
Rozen S., Skaletsky H., Marszalek J.D., Minx P.J., Cordum H.S., Waterston R.H., Wilson R.K., and Page D.C.
2003. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes.
Nature 423: 873.
Seielstad M.T., Minch E. and Cavalli-Sforza L.L. 1998. Genetic evidence for a higher female migration rate in
humans. Nat Genet 20: 278.
Skaletsky H., Kuroda-Kawaguchi T., Minx P.J., Cordum H.S., Hillier L., Brown L.G., et al. 2003. The
male-specific region of the human Y chromosome is a mosaic of discrete sequence classes.
Nature 423: 825-837.
Verkaar E.L.C., Nijman I.J., Beeke M., Hanekamp E., Lenstra J.A. 2004. Maternal and paternal lineages in
cross-breeding bovine species. Has Wisent a hybrid origin? Mol. Biol. Evol. 21: 1165.
Wyckoff G.J., Wang W., and Wu C.I. 2000. Rapid evolution of male reproductive genes in the descent of man.
Nature 403: 304.
Yang Y., Chang T.C., Yasue H., Bharti A.K., Retzel E.F., Liu W.S. 2011. ZNF280BY and ZNF280AY: autosome
derived Y-chromosome gene families in Bovidae. BMC Genomics 12: 13.
11
Candidate Gene Polymorphism Approaches for Detection and Genotyping
R S Kataria and S K Niranjan
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)
____________________________________________________________________________________________
Variation in the genomic sequence, together with environmental forces, shapes the evolutionary
process in every species. Breeders have long tried to exploit genetic variation, or polymorphism,
among and within species by selecting for better performance. Genetic polymorphism can be defined
as the occurrence of two or more alleles at the same locus in the same population, each at
appreciable frequency. The locus is then said to be polymorphic, and the population to exhibit
polymorphism at that locus. A polymorphic locus is usually defined as one at which the frequency
of the most common allele is less than 0.99. At the sequence level, polymorphisms fall into two
broad kinds: replacement of DNA bases, and insertion or deletion of base pairs. The term
polymorphism differs from mutation, which generally refers to a change in DNA sequence that is
not present in most individuals of a species. Over the last decade, genetic polymorphism has found
an important place in livestock genomics for developing molecular markers for early selection of
animals, and revealing polymorphism at the DNA level is now a key activity in animal genetics.
Polymorphisms have also been used widely for studying genetic variation within and between
species, for parentage verification and for understanding the process of evolution. Before the
advent of next generation sequencing tools and genome-wide polymorphism discovery, nucleotide
variations within functional genes were exploited largely as selection markers, and some of them
found their way into commercial use. Candidate gene polymorphisms with significant effects on
production traits have been exploited widely in livestock species. The utility of such molecular
markers is governed by two major factors: the genotyping protocol, which should be as simple as
possible, and the cost of genotyping, which should be low enough to generate the large amounts of
data needed for association studies and selection.
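As a minimal sketch of the 0.99 criterion given above, the following Python snippet counts alleles
in a set of diploid genotypes and checks whether the most common allele stays below the threshold.
The genotype encoding (two-letter strings), the function names and the sample counts are
illustrative assumptions, not data from any study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for diploid genotypes written as 2-letter strings."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes, threshold=0.99):
    """A locus is called polymorphic if the most common allele is rarer than `threshold`."""
    return max(allele_frequencies(genotypes).values()) < threshold


# Hypothetical sample of 100 animals genotyped at one bi-allelic locus.
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
freqs = allele_frequencies(sample)
print(freqs)
print("polymorphic:", is_polymorphic(sample))
```

With these invented counts the A allele has a frequency of 189/200, so the locus counts as
polymorphic under the 0.99 rule.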
Single Nucleotide Polymorphism
The discovery of one kind of genetic polymorphism, the single nucleotide polymorphism (SNP), has
opened new ways to harness genomic variation for developing markers suited to faster genetic gain
in livestock species. SNPs are stable genetic markers with a low mutation rate compared with other
genetic markers. Although SNPs are bi-allelic co-dominant markers, and individually less
informative than multiallelic microsatellites, they are still considered important because of their
high density in the genome and the comparatively low cost of genotyping. The advent of newer
high-throughput, low-cost technologies such as nanopore sequencing should further improve their use
as markers for genomic selection in the livestock sector. Depending on their location and nature,
SNPs in a coding region are synonymous if they do not change the amino acid, owing to the
degeneracy of triplet codons, and non-synonymous if they do. Non-synonymous SNPs are of two types:
missense, where the change results in a different amino acid, and nonsense, where the change results
in a premature stop codon. Two out of every three SNPs involve the replacement of C with T. SNPs
occur in both the coding and non-coding regions of the genome. Coding-region SNPs may alter protein
function or may be neutral, with no effect on protein function. SNPs located outside the coding
regions may still serve as useful markers because of their close proximity to disease loci.
Polymorphic nucleotides in the 5' or 3' UTR and in promoter regions can affect the expression of
genes and hence the phenotype of the animal. Several SNPs in intronic regions have also been shown
to be associated with phenotypes, including disease resistance.
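The distinction between synonymous, missense and nonsense SNPs can be illustrated with a short
Python sketch. It builds the standard genetic code and classifies a single-base change within one
codon. The function name and the example codons are illustrative assumptions; a stop codon changed
into an amino acid is simply reported as missense here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a SNP at `position` (0, 1 or 2) of `ref_codon` carrying `alt_base`."""
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snp("CTT", 2, "C"))  # CTT (Leu) -> CTC (Leu)
print(classify_coding_snp("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val)
print(classify_coding_snp("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop)
```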
Identification of SNPs
Sequencing is the method of choice for detecting SNPs when a particular region of the genome is
targeted, for example a candidate gene selected from a pathway governing the trait of interest.
Genomic DNA samples are chosen from as diverse a set of breeds, races, populations or individuals
as possible, and are either pooled or sequenced individually after PCR amplification of the region
to be analyzed. Next generation sequencing techniques now allow whole genomes to be screened,
generating enormous amounts of data that can be used to design SNP chips, even for whole genome
selection. Once SNPs have been identified, the techniques described below can be used to genotype
large numbers of individuals, either to test the association of the SNPs with the trait of interest
or to study the genetic variation at that particular locus. Some techniques, such as SSCP and
PCR-RFLP, have been used both for detecting and for genotyping SNPs in known target regions of the
genome. Highly polymorphic genes, such as those of the Major Histocompatibility Complex (MHC), have
been analyzed for genetic variation using simple PCR-RFLP.
Random Amplification of Polymorphic DNA (RAPD-PCR) has also been used successfully to define
genetic diversity among different species. The RAPD method has been used to generate specific
fingerprint patterns for ten different species, including wild boar, pig, horse, buffalo, beef,
venison, dog, cat, rabbit and kangaroo. RAPD markers have several advantages. No prior sequence
knowledge is needed to design the primers, which can then be used on different templates. The
amount of DNA required is very small because it is amplified by PCR. RAPDs are simple, quick and
cost effective compared with RFLP. However, the technique also has disadvantages: the repeatability
and reliability of RAPD polymorphic profiles are poor because some non-specific, and therefore
non-reproducible, binding of primers occurs, and RAPDs are dominant genetic markers, so they cannot
be used to distinguish homozygotes from heterozygotes.
Amplified fragment length polymorphism (AFLP) combines the RFLP and PCR techniques for the
detection of polymorphism. In this technique the genomic DNA is first digested with a restriction
enzyme and the digested fragments are ligated to adaptors. A subset of the fragments is then
amplified with selective primers that are complementary to the adaptor sequence. The amplified
fragments are separated by size in gel electrophoresis and visualized, for example by
autoradiography. AFLPs overcome the drawbacks of the labor-intensive, time-consuming RFLP method
and solve the reliability problem caused by non-specific amplification in RAPDs. AFLP has the
advantages of genetic stability and of being an effective, rapid and economical tool for detecting
a large number of polymorphic genetic markers that can be genotyped. The AFLP method is well suited
to population genetics and genome typing, and is consequently widely applied to detect genetic
polymorphisms and to evaluate and characterize animal genetic resources.
Genotyping of SNPs
The increasing need for large-scale genotyping applications of single nucleotide polymorphisms
(SNPs) in model and non-model organisms requires the development of low-cost technologies
accessible to minimally equipped laboratories. Many techniques have been developed and are
being utilized depending upon the facilities available and the number of individuals to be screened.
Direct sequencing: Sequencing is the best way to detect as well as genotype SNPs. It not only
detects polymorphic sites but also confirms the alleles present, and is the gold standard for
confirming alleles detected at a site by other methods. It is the method of choice for closely
spaced SNPs, which can be read in a single sequencing pass; in such cases it also becomes more
economical than many other genotyping methods, which may otherwise give ambiguous results.
Heterozygous positions can be scored manually on the chromatogram, and sequence alignment tools
such as MegAlign make it possible to score homozygous as well as heterozygous alleles. A
heterozygous position typically shows a double peak, and software such as Phred/Phrap can help in
base calling and scoring of alleles.
Figure 1: Sequence chromatograms (panels a and b) showing polymorphism, with multiple peaks in the
heterozygous sample and distinct homozygous peaks (G/T in a. and C/T in b.) for the two alleles
present at the site.
Restriction enzyme cutting (PCR-RFLP): This is a widely preferred genotyping method because it is
easy to perform and genotypes are simple to record. However, a protocol cannot be designed for
every polymorphic locus identified: the method depends on a suitable restriction site being present
at the locus, and only a limited number of restriction enzymes, each recognizing a specific
cleavage site, are available. Online software is available for designing PCR-RFLP protocols.
PCR-RFLP can also be used to detect as well as genotype unknown polymorphic sites within the
amplified region, particularly for highly polymorphic genes such as those of the Major
Histocompatibility Complex (MHC). A different approach is followed in that case. The target region
is first digested with selected restriction enzymes, preferably tetra-cutters, which have a higher
frequency of cutting sites. Allelic patterns are recorded, and representative alleles are
afterwards confirmed by sequencing. Alleles can also be cloned and confirmed, particularly when
duplication of genes such as MHC class I is observed during PCR-RFLP.
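The way PCR-RFLP genotypes are read from a gel can be sketched in Python as an in-silico digest.
The snippet below cuts a toy amplicon at every occurrence of a recognition site and lists the band
sizes expected for each genotype. EcoRI (site GAATTC, cut after the first G) is used as the example
enzyme; the amplicon sequences, the SNP and the function names are invented for illustration, and
only the given strand is searched, which is sufficient for a palindromic site.

```python
def digest(amplicon, site, cut_offset):
    """Return fragment lengths after cutting a linear amplicon at each occurrence of `site`."""
    cut_points = []
    start = amplicon.find(site)
    while start != -1:
        cut_points.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cut_points + [len(amplicon)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def expected_bands(allele_1, allele_2, site, cut_offset):
    """Distinct band sizes (largest first) expected on the gel for a diploid genotype."""
    bands = set(digest(allele_1, site, cut_offset)) | set(digest(allele_2, site, cut_offset))
    return sorted(bands, reverse=True)


# Hypothetical 100 bp amplicon: allele "cut" carries the EcoRI site, allele "uncut" has an
# A>G change inside the site that abolishes it.
cut = "A" * 40 + "GAATTC" + "C" * 54
uncut = "A" * 40 + "GAGTTC" + "C" * 54

for label, pair in {"cut/cut": (cut, cut), "cut/uncut": (cut, uncut), "uncut/uncut": (uncut, uncut)}.items():
    print(label, expected_bands(*pair, site="GAATTC", cut_offset=1))
```

In this toy case the homozygote for the cutting allele gives two bands, the other homozygote keeps
the intact product, and the heterozygote shows all three bands.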
Figure 2: PCR-RFLP genotyping of amplified product with restriction enzyme and recording of genotypes.
Figure 3: Detection of SSCP variants in non-denaturing polyacrylamide gel and their confirmation by
sequencing.
Primer extension: includes the following methods.
Mass spectrometry: The commercially available, mass spectrometry based MassARRAY system
(http://agenabio.com/) works by extending an oligonucleotide probe over a SNP site in a PCR
product, using a mixture of deoxynucleotides and dideoxynucleotides, to produce a product of
different size for each allele of the SNP. The extended products are analyzed by SEQUENOM
MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry. Because
the time of flight increases with the mass of the ion, the size of the products generated can be
determined precisely and converted into genotype information. Because the mass resolution of this
method is very high, one can routinely perform multiplexed assays to permit analysis of up to 6
SNPs in one PCR reaction/tube. Candidate genetic marker development with the MassARRAY system is
preferable over the other systems because of its flexibility. The highly efficient assay design,
short lead time and easy panel modification enable users to validate genetic markers rapidly at
low reagent and labor cost.
Allele-specific primers: The method allows efficient discrimination of SNPs by allele-specific PCR
in a single reaction with standard PCR conditions. A common reverse primer and two forward
allele-specific primers with different tails amplify two allele-specific PCR products of different
lengths, which are then separated by agarose gel electrophoresis. PCR specificity is improved by
the introduction of a destabilizing mismatch within the 3' end of the allele-specific primers. This
is a simple and inexpensive method for SNP detection that does not require PCR optimization.
Tetra-Primer Amplification Refractory Mutation System (Tetra-ARMS) PCR: This is a simple technique
that requires no special equipment and reveals genotypes directly on a gel after PCR, without any
post-PCR processing. The principle is simple: four primers are designed and used together. An outer
forward and an outer reverse primer amplify a product without discriminating between the alleles.
One inner forward and one inner reverse primer, in opposite orientations, are each specific to one
allele. Each inner allele-specific primer pairs with the outer primer on the opposite side, so the
two alleles give products of different lengths. The main disadvantages of the technique are that it
requires a lot of standardization and that suitable primers cannot always be designed around the
polymorphic site.
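The band pattern expected from a tetra-ARMS assay follows directly from the primer positions, as
the Python sketch below shows. The coordinates (0-based SNP position within the outer product),
primer lengths, allele labels and function names are hypothetical values chosen for illustration;
the inner forward primer is assumed to end at the SNP on one strand and the inner reverse primer
to end at the SNP on the other.

```python
def tetra_arms_bands(outer_len, snp_pos, inner_fwd_len, inner_rev_len):
    """Return (control, inner-forward, inner-reverse) product sizes in bp."""
    control = outer_len
    # Inner forward primer starts inner_fwd_len - 1 bases before the SNP and
    # extends to the outer reverse primer at the end of the outer product.
    fwd_band = outer_len - (snp_pos - inner_fwd_len + 1)
    # Inner reverse primer starts inner_rev_len - 1 bases after the SNP and
    # extends back to the outer forward primer at position 0.
    rev_band = snp_pos + inner_rev_len
    return control, fwd_band, rev_band


def gel_pattern(genotype, bands, fwd_allele, rev_allele):
    """Band sizes (largest first) expected for a two-letter genotype such as 'GA'."""
    control, fwd_band, rev_band = bands
    pattern = {control}  # the outer-primer product is always present
    if fwd_allele in genotype:
        pattern.add(fwd_band)
    if rev_allele in genotype:
        pattern.add(rev_band)
    return sorted(pattern, reverse=True)


bands = tetra_arms_bands(outer_len=400, snp_pos=150, inner_fwd_len=25, inner_rev_len=25)
for genotype in ("GG", "GA", "AA"):
    print(genotype, gel_pattern(genotype, bands, fwd_allele="G", rev_allele="A"))
```

Placing the SNP off-centre in the outer product, as assumed here, is what makes the two
allele-specific bands differ enough in size to be resolved on an agarose gel.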
Figure 6: A typical agarose gel showing tetra-ARMS PCR genotyping results. Note the common outer primers'
amplified product on the top.
Single base extension: The SNaPshot Multiplex System is a primer extension-based method that
enables multiplexing of up to 10 SNPs. SNaPshot labeling chemistry relies on single-base extension
and termination. The SNaPshot Multiplex Kit uses a single-tube reaction to interrogate SNPs at
known locations. The chemistry is based on the dideoxy single-base extension of an unlabeled
oligonucleotide primer (or primers). Each primer binds to a complementary template in the presence
of fluorescently labeled ddNTPs and DNA polymerase. The polymerase extends the primer by one
nucleotide, adding a single ddNTP to its 3' end. The fluorescence color readout reports which base
was added.
Organism              dbSNP  Genome     Submissions   RefSNP clusters (rs#'s)     (rs#'s)
                      Build  Build      (ss#'s)       (# validated)               in gene
Bos taurus            143    6.2        234,957,222   95,182,052 (44,427,026)     45,672,343
Sus scrofa            143    4.2        82,712,833    52,679,275 (27,768,562)     20,672,310
Ovis aries            143    1.1        100,819,196   54,004,457 (28,360,665)     19,571,908
Capra hircus          143    1.1        55,076,618    37,166,653 (0)              13,055,034
Ovis orientalis       143    1.1        29,547,990    29,263,208 (0)              10,156,420
Capra aegagrus        143    1.1        17,530,711    17,415,157 (0)              6,073,671
Zea mays              143    1.1        13,784,397    10,526,779 (2,343,328)      3,098,050
Ciona intestinalis    143    3.1        3,233,523     3,190,512 (0)               1,900,242
Microtus ochrogaster  143    1.1        14,699        14,545 (0)                  5,375
Total: 9 Organisms           0 genomes  537,677,189   299,442,638 (102,899,581)   120,205,353
Number of (ss#'s) with genotype: -
Number of (ss#'s) with frequency: 968; 161; 173 (total 1,302)
(http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi?view+summary=view+summaryandbuild_id=143)
Figure 8: The structure of the flanking sequence in dbSNP, displaying a composite of bases either assayed for
variation or included from published sequence.
dbSNP has thus been designed to support submissions and research into a broad range of biological
problems, including physical mapping, functional analysis, pharmacogenomics, association studies
and evolutionary studies. Because dbSNP was developed to complement GenBank, it may contain
nucleotide sequences from any organism. Currently the majority of the data is for human and mouse,
with an enormous increase in submissions from livestock and poultry species, mainly cattle, sheep,
pig and chicken.
References
12
________________________________________________________________________________________
The vast majority of economically important traits in livestock production systems are quantitative,
that is, they show continuous distributions. Two models have been proposed to explain the genetic
variation observed in such traits: the infinitesimal model and the finite loci model. The
infinitesimal model assumes that traits are determined by an infinite number of unlinked and
additive loci, each with an infinitesimally small effect (Fisher 1918). This model has been
exceptionally valuable for animal breeding, and forms the basis for breeding value estimation
theory (e.g. Henderson 1984).
However, the existence of a finite amount of genetically inherited material (the genome), and the
revelation that there are perhaps only around 20 000 genes or loci in the genome (Ewing and Green
2000), mean that there must be some finite number of loci underlying the variation in quantitative
traits. In fact, there is increasing evidence that the distribution of the effects of these loci
on quantitative traits is such that there are a few genes with large effect and many with small
effect (Shrimpton and Robertson 1998, Hayes and Goddard 2001). Figure 1.1.A shows the size of
quantitative trait loci (QTL) effects reported in QTL mapping experiments in both pigs and dairy
cattle. These histograms are not the true distribution of QTL effects, however: the experiments
can only detect effects above a certain size, determined by the amount of environmental noise, and
the effects are estimated with error. Figure 1.1.B displays the distribution of effects adjusted
for both these factors. The distributions in Figure 1.1.B indicate that there are many genes of
small effect and few of large effect. The search for these loci, particularly those of moderate to
large effect, and the use of this information to increase the accuracy of selecting genetically
superior animals, has been the motivation for intensive research efforts in the last two decades.
Any locus with an effect on the quantitative trait is called a QTL, not just the loci of large
effect.
Figure 1.1 A. Distribution of additive (QTL) effects from pig experiments, scaled by the standard deviation of
the relevant trait, and distribution of gene substitution (QTL) effects from dairy experiments scaled
by the standard deviation of the relevant trait. B. Gamma Distribution of QTL effect from pig and
dairy experiments, fitted with maximum likelihood (adapted from manual of Prof Ben Hayes).
Two approaches have been used to uncover QTL. The candidate gene approach assumes that a gene
involved in the physiology of the trait could harbor a mutation causing variation in that trait. The
gene, or parts of the gene, is sequenced in a number of different animals, and any variations found
in the DNA sequences are tested for association with variation in the phenotypic trait. This
approach has had some successes; for example, a mutation was discovered in the oestrogen receptor
locus (ESR) which results in increased litter size in pigs (Rothschild et al. 1991). There are two
problems with the candidate gene approach, however. Firstly, there are usually a large number of
candidate genes affecting a trait, so many genes must be sequenced in several animals and many
association studies carried out in a large sample of animals (the likelihood that the mutation may
occur in non-coding DNA further increases the amount of sequencing required and the cost).
Secondly, the causative mutation may lie in a gene that would not have been regarded a priori as an
obvious candidate for this particular trait.
An alternative is the QTL mapping approach, in which chromosome regions associated with
variation in phenotypic traits are identified. QTL mapping assumes the actual genes which affect a
quantitative trait are not known. Instead, this approach uses neutral DNA markers and looks for
associations between allele variation at the marker and variation in quantitative traits. A DNA
marker is an identifiable physical location on a chromosome whose inheritance can be monitored.
Markers can be expressed regions of DNA (genes) or more often some segment of DNA with no
known coding function but whose pattern of inheritance can be determined. When DNA markers
are available, they can be used to determine if variation at the molecular level (allelic variation at
marker loci along the linkage map) is linked to variation in the quantitative trait. If this is the case,
then the marker is linked to, or on the same chromosome as, a quantitative trait locus or QTL which
has allelic variants causing variation in the quantitative trait.
Until recently, the number of DNA markers identified in livestock genomes was comparatively
limited, and the cost of genotyping the markers was high. This constrained experiments designed to
detect QTL to using a linkage mapping approach. If only a limited number of markers per chromosome
is available, then the association between the markers and the QTL will persist only within
families and only for a limited number of generations, due to recombination. For example, in one
sire the A allele at a particular marker may be associated with the increasing allele of the QTL,
while in another sire the a allele at the same marker may be associated with the increasing allele
of the QTL, due to historical recombination between the marker and the QTL in the ancestors of the
two sires. To illustrate the principle of QTL mapping exploiting linkage, consider an example where
a particular sire has a large number of progeny. The parent and the progeny are genotyped for a
particular marker. At this marker, the sire carries the marker alleles 172 and 184 (Figure 1.2).
Figure 1.2. Principle of quantitative trait loci (QTL) detection, illustrated using an abalone example. A sire is
heterozygous for a marker locus, and carries the alleles 172 and 184 at this locus. The sire has a large
number of progeny. The progeny are separated into two groups, those that receive allele 172 and
those that receive allele 184. The significant difference in the trait of average size between the two
groups of progeny indicates a QTL linked to the marker. In this case, the QTL allele increasing size
is linked to the 172 allele and the QTL allele decreasing size is linked to the 184 allele (Figure adapted
from Nick Robinson).
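The contrast illustrated in Figure 1.2 can be sketched in a few lines of Python. Progeny are
grouped by the marker allele inherited from the sire, and the difference between the group means
is divided by its standard error (a Welch-type t statistic). The phenotype values and the function
name are invented for illustration; a real analysis would also derive a significance level and
account for other effects in the data.

```python
from math import sqrt
from statistics import mean, variance


def marker_contrast(progeny):
    """progeny: list of (paternal_marker_allele, phenotype) with exactly two alleles.

    Returns (allele_1, allele_2, mean difference, t statistic).
    """
    groups = {}
    for allele, phenotype in progeny:
        groups.setdefault(allele, []).append(phenotype)
    (allele_1, y1), (allele_2, y2) = sorted(groups.items())
    difference = mean(y1) - mean(y2)
    std_error = sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    return allele_1, allele_2, difference, difference / std_error


# Hypothetical half sib progeny of one sire heterozygous 172/184.
progeny = [(172, 10.2), (172, 11.0), (172, 10.6), (172, 10.9),
           (184, 9.1), (184, 9.6), (184, 9.4), (184, 9.0)]
allele_1, allele_2, difference, t_stat = marker_contrast(progeny)
print(f"mean({allele_1}) - mean({allele_2}) = {difference:.2f}, t = {t_stat:.2f}")
```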
The progeny can then be sorted into two groups, those that receive allele 172 and those that receive
allele 184 from the parent. If there is a significant difference between the two groups of progeny,
then this is evidence that there is a QTL linked to that marker.
QTL mapping exploiting linkage has been performed in nearly all livestock species for a huge range
of traits. The problem with mapping QTL exploiting linkage is that, unless a huge number of progeny
per family or half sib family are used, the QTL are mapped to very large confidence intervals on
the chromosome. To illustrate this, consider the formula that Darvasi and Soller (1997) gave for
estimating the 95% CI for QTL location for simple QTL mapping designs under the assumption of a
high density genetic map. The formula is $CI = 3000/(kN\delta^{2})$, where N is the number of
individuals genotyped, $\delta$ is the allele substitution effect (the effect of getting an extra
copy of the increasing QTL allele) in units of the residual standard deviation, k is the number of
informative parents per individual, which is equal to 1 for half-sib and backcross designs and 2
for F2 progeny, and 3000 is about the size of the cattle genome in centi-Morgans. For example, if a
QTL with an allele substitution effect of 0.5 residual standard deviations segregates on a
particular chromosome within a half sib family of 1000 individuals, the 95% CI would be
3000/(1 x 1000 x 0.25) = 12 cM. Such large confidence intervals cause two problems. Firstly, if the
aim of the QTL mapping experiment is to identify the mutation underlying the QTL effect, in such a
large interval there are a large number of genes to be investigated (80 on average with 20 000
genes and a genome of 3000 cM). Secondly, use of the QTL in marker assisted selection is
complicated by the fact that the linkage between the markers and QTL is not sufficiently close to
ensure that marker-QTL allele relationships persist across the population; rather, marker-QTL phase
within each family must be established to implement marker assisted selection.
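The arithmetic behind the 12 cM interval and the 80 genes can be checked with a small Python
sketch of the Darvasi and Soller (1997) approximation quoted above. The function and parameter
names are illustrative; the genome length of 3000 cM and the 20 000 genes are the round figures
used in the text.

```python
def qtl_ci_cm(n_individuals, delta, k=1, genome_cm=3000):
    """95% confidence interval (cM) for QTL location: genome_cm / (k * N * delta^2)."""
    return genome_cm / (k * n_individuals * delta ** 2)


def genes_in_interval(ci_cm, n_genes=20000, genome_cm=3000):
    """Average number of genes in an interval, assuming genes are spread evenly over the map."""
    return ci_cm * n_genes / genome_cm


ci = qtl_ci_cm(n_individuals=1000, delta=0.5, k=1)  # half sib family of 1000
print(f"95% CI: {ci:.0f} cM, about {genes_in_interval(ci):.0f} genes")

# The interval shrinks only in proportion to family size and to the squared effect.
for n in (500, 1000, 5000):
    print(n, qtl_ci_cm(n, delta=0.5))
```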
Designs for QTL detection in livestock
The designs used to detect QTL in livestock vary from experimental backcross and F2 populations to
half sib designs that use existing family structures within a commercial population. The situation
in livestock is, however, more difficult than in plant species. Inbred lines are absent in
livestock, and maintaining experimental populations is prohibitively expensive. In addition,
reproductive capacity and generation interval often limit the choice of experimental design. These
factors have to be taken into account in both the design and the analysis of QTL experiments. Using
sparse marker maps with 10-20 cM spacing, several designs have been used to detect QTL regions
across livestock genomes.
Experimental crosses for QTL detection in livestock
Experimental crosses have been implemented in pigs and poultry, as the generation intervals are
relatively short and the number of offspring is moderate to high. Such crosses have been
established between domestic breeds and descendants of their wild progenitors, as well as between
phenotypically divergent commercial breeds. The analysis is generally the same as that used for
inbred crosses, i.e. marker alleles in the second generation are traced back to their line of
origin and contrasts for putative QTL are estimated as differences between lines.
Exploiting existing family structures
In large ruminants, the common approach is to exploit the large paternal half sib family structures
that occur where artificial insemination is widely used. In these half sib designs, genotypes are
collected on a number of grandsires and their half sib offspring. Phenotypes are collected either
on the half sib offspring themselves or on a group of progeny from each half sib. In dairy cattle,
where each sire may have more than 100 daughters with phenotype records, the three generation half
sib design (the granddaughter design) has been common practice in several advanced nations. Least
squares analysis makes no assumptions about the number of QTL alleles and estimates a unique QTL
effect within each half sib family. As half sib family structures also exist within experimental
crosses, these models are sometimes also fitted to line cross populations to gain further insight
into identified QTL, by checking which F1 parents are most likely to be heterozygous for a QTL and
hence inferring the likely QTL genotypes of the F1 parents. The same methodology can be extended to
accommodate full sib family structures, provided family sizes are sufficient; so far such an
approach has been used only in poultry breeding.
When a single trait or an index of traits is the main focus, genotyping resources can potentially
be used efficiently through selective genotyping, where only the individuals showing the most
extreme phenotypes within a family or a cross are genotyped. A further step is selective DNA
pooling, where marker allele frequencies in the high pools are contrasted with those in the low
pools within families. This approach can reduce the amount of genotyping further, but requires more
careful design, analysis and consideration of technical aspects if it is not to lead to detection
of spurious QTL.
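Selective genotyping within one family amounts to ranking animals on phenotype and keeping the two
tails, as in the following Python sketch. The 20% tail fraction, the animal identifiers and the
phenotype values are arbitrary assumptions for illustration; the appropriate fraction depends on
the design and is not specified here.

```python
def select_extremes(phenotypes, fraction=0.2):
    """Return (low_tail, high_tail) animal IDs from a {animal_id: phenotype} mapping."""
    ranked = sorted(phenotypes, key=phenotypes.get)  # IDs ordered from lowest to highest phenotype
    n_tail = max(1, round(len(ranked) * fraction))
    return ranked[:n_tail], ranked[-n_tail:]


# Hypothetical half sib family of ten animals with one recorded trait.
family = {"a01": 5.1, "a02": 6.3, "a03": 4.2, "a04": 7.8, "a05": 5.9,
          "a06": 6.1, "a07": 3.9, "a08": 7.1, "a09": 5.5, "a10": 6.6}
low, high = select_extremes(family, fraction=0.2)
print("genotype (low tail): ", low)
print("genotype (high tail):", high)
```

Only the four animals in the tails would be genotyped in this toy family; for selective DNA
pooling, the DNA of each tail would instead be combined into one low and one high pool.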
Utilising marker information in genetic evaluation programs
The value of genotypic information for predicting the genetic merit of animals depends on the
predictive ability of the marker genotypes. The three types of molecular loci, viz. direct markers,
LD (linkage disequilibrium) markers and LE (linkage equilibrium) markers, differ not only in
methods of detection but also in how they are incorporated into genetic evaluation procedures.
Whereas direct and, to a lesser degree, LD markers allow selection on genotype across the
population, use of LE markers must allow for different linkage phases between markers and QTL from
family to family, i.e. LE markers are family specific and family specific information must be
derived.
Utilising information of QTLs in selection models
By using QTL information in genetic evaluation, in principle, part of the assumed polygenic
variation is substituted by a separate effect due to a genetic polymorphism at a known locus. This
has the immediate effect of having a much better handle on the Mendelian sampling process, as
phenotypic covariance can be evaluated based on specific genetic similarity rather than on an
average relationship.
A number of different approaches have been described to accommodate marker information in
genetic evaluation. Roughly, these methods can be distinguished through their modeling of the
QTL effect and through the type of genetic marker information used. The QTL effect can be
modeled as random or fixed, while the molecular information comes from LE, LD or direct
markers.
With a fixed QTL model, regression on genotype probabilities would be used in genetic
evaluation to account for the effect of QTL polymorphisms. In the simplest additive QTL model,
suitable for estimating breeding values, simple regressions could be included on the probability of
carrying the favourable mutation. Regression can be on known genotypes (class variables), or
probabilities can be derived for ungenotyped animals in a general complex pedigree. A fixed QTL
model is sensible if few alleles are known to be segregating, and where dominance and/or epistasis
are important. The model also assumes effects being the same across families. The effects of various
genotypes could be fitted separately, giving power to account for dominance and epistasis in case
of multiple QTL. For selection purposes, a fixed QTL effect, if additive, would be added to the
polygenic estimated breeding values (EBVs), similar to breed effects in across breed
evaluations. The advantage of a fixed QTL model is the limited number of effects that need to be
fitted. Alternatively, QTL effects could be modeled as random effects, with each individual having
a different QTL effect. Covariances are based on the probability of QTL alleles being identical by
descent rather than on numerator relationships as in the usual animal model with polygenic
effects. With full knowledge about segregation, this would effectively fit all founder alleles as
different effects. The random QTL model makes no assumptions about number of alleles at a QTL
and it automatically accommodates possible interaction effects of QTL with genetic background
(families or lines). Therefore, the random QTL model is less reliant on assumptions about
homogeneity of QTL effects. The random QTL model is a natural extension to the usual mixed
model and seems therefore a logical way to incorporate genotype information into an overall
genetic evaluation system. These models result in EBVs for QTL effects along with a polygenic
EBV. The total EBV is the simple sum of these estimates. One of the main computational limitations
of this method, however, is the large number of equations that must be solved, which increases
by two per animal for each QTL that is fitted.
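Two statements in the paragraph above lend themselves to a small numerical illustration in Python:
the total EBV as the sum of the polygenic EBV and the QTL allele effects, and the growth in the
number of animal-level equations when QTL are fitted as random effects. The function names and the
numbers are hypothetical, and fixed effects are ignored in the equation count.

```python
def total_ebv(polygenic_ebv, qtl_allele_ebvs):
    """Total EBV = polygenic EBV + paternal and maternal allele effects at each fitted QTL.

    qtl_allele_ebvs: list of (paternal_effect, maternal_effect), one pair per QTL.
    """
    return polygenic_ebv + sum(paternal + maternal for paternal, maternal in qtl_allele_ebvs)


def animal_level_equations(n_animals, n_qtl):
    """One polygenic equation per animal plus two (paternal, maternal allele) per QTL."""
    return n_animals * (1 + 2 * n_qtl)


# Hypothetical animal with a polygenic EBV of 12.0 and two fitted QTL.
print(total_ebv(12.0, [(1.5, -0.5), (0.25, 0.25)]))
for n_qtl in (0, 1, 2, 5):
    print(n_qtl, "QTL ->", animal_level_equations(1000, n_qtl), "equations")
```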
Genetic evaluation using direct markers
When the genotype of an actual functional mutation is available, no pedigree information is
needed to predict the genotypic effect, as QTL genotypes are measured directly. When there is
only a small number of alleles, the number of specific genotypes is limited. In genetic evaluation,
it would seem appropriate to treat the genotype effect as a fixed effect, i.e. the assumption is that
genotype differences are the same in different families and herds or flocks. Such assumptions
might be reasonable for a biallelic QTL model in a relatively homogeneous population.
Alternatively, random QTL models could be used with different effects for different founder alleles,
or even QTL by environment interactions. In both fixed and random QTL models, genotype
probabilities can be derived for individuals with missing genotypes.
Genetic evaluation using LE markers
When the genotype test is not for the gene itself, but for a linked marker, QTL probabilities derived
from marker genotypes will be affected by the recombination rate between marker and QTL
and by the extent of LD between the QTL and marker across the population. If LD between the
QTL and a linked marker only exists within families, marker effects or, at a minimum, marker-QTL
linkage phase must be determined separately for each family. This requires marker genotypes and
phenotypes on family members. If linkage between the marker and QTL is loose, phenotypic
records must be from close relatives of the selection candidate because associations will erode
quickly through recombination. With progeny data, marker-QTL effects or linkage phases can
be determined based on simple statistical tests that contrast the mean phenotype of progeny that
inherited alternate marker alleles from the common parent. A more comprehensive approach is
based on Fernando and Grossman's (1989) random QTL model, where marker information from
complex pedigrees can be used to derive covariances between QTL effects, yielding best linear
unbiased prediction (BLUP) of breeding value for both polygenic and QTL effects. Random
effects of paternal and maternal QTL alleles are added to the standard animal model with random
polygenic breeding values. The variance-covariance structure of the random QTL effects, also
known as the gametic relationship matrix (GRM), is based on probabilities of identity by descent
(IBD), and is now derived from co-segregation of markers and QTL within a family. Probabilities
of IBD derived from pedigree and marker data link QTL allele effects that are expected to be equal
or similar, therefore using data from relatives to estimate an individual's QTL effects. For example,
if two paternal half-sibs i and j have inherited the same paternal allele for markers that flank the
QTL (with recombination rate r), they are likely IBD for the paternal QTL allele and the
correlation between the effects of their paternal QTL alleles will be $(1 - r)^2$. The method is
appealing, but computationally demanding for large scale evaluations, especially when not all
animals are genotyped and complex procedures must be applied to derive IBD probabilities.
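The simple within-family test mentioned above, contrasting progeny that inherited alternate marker alleles from a common sire, can be sketched as follows. This is only an illustration: the allele labels "A" and "a", the function names and the data layout are assumptions, and a real analysis would also test the contrast for significance. The second function just evaluates the half-sib correlation expression given in the text.

```python
from statistics import mean


def marker_contrast(progeny):
    """Mean phenotype of progeny that received sire allele 'A' minus
    the mean of progeny that received sire allele 'a'.

    progeny is a list of (sire_marker_allele, phenotype) pairs.
    """
    with_a = [y for allele, y in progeny if allele == "A"]
    with_b = [y for allele, y in progeny if allele == "a"]
    return mean(with_a) - mean(with_b)


def half_sib_qtl_correlation(r):
    """Correlation between paternal QTL allele effects as stated in the
    text, (1 - r)^2, for recombination rate r."""
    return (1.0 - r) ** 2


if __name__ == "__main__":
    family = [("A", 10.0), ("A", 12.0), ("a", 9.0), ("a", 9.0)]
    print(marker_contrast(family))
    for r in (0.0, 0.05, 0.2):
        print(r, half_sib_qtl_correlation(r))
```

The sign of the contrast indicates which marker allele is in phase with the favourable QTL allele in that sire family, and the correlation falls quickly as r increases, which is why loose linkage requires records on close relatives.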
Genetic evaluation using LD markers
Most QTL projects have moved towards fine mapping where the final result is a marker or
marker haplotype in LD with the QTL, if not the direct mutation. A haplotype of marker alleles
close enough to the putative QTL is likely to be in LD with QTL alleles. Such a marker test
provides information about QTL genotype across families, and is in a sense not very different
from a direct marker. The most convenient way to include genotypic information from marker
haplotypes in genetic evaluation systems is through the random QTL model. In their original paper,
Fernando and Grossman (1989) derived IBD from genotype data on single markers and
recombination rates between marker and QTL.
However, the random QTL model is more
versatile, and covariances based on IBD probabilities can also use information beyond pedigree,
based on LD. The latter can be derived from marker or haplotype similarity, e.g. based on a
number of marker genotypes surrounding a putative QTL. Meuwissen and Goddard (2001)
proposed using both linkage and LD information to derive IBD based covariances (termed LDL
analysis). Lee and van der Werf (2005) showed that with denser markers, the value of linkage
information, and therefore pedigree, reduces. Hence, when QTL positions become more
accurately defined, genetic information from close markers (within a few cM) can be used
increasingly to derive LD based IBD probabilities, thereby defining covariances between random
QTL effects without the need for a family structure or information through pedigree.
Lee and van der Werf (2006) have shown that LD information results in a very dense GRM.
Genetic evaluation, which is usually based on mixed model equations that are relatively sparse, is
currently not feasible computationally for the LDL method for a large number of individuals
and alternative models are needed. One approach is to model population wide LD by simply
including the marker genotype or haplotype as a fixed effect in the animal model evaluation,
as suggested by Fernando (2004). An advantage of modeling population wide LD effects as fixed
rather than random is that fewer assumptions about population history are needed. A disadvantage
is that estimates are not BLUPed, i.e. regressed towards a mean depending on the amount of
information that is available to estimate their effects. This will be important if some of the genotype
or haplotype effects cannot be estimated with substantial accuracy because the number of
individuals with that genotype or haplotype is limited. Haplotype effects could also be fitted as
random, but more development is needed in this area.
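The difference between a fixed and a regressed (BLUP-type) haplotype estimate can be illustrated with a deliberately simplified sketch. It assumes a single haplotype class, unrelated records already expressed as deviations from the overall mean, and known variances; under those assumptions the random estimate is the class mean multiplied by n / (n + lambda), with lambda the ratio of residual to haplotype variance. Function and parameter names are hypothetical.

```python
def fixed_estimate(deviations):
    """Fixed-effect estimate: the plain mean of the deviations."""
    return sum(deviations) / len(deviations)


def regressed_estimate(deviations, var_haplotype, var_residual):
    """Random-effect estimate in the simplified one-way model:
    the mean shrunk by n / (n + lambda), lambda = var_residual / var_haplotype."""
    n = len(deviations)
    lam = var_residual / var_haplotype
    return (n / (n + lam)) * fixed_estimate(deviations)


if __name__ == "__main__":
    few = [2.0, 2.0]      # a rare haplotype with two records
    many = [2.0] * 18     # a common haplotype with eighteen records
    for records in (few, many):
        print(len(records), fixed_estimate(records),
              regressed_estimate(records, var_haplotype=1.0, var_residual=2.0))
```

Both haplotypes have the same fixed estimate, but the rare one is regressed much more strongly towards zero, which is the protection against poorly estimated effects that the fixed model lacks.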
In general, for the purpose of increased genetic change of economically important
quantitative traits, and in the context of well recorded and efficient breeding programmes, there is
no need to have knowledge of functional mutations since nearby markers will have a high
predictive value about genetic merit. Moreover, the benefit from the extra investment and time
spent on finding functional mutations might be superseded by the genetic change that can
be made in the breeding programme in the meantime.
Implementation of marker assisted genetic evaluation
It is important to note that most of the gene marker tests currently in use are not
integrated into genetic evaluation programmes. This is because the gene test is either for a Mendelian
characteristic, or it predicts phenotypic differences for traits that are not the same as those in
current genetic evaluation. Moreover, breeders would not only be interested in more accurate
EBVs based on gene markers, but they would also want to know the actual QTL genotypes
for their breeding animals. This information on individual genotype will become less relevant if
more gene tests become available and if testing becomes cheaper and more widespread. This might
still take some years. Thus, as gene marker testing is gradually introduced, it is more likely to
create additional selection criteria to consider and it will take some time before QTL
information is seamlessly and optimally integrated in existing genetic evaluation programmes. In
particular, if genetic evaluation is based on information from many different breeding units,
such as in cattle or sheep, genotyping information will initially be available for only a small
proportion of the breeding animals, possibly not justifying a total overhaul of the system for
genetic evaluation. Simple ad hoc procedures where QTL effects are estimated and presented
separately as additional effects are initially a more likely route to implementation.
Solutions for fixed QTL genotype effects, along with genotype probabilities as outputs of
genetic evaluation, might be interesting to breeders and, compared with random QTL
effects, may be more likely to be presented and used separately from polygenic EBVs. This
would also be the case for genotypic information on Mendelian characters, where there is no
polygenic component. Thus, molecular information can be used to enhance both the process of
integrating superior qualities of different breeds and within-breed selection.
Between breed selection
Crossing breeds results in extensive LD, which can be capitalized upon using MAS in a number
of ways. If a large proportion of breed differences in the trait(s) of interest are due to a small
number of genes, gene introgression strategies can be used. If a larger number of genes are
involved, MAS within a synthetic line is the preferred method of improvement. Introgression of the
desirable allele at a target gene from a donor to a recipient breed is accomplished by multiple
backcrosses to the recipient, followed by one or more generations of intercrossing. The aim of the
backcross generations is to produce individuals that carry one copy of the donor QTL allele but that
are similar to the recipient breed for the rest of the genome. The aim of the intercrossing phase is to
fix the donor allele at the QTL. Marker information can enhance the effectiveness of the
backcrossing phase of gene introgression strategies by: (i) identifying carriers of the target gene(s)
(foreground selection); and (ii) enhancing recovery of the recipient genetic background
(background selection). The effectiveness of the intercrossing phase can also be enhanced through
foreground selection on the target gene(s). If the target gene cannot be genotyped directly,
carrier individuals can be identified based on markers that flank the QTL at <10 cM, because of
the extensive LD in crosses. The markers must have breed-specific alleles in order to identify
line origin. For the introgression of multiple target genes, gene pyramiding strategies can be used
during the backcrossing phase to reduce the number of individuals required (Hospital
and Charcosset, 1997; Koudandé et al., 2000). For background selection, markers are used that are
spread over the genome at <20 cM intervals, such that most genes that affect the trait will be within
10 cM from a marker. Combining foreground and background selection, selection will be for the
donor breed segment around the target locus but for recipient breed segments in the rest of the
genome. Foreground selection will result in selection for both the target locus and for donor breed
loci that are linked to this locus, some of which could have an unfavorable effect on performance.
To reduce this so called linkage drag around the target locus, in the molecular score used for
background selection greater emphasis can be given to markers that are in the neighborhood of
the target locus (apart from the flanking markers, which are used in foreground selection).
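To give a feel for why background selection is attractive, the following sketch computes the expected proportion of recipient genome after repeated backcrossing when no background selection is applied. It relies on the standard expectation that the donor proportion halves in each backcross generation for loci unlinked to the target; the function name is hypothetical and the result is an average, not a guarantee for any individual.

```python
def expected_recipient_fraction(n_backcrosses):
    """Expected recipient-breed genome fraction after n backcrosses.

    The F1 (n = 0) is half recipient; each backcross to the recipient
    halves the remaining donor fraction. Applies to loci unlinked to
    the target gene and assumes no background selection.
    """
    return 1.0 - 0.5 ** (n_backcrosses + 1)


if __name__ == "__main__":
    for generation in range(0, 6):
        print(generation, expected_recipient_fraction(generation))
```

Marker-based background selection aims to reach a given recipient proportion in fewer generations than this unselected expectation, by choosing the carriers that happen to retain the least donor genome.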
Most studies have considered marker assisted introgression (MAI) of single QTL (e.g. Hospital
and Charcosset, 1997) but often several QTL must be introgressed simultaneously. Koudandé et al.
(2000) showed that large populations are needed to obtain sufficient individuals that are
heterozygous for all QTL in the backcrossing phase. This would make MAI not feasible in livestock
breeding programmes. In many cases, however, immediate fixation of introgressed QTL alleles may
not be required. Instead, the objective of the backcrossing phase can be to enrich the recipient
breed with the favourable donor QTL alleles at sufficiently high frequency for selection
following backcrossing. The effectiveness of such strategies was demonstrated by Chaiwong et
al. (2002).
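The population-size problem with several QTL can be made concrete with a small sketch. It assumes unlinked QTL, a backcross parent that is heterozygous at every target QTL, and simple Mendelian segregation, so that each progeny carries the donor allele at any one QTL with probability 1/2. The function names are hypothetical.

```python
def full_carrier_probability(n_qtl):
    """Probability that one backcross progeny is heterozygous at all QTL."""
    return 0.5 ** n_qtl


def prob_at_least_one_carrier(n_qtl, n_progeny):
    """Probability that at least one of n_progeny carries all donor alleles."""
    return 1.0 - (1.0 - full_carrier_probability(n_qtl)) ** n_progeny


if __name__ == "__main__":
    for k in (1, 2, 4, 6):
        p = full_carrier_probability(k)
        print(k, p, "expected progeny per full carrier:", 1.0 / p)
```

The expected number of progeny needed per full carrier doubles with each additional QTL, which is consistent with the conclusion that immediate fixation of many QTL is rarely feasible in livestock and that enrichment strategies are more realistic.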
Within breed selection
The procedures described previously for incorporating markers in genetic evaluation result in
estimates of breeding values associated with QTL, together with estimates of polygenic breeding
values. Alternatively, if molecular data are not incorporated into genetic evaluations, as will be
the case for more ad hoc approaches and for gene tests for Mendelian characteristics,
separate selection criteria will be available that capture the molecular information. The following
three selection strategies can then be distinguished (Dekkers, 2004):
(i) selection on the QTL information alone;
(ii) tandem selection, with selection on QTL followed by selection on polygenic EBV; and
(iii) selection on the sum of the QTL and polygenic EBV.
Selection on QTL or marker information alone ignores information that is available on
all other genes (polygenes) that affect the trait and is expected to result in the lowest response to
selection unless all genes that affect the trait are included in the QTL EBV. This strategy does
not, however, require additional phenotypes other than those that are needed to estimate
marker effects, and can be attractive when phenotype is difficult or expensive to record (e.g.
disease traits, meat quality, etc.). Selection on the sum of the QTL and polygenic EBV is expected
to result in maximum response in the short term, but may be suboptimal in the longer term
because of losses in polygenic response (Gibson, 1994). Indexes of QTL and polygenic EBV can
be derived that maximize longer term response (Dekkers and van Arendonk, 1998) or a
combination of short and longer term responses (Dekkers and Chakraborty, 2001). However, if
selection is on multiple QTL and emphasis is on maximizing shorter term response, selection on
the sum of QTL and polygenic EBV is expected to be close to optimal. Optimizing selection on a
number of EBVs, indexes and genotypes, while also considering inbreeding rate and other
practical considerations is not a trivial task. Kinghorn et al. (2002) have proposed a mate selection
approach that could be used to handle such problems, and it can be expected that with more
widespread use of genotypic information for a larger number of regions, specific knowledge about
individual QTL becomes less interesting and will simply contribute to prediction of whole EBV or
whole genotype.
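The three selection strategies listed above can be sketched as ranking rules over a set of candidates. This is only an illustration under simple truncation selection: the dictionary keys, function names and example values are assumptions, and none of the optimized indexes cited in the text is implemented here.

```python
def select_on_qtl(candidates, n_selected):
    """Strategy 1: rank on QTL EBV alone."""
    ranked = sorted(candidates, key=lambda c: c["qtl_ebv"], reverse=True)
    return ranked[:n_selected]


def tandem_selection(candidates, n_first_stage, n_selected):
    """Strategy 2: preselect on QTL EBV, then rank survivors on polygenic EBV."""
    first_stage = select_on_qtl(candidates, n_first_stage)
    ranked = sorted(first_stage, key=lambda c: c["polygenic_ebv"], reverse=True)
    return ranked[:n_selected]


def select_on_sum(candidates, n_selected):
    """Strategy 3: rank on the sum of QTL and polygenic EBV."""
    ranked = sorted(candidates,
                    key=lambda c: c["qtl_ebv"] + c["polygenic_ebv"],
                    reverse=True)
    return ranked[:n_selected]


if __name__ == "__main__":
    animals = [
        {"id": "A", "qtl_ebv": 1.0, "polygenic_ebv": 0.0},
        {"id": "B", "qtl_ebv": 0.5, "polygenic_ebv": 2.0},
        {"id": "C", "qtl_ebv": 0.0, "polygenic_ebv": 3.0},
        {"id": "D", "qtl_ebv": 0.25, "polygenic_ebv": 0.5},
    ]
    print([a["id"] for a in select_on_qtl(animals, 1)])
    print([a["id"] for a in tandem_selection(animals, 2, 1)])
    print([a["id"] for a in select_on_sum(animals, 1)])
```

In this made-up example each strategy picks a different animal, which shows how strongly the choice of criterion can change selection decisions when QTL and polygenic merit are not aligned.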
Meuwissen and Goddard (1996) reported a simulation study that looked at the main
characteristics determining efficiency of MAS using LE markers. They found that MAS could
improve the rate of genetic improvement up to 64 percent by selecting on the sum of QTL and
polygenic EBV. Their work also demonstrated that MAS is mainly useful for traits where
phenotypic measurement is less valuable because of: (i) low heritability; (ii) sex limited
expression; (iii) availability only after sexual maturity; and (iv) necessity to sacrifice the animal
(e.g. slaughter traits). Selection of animals based on (most probable) QTL genotype will allow
earlier and more accurate selection, increasing the short and medium term selection response.
Most simulation studies have assumed complete marker genotype information but in practice
only a limited number of individuals will be genotyped. However, in an advanced breeding
programme with complete information on phenotype and pedigree information, marker and QTL
genotype probabilities could be derived for ungenotyped animals and genotyping strategies
could be optimized to achieve a high value for the investments made. However, in breeding
programmes for more extensive production systems (beef, sheep), pedigree recording is often
incomplete and only a small proportion of animals are genotyped. Moreover, these genotyped
animals are not necessarily the key breeding animals. The utility of linked markers will be even
more limited if pedigree relationships cannot be used to resolve genotype probabilities and
marker QTL phase of ungenotyped individuals.
A second point of caution is that many studies on MAS have taken a single trait approach
and shown that genetic markers could have a large impact on responses for traits that are
difficult to improve by phenotypic selection. However, within the context of a multi-trait
breeding objective, the overall impact of such markers on the breeding goal may be less because a
greater response for one trait often appears at the expense of another. Therefore, the overall effect of
MAS on the breeding program will generally be much smaller than predicted for single trait MAS
favorable cases. The main effects of MAS would be to shift the selection response in favor of the
marked traits, rather than achieving much additional overall response. Hence, while it will be easier
to select for carcass and disease resistance, further improvement for these traits will be at the
expense of genetic change for production traits (growth, milk).
The impact of MAS on the rate of genetic gain may be limited in conventional breeding
programmes (ranging up to perhaps 10 percent extra gain) unless the variation in
profitability is dominated by traits that are hard to measure. However, new technologies often
lead to other breeding program designs being closer to optimal. Genotypic information has extra
value in the case of early selection and where within family variance can be exploited, which is
particularly the case in programs where reproductive technologies are used. Reproductive
technologies usually lead to early selection and more emphasis on between family selection. DNA
marker technology and reproductive technologies are therefore highly synergistic and
complementary (van der Werf and Marshall, 2005) and gene markers have much more value in
such programmes. Gene marker information is also clearly valuable in introgression programmes,
as demonstrated by simulation (Chaiwong et al., 2002; Dominik et al., 2006) as well as in practice
(Nimbkar et al., 2005). Yet, although these examples are favorable to the value of gene marker
information, the added value of MAS still relies heavily on a high degree of trait and pedigree
recording.
Marker assisted selection in Indian context
Complete phenotypic and pedigree information is often only available in intensive breeding units.
Therefore, in the context of low input production systems, some questions can be raised concerning
the validity and practicality of the simulation studies described above, and it would be more
difficult to realize the value of marker information. It would be harder and more expensive to
determine the linkage phase in the case of using linked markers. Moreover, even if the genetic
marker were a direct or LD marker, its effect on phenotype would have to be estimated for the
population and the environment in which it is used. This would require phenotypes and
genotypes on a sample of a rather homogeneous population to avoid spurious associations that
could result from unknown population stratification. Therefore, a gene marker for a QTL is likely to
be most successful in an environment with intensive pedigree and performance recording.
Nevertheless, in low input environments, direct and LD markers will be more useful than
LE markers because the latter require routine recording of phenotypes and genotypes to
estimate QTL effects within families.
In addition to MAS within local breeds, several other strategies for breed improvement could
be pursued in developing countries, including gene introgression and MAS within synthetic breeds.
This would be most advantageous for introducing specific disease resistance alleles into breeds
with improved production characteristics to make them more tolerant to the environments
encountered in developing countries. Gene introgression is, however, a long and expensive process
and only worthwhile for genes with large effects. MAS within synthetic breeds, e.g. a cross
between local and improved temperate climate breeds, can allow development of a breed that is
based on the best of both breeds (e.g. Zhang and Smith, 1992). Because of the extensive LD within
the cross, a limited number of markers would be needed. However, it is important to avoid the
impact of genotype × environment interactions if MAS is implemented in a more controlled
environment.
References
Chaiwong, N., Dekkers, J.C.M., Fernando, R.L. and Rothschild, M.F. 2002. Introgressing multiple QTL in
backcross breeding programs of limited size. Proc. 7th Wld. Congr. Genet. Appl. Livest. Prodn. Electronic
Communication No. 22: 08. Montpellier, France.
Darvasi, A. and Soller, M. 1997. A simple method to calculate resolving power and confidence interval of QTL
map location. Behavior Genetics 27: 125.
Dekkers, J.C.M. and Chakraborty, R. 2001. Potential gain from optimizing multi-generation selection on an
identified quantitative trait locus. J. Anim. Sci. 79: 2975.
Dekkers, J.C.M. and van Arendonk, J.A.M. 1998. Optimizing selection for quantitative traits with information
on an identified locus in outbred populations. Genet. Res. 71: 257–275.
Dominik, S., Henshall, J., O'Grady, J. and Marshall, K.J. 2007. Factors influencing the efficiency of a marker
assisted introgression program in merino sheep. Genet. Sel. Evol. 39: 495.
Ewing, B. and Green, P. 2000. Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet.
25: 232–234.
Fernando, R. and Grossman, M. 1989. Marker assisted selection using best linear unbiased prediction.
Genet. Sel. Evol. 21: 467.
Fernando, R.L. 2004. Incorporating molecular markers into genetic evaluation. Session G6.1. Proc. 55th Meeting
of the European Association of Animal Production. 5–9 September 2004, Bled, Slovenia.
Fisher, R.A. 1918. The correlation between relatives on the supposition of Mendelian inheritance. Transactions
of the Royal Society of Edinburgh 52: 399.
Gibson, J.P. 1994. Short-term gain at the expense of long-term response with selection of identified loci. Proc.
5th Wld. Congr. Genet. Appl. Livest. Prodn. CD-ROM Communication No. 21: 201–204. University of
Guelph, Canada.
Hayes, B.J. and Goddard, M.E. 2001. The distribution of the effects of genes affecting quantitative traits in
livestock. Genet. Sel. Evol. 33: 209–229.
Henderson, C.R. 1984. Applications of linear models in animal breeding. Can. Catal. Publ. Data, Univ. Guelph,
Canada.
Hospital, F. and Charcosset, A. 1997. Marker-assisted introgression of quantitative trait loci. Genetics 147:
1469.
Kinghorn, B.P., Meszaros, S.A. and Vagg, R.D. 2002. Dynamic tactical decision systems for animal breeding.
Proc. 7th Wld. Congr. Genet. Appl. Livest. Prodn. Communication No. 23-07. Montpellier, France.
Koudandé, O.D., Iraqi, F., Thomson, P.C., Teale, A.J. and van Arendonk, J.A.M. 2000. Strategies to optimize
marker-assisted introgression of multiple unlinked QTL. Mammal. Genome 11: 145–150.
Lee, S.H. and van der Werf, J.H.J. 2004. The efficiency of designs for fine-mapping of quantitative trait loci using
combined linkage disequilibrium and linkage. Genet. Sel. Evol. 36: 145.
Lee, S.H. and van der Werf, J.H.J. 2006. An efficient variance component approach implementing an average
information REML suitable for combined LD and linkage mapping with a general complex pedigree.
Genet. Sel. Evol. 38: 25.
Meuwissen, T.H.E., Hayes, B. and Goddard, M.E. 2001. Prediction of total genetic value using genome-wide
dense marker maps. Genetics 157: 1819.
Meuwissen, T.H.E. and Goddard, M.E. 1996. The use of marker haplotypes in animal breeding
schemes. Genet. Sel. Evol. 28: 161.
Nimbkar, C., Pardeshi, V. and Ghalsasi, P. 2005. Evaluation of the utility of the FecB gene to improve the
productivity of Deccani sheep in Maharashtra, India. pp. 145–154. In H.P.S. Makkar and G.J. Viljoen, eds.
Applications of gene-based technologies for improving animal production and health in developing
countries. Netherlands, Springer.
Rothschild, M.F., Larson, R.G., Jacobson, C. and Pearson, P. 1991. PvuII polymorphisms at the porcine oestrogen
receptor locus (ESR). Anim. Genet. 22(5): 448.
Shrimpton, A.E. and Robertson, A. 1988. The Isolation of Polygenic Factors Controlling Bristle Score in
Drosophila melanogaster. II. Distribution of Third Chromosome Bristle Effects Within Chromosome
Sections. Genetics 118: 445.
van der Werf, J.H.J. and Marshall, K. 2005. Combining gene-based methods and reproductive technologies to
enhance genetic improvement of livestock in developing countries. pp. 131–144. In H.P.S. Makkar and G.J.
Viljoen, eds. Applications of gene-based technologies for improving animal production and health in
developing countries. Netherlands, Springer.
Zhang, W. and Smith, C. 1992. Computer simulation of marker-assisted selection utilizing linkage
disequilibrium. Theor. Appl. Genet. 83: 813.
13
________________________________________________________________________________________
Real-time PCR is a recent modification of polymerase chain reaction (PCR) that allows precise
quantification of specific nucleic acids in a complex mixture by fluorescent detection of labeled PCR
products. It is also known as kinetic PCR, qPCR, qRT-PCR and RT-qPCR. Both specific, as well as
nonspecific fluorescent probes may be used for detection. Real-time PCR is often used in the
quantification of gene expression levels. Before using real-time PCR to quantify a target message,
care must be taken to optimize the RNA isolation, primer design, and PCR reaction conditions so
that accurate and reliable measurements can be made. Here we will be discussing some basic
aspects of real-time PCR, primer and probe designing guidelines, real time chemistries, as well as
real time quantification using both relative and absolute quantification approaches. Useful Web
sites have been mentioned in the text during discussion.
Real Time PCR
Real time PCR is a technique used to monitor the progress of a PCR reaction in real time. A
relatively small amount of PCR product (DNA, cDNA or RNA) can easily be quantified. Real Time
PCR is based on the detection of the fluorescence produced by a reporter molecule, which increases
as the reaction proceeds. This occurs due to the accumulation of the PCR product with each cycle of
amplification. These fluorescent reporter molecules include dyes that bind to double-stranded
DNA (e.g. SYBR Green) and sequence-specific probes (e.g. Molecular Beacons or TaqMan probes).
Real time PCR facilitates the monitoring of the reaction as it progresses. One can start with minimal
amounts of nucleic acid and quantify the end product accurately. Moreover, there is no need for the
post PCR processing which saves the resources and the time. These advantages of the fluorescence
based real time PCR technique have completely revolutionized the approach to PCR-based
quantification of DNA and RNA. Real-time PCR assays are now easy to perform, have high
sensitivity and specificity, and provide scope for automation. When the starting material is RNA,
the technique is referred to as real-time RT-PCR, which includes an additional reverse transcription
step that produces a complementary DNA (cDNA) molecule from the RNA molecule. This is done
because PCR amplifies DNA and because RNA is less stable than DNA.
Real Time PCR and Traditional PCR
Quantitative real-time PCR (qPCR) has become the most precise and accurate method for analyzing
gene expression. Prior to qPCR, the most common methods for determining expression levels were
northern blotting, RNase protection assays, or traditional endpoint reverse transcription (RT) PCR.
Endpoint RT-PCR was an improvement over the older methods due to its ease of use and the much
smaller amounts of RNA needed for the reaction. Traditional or conventional PCR uses gel
electrophoresis for the detection of PCR amplification in the final phase or at end-point of the PCR
reaction. In contrast, real-time PCR allows the accumulation of amplified product to be detected
during the early phases of the reaction, and measured as the reaction progresses, that is, in real
time. Traditional RT-PCR remains useful for determining the presence or absence of a
particular gene product. The main advantage of real-time PCR over conventional PCR is that
real-time PCR allows you to determine the starting template copy number with accuracy and high
sensitivity over a wide dynamic range. Real-time PCR results can either be qualitative (presence or
absence of a sequence) or quantitative (number of copies of DNA). In contrast, conventional PCR is
at best semi-quantitative. Additionally, real-time PCR data can be evaluated without gel
electrophoresis, resulting in reduced experiment time and increased throughput. Finally, because
reactions are run and data are evaluated in a closed-tube system, opportunities for contamination
are reduced and the need for post amplification manipulation is eliminated.
Different phases of Real Time PCR amplification
A typical qPCR amplification plot may be divided into four major phases (Figure 1): the linear
ground phase, early exponential phase, log-linear phase, and plateau phase. During the linear
ground phase (usually the first 10–15 cycles), PCR is just beginning, and fluorescence emission is
yet to rise above the background. Baseline fluorescence is calculated at this point. At the early
exponential phase, the amount of fluorescence has reached a threshold where it is significantly
higher than background levels. The cycle at which this occurs is known as Ct or crossing point (CP).
This value is representative of the starting copy number in the original template and is used to
calculate experimental results. During the log-linear phase, PCR reaches its optimal amplification
period with the PCR product doubling after every cycle in ideal reaction conditions. Finally,
amplification reaches a plateau as the reaction components are exhausted and the fluorescence
intensity is no longer useful for data calculation. In the endpoint PCR, amplification can only be
viewed at the end of the reaction, and only the final plateau is observed; any differences in initial
abundance are obscured.
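The link between starting copy number and Ct can be sketched with an idealized model in which product grows as N0 × (1 + E)^cycle, with E = 1 meaning perfect doubling. This ignores the background and plateau phases described above, and the function name, threshold value and copy numbers are purely illustrative assumptions.

```python
import math


def threshold_cycle(n0, threshold, efficiency=1.0):
    """Fractional cycle at which n0 * (1 + efficiency)**cycle reaches threshold."""
    return math.log(threshold / n0) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    threshold = 1e9  # arbitrary detection threshold, in copies
    for copies in (1e2, 1e3, 1e4):
        print(copies, round(threshold_cycle(copies, threshold), 2))
```

Under perfect doubling, each ten-fold increase in starting template lowers Ct by log2(10), about 3.32 cycles, which is why Ct is a useful proxy for initial abundance whereas the end-point plateau is not.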
9. Unknown: A sample containing an unknown quantity of template. This is the sample
of interest (experimental sample as opposed to positive controls or standards) whose quantity is
being determined.
10. Background: Fluorescence in the reaction that is not generated by PCR, for example due to the
presence of a large amount of double-stranded DNA or inefficient quenching of the fluorophore.
11. Endogenous reference gene: This is the gene whose expression level should not differ between
samples, such as a housekeeping gene (GAPDH, HPRT, beta-actin, etc.).
12. Slope: Mathematically calculated slope of the standard curve, e.g., the plot of Ct values against
the logarithm of ten-fold dilutions of target nucleic acid. This slope is used for efficiency calculation.
Ideally, the slope should be −3.32 (−3.1 to −3.6), which corresponds to 100% efficiency, i.e.
two-fold amplification at each cycle (a slope of −3.3 gives, precisely, 1.0092 and 2.0092).
13. Reference dye: Used in all reactions to obtain normalized reporter signal (Rn) adjusted for
well-to-well variations by the analysis software. The most common passive reference dye is ROX and is
usually included in the master mix. Not all instruments require the use of a reference dye (see
Table 1 in Real-Time PCR by Qiagen).
14. ROX: 6-carboxy-X-rhodamine. Most commonly used passive reference dye for normalization of
reporter signals in ABI instruments. The emission recorded from ROX during the baseline cycles
(usually 3 to 15) is used to normalize the emission recorded from the reporter due to
amplification in later cycles. The use of ROX improves the results by compensating for small
fluorescent fluctuations such as bubbles and well-to-well variations that may occur in the plate.
ROX or any other internal reference dye is not required by all machines (see the list in Table 1
in Qiagen Publication: Checklist for Multiplex Real-time PCR). If in a ROX requiring instrument,
a master mix with lower than required ROX concentration is used, the SD will be large and may
be reduced by using an appropriate master mix.
15. Rn (normalized reporter signal): The fluorescence emission intensity of the reporter dye divided by
the fluorescence emission intensity of the passive reference dye. Rn+ is the Rn value of a reaction
containing all components, including the template, and Rn− is the Rn value of an unreacted
sample. The Rn− value can be obtained from the early cycles of a real-time PCR run (those cycles
prior to a significant increase in fluorescence), or a reaction that does not contain any template.
16. ΔRn (delta Rn, dRn): The magnitude of the fluorescence signal generated during the PCR at each
time point. The ΔRn value is determined by the following formula:
$\Delta R_n = R_n^{+} - R_n^{-}$
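The normalization described in the reference dye, ROX and Rn entries can be sketched in a few lines, taking ΔRn as the difference between Rn+ and Rn−. The sketch assumes per-cycle reporter and passive-reference readings for a single well and estimates Rn− as the mean Rn over the baseline cycles (3 to 15 by default, as in the ROX entry). The function name and the example readings are hypothetical, and instrument software may compute the baseline differently.

```python
from statistics import mean


def delta_rn_curve(reporter, reference, baseline=(3, 15)):
    """Return delta Rn for every cycle of one well.

    reporter, reference: per-cycle fluorescence readings (same length).
    baseline: first and last baseline cycle, 1-based and inclusive.
    """
    # Rn: reporter signal normalized by the passive reference dye.
    rn = [rep / ref for rep, ref in zip(reporter, reference)]
    first, last = baseline
    # Rn-: baseline level taken from the early, pre-amplification cycles.
    rn_minus = mean(rn[first - 1:last])
    # Delta Rn: normalized signal above baseline at each cycle.
    return [value - rn_minus for value in rn]


if __name__ == "__main__":
    reporter = [10.0] * 15 + [20.0, 40.0]
    reference = [10.0] * 17
    print(delta_rn_curve(reporter, reference))
```

Dividing by the reference dye first means that a well with a bubble or a slightly different volume is scaled in both signals, so the resulting ΔRn values remain comparable between wells.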
2.
3.
4.
5.
6.
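As a small worked illustration of the two definitions above, the sketch below computes Rn as reporter signal divided by passive reference signal, and ΔRn as the commonly used difference between Rn+ and Rn-. The fluorescence readings are hypothetical values in arbitrary units, and the function names are not taken from any instrument software.

```python
def normalized_reporter(reporter: float, passive_reference: float) -> float:
    """Rn: reporter emission divided by passive reference (e.g. ROX) emission."""
    if passive_reference <= 0:
        raise ValueError("passive reference signal must be positive")
    return reporter / passive_reference


def delta_rn(rn_plus: float, rn_minus: float) -> float:
    """Delta Rn: Rn of the complete reaction minus the baseline Rn."""
    return rn_plus - rn_minus


# Hypothetical readings (arbitrary fluorescence units)
rn_minus = normalized_reporter(reporter=1200.0, passive_reference=4000.0)
rn_plus = normalized_reporter(reporter=9000.0, passive_reference=4000.0)
print(f"Rn- = {rn_minus:.2f}, Rn+ = {rn_plus:.2f}, "
      f"dRn = {delta_rn(rn_plus, rn_minus):.2f}")
```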
quantitation to be performed using the comparative Ct (ΔΔCt) method (Livak and Schmittgen,
2001). This method increases sample throughput by eliminating the need for standard curves.
G/C content: Whenever possible, select primers and probes in a region with a G/C content of 30 to
80%. Regions with a G/C content >80% may not denature well during thermal cycling, leading
to a less efficient reaction. G/C-rich sequences are susceptible to nonspecific interactions that
may reduce reaction efficiency and produce nonspecific signal in assays using SYBR Green
reagents. Avoid primer and probe sequences containing runs of four or more G bases.
Melting temperature: When working with Primer Express software, you can select primers and
probes with the recommended melting temperature (Tm) using universal thermal cycling
conditions. It is generally recommended that the probe Tm should be 10 °C higher than that of
the primers. In Primer Express software the recommended Tm for the probe is 68-70 °C and for the
primers it is about 58-60 °C.
5' end of probes: Primer Express software does not select probes with a G on the 5' end. The
quenching effect of a G base in this position will be present even after probe cleavage. The
presence of a G base can result in reduced fluorescence values (ΔRn) that can negatively affect
assay performance. G bases in positions close to the 5' end, but not on it, have not been shown to
compromise assay performance.
3' end of primers: To reduce the possibility of nonspecific product formation, ensure that the last
five bases on the 3' end of the primers do not contain more than two C and/or G bases. Under
certain circumstances, such as a G/C-rich template sequence, you may have to relax this
recommendation to keep the amplicon under 150 base pairs in length. In general, avoid primer
3' ends extremely rich in G and/or C bases.
General considerations: Select the probe first, and then design the primers as close as possible to the
probe without overlapping the probe.
An intron-spanning primer pair should be preferred in order to prevent potential signals from
genomic DNA contamination in the sample.
Make TaqMan MGB probes as short as possible without being shorter than 13 nucleotides.
Finally, if oligo(dT) is used for priming in reverse transcription, primers should be located
within 1000 bp of the 3' end of the mRNA (Wang et al., 2006).
Several free online tools and commercially available software packages can be used for
primer design if the parameters described above are provided. A selected list of useful web
resources and some commercial programs is given in Table 1.
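Some of the sequence rules above are simple enough to screen automatically before a candidate is taken to a full design tool. The following is a minimal sketch, not a replacement for Primer Express or similar software: it checks only G/C content, runs of G, a 5' G on probes and the G/C count in the last five 3' bases of primers. The function names and example sequences are hypothetical, and sequences are assumed to be written 5' to 3'.

```python
import re


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def check_oligo(seq: str, is_probe: bool = False) -> list[str]:
    """Return the guideline violations found (an empty list means none)."""
    seq = seq.upper()
    problems = []
    if not 30.0 <= gc_percent(seq) <= 80.0:
        problems.append("G/C content outside 30-80%")
    if re.search(r"G{4,}", seq):
        problems.append("run of four or more G bases")
    if is_probe and seq.startswith("G"):
        problems.append("probe has G on the 5' end")
    # The 3'-end rule applies to primers only.
    if not is_probe and sum(base in "GC" for base in seq[-5:]) > 2:
        problems.append("more than two G/C in the last five 3' bases")
    return problems


if __name__ == "__main__":
    for name, seq, probe in [
        ("primer_ok", "AGCTTGACCTGAAGCTCATT", False),
        ("primer_bad", "GGGG" + "AT" * 7 + "CGCGC", False),
        ("probe_5G", "GACCTGAAGCTCATTAGCTT", True),
    ]:
        print(name, check_oligo(seq, is_probe=probe) or "no violations found")
```

Melting temperature is deliberately left out of this sketch, because Tm estimates depend on the calculation method and reaction conditions used by the design software.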
one amplified product in the reaction, and thus that amplification specific to a single DNA sequence
has not occurred. This method is not affected by the presence of variations (i.e. single
nucleotide polymorphisms or SNPs) in the target sequence. Moreover, less specialized knowledge
is required than for the design of fluorescently labeled oligo probes.
Figure 2. Amplification data using SYBR Green reagents. (a) Amplification plot (linear view) demonstrating
suspected nonspecific amplification in negative control (NC) wells; (b) Melt curve analysis
confirming that product in NC wells has a different melting temperature from the specific product
Sequence specific detection: Fluorescent probe based technology in real-time PCR allows
sensitive and specific detection. Three types of probes are mostly used, each with a distinct
molecular structure and attached dyes: hybridization probes, hydrolysis probes and
hairpin probes. All detection methods using fluorescent probe technology rely on a process referred
to as fluorescence resonance energy transfer (FRET), in which light energy is transferred between
two adjacent dye molecules (Espy et al., 2006). Although both hydrolysis and hybridization
probes depend on FRET to change fluorescence emission intensity, the energy transfer works in
opposite ways in these two chemistries. While FRET reduces fluorescence intensity in
hydrolysis probes, it increases intensity in hybridization probes.
A. Hybridization probes
In an assay/reaction, one or two hybridization probes can be used. When two hybridization probes
are used, they bind to the target sequence in close proximity to each other in a head-to-tail arrangement
(Figure 3). The upstream probe carries an acceptor (or quencher) dye on its 3' end, and the second
(downstream) probe is labeled with a donor (or reporter) dye on its 5' end. In the one-probe
method, on the other hand, the upstream primer is labeled with an acceptor dye on the 3' end instead of
a probe. Thus, the labeled primer replaces the function of one of the probes used in the two-probe
method. In both cases, the energy transfer depends on the distance between the two dye molecules.
Because of the distance between the two dyes in solution, the donor dye emits only background
fluorescence. When the probes hybridize to their complementary sequence, this binding brings the
two dyes into close proximity to one another and FRET occurs at high efficiency. Since a fluorescent
signal is detected only as a result of two independent probes hybridizing to their correct target
sequence, the increase in measured fluorescence is proportional to the amount of DNA
synthesized during the PCR. Moreover, as the probes are not hydrolyzed, the fluorescence
signal is reversible and allows the generation of melting curves (Bustin, 2000).
B. Hydrolysis probes
Hydrolysis probes (also known as TaqMan probes or 5' nuclease assay probes) contain a fluorescent
reporter dye at the 5' end and a quencher dye at the 3' end. If the probe is unbound, the reporter and
quencher dyes are maintained in close proximity, which allows the quencher to reduce the reporter
fluorescence intensity by FRET, and thus no reporter fluorescence is detected (Bustin, 2000) (Figure
3). After annealing to the target sequence, the bound and quenched probe is degraded by the
5' nuclease activity of the DNA polymerase during the extension step of the PCR. Probe degradation
allows separation of the reporter from the quencher dye, resulting in increased fluorescence
emission (Figure 3). Minor groove binders (MGBs), such as dihydrocyclopyrroloindole tripeptide
(DPI3), may be added to these probes to increase their Tm and allow the use of a shorter probe.
These probes are less expensive, display reduced background fluorescence and have a larger dynamic
range due to increased efficiency of reporter quenching. Hydrolysis probes are commonly
composed of standard nucleic acids; however, more recently developed hydrolysis probes containing
Locked Nucleic Acids (LNA) are commercially available from Roche Applied Science under the name
Universal Probe Library (UPL) probes and can be accessed online
(www.universalprobelibrary.com). LNAs are DNA nucleotide analogues with increased binding
strengths compared to standard DNA nucleotides. In order to maintain the specificity and Tm,
LNA bases are incorporated in each UPL probe.
C. Hairpin probes
Hairpin or stem-loop DNA probes display an increased specificity of target recognition compared
to linear DNA probes. Hairpin DNA probes are single-stranded oligonucleotides and contain a
sequence complementary to the target that is flanked by self-complementary, target-unrelated
termini. The invention of hairpin probes made it possible to view the hybridization process in real time. They are
widely used in different applications, and two major factors are responsible for such broad
application of these DNA probes: enhanced specificity of the probe-target interaction and the
possibility of closed-tube real-time monitoring formats (Broude, 2005). Several types of
hairpin probes are commercially available, including molecular beacons, Scorpions, LUX™ fluorogenic
primers and Sunrise™ primers (Figure 4).
a. Molecular beacons
This class of hairpin probes was first developed in 1996 (Tyagi and Kramer, 1996). A molecular
beacon is a dye-labelled oligonucleotide (25-40 nt) that forms a hairpin structure with a stem and a
loop (Figure 4A). The 5' and 3' ends of the probe have complementary sequences of 5-6 nucleotides
that form the stem structure. The loop portion of the hairpin is designed to hybridize specifically to
a 15-30 nucleotide section of the target sequence. A fluorescent reporter molecule is attached to the
5' end of the molecular beacon, and a quencher is attached to the 3' end. Formation of the hairpin
therefore brings the reporter and quencher together, so no fluorescence is emitted. During the
annealing step of the amplification reaction, the loop portion of the molecular beacon binds to its
target sequence, causing the stem to denature. The reporter and quencher are thus separated,
quenching is abolished, and the reporter fluorescence is detectable. Because fluorescence is only
emitted from the probe when it is bound to the target, the amount of fluorescence detected is
proportional to the amount of target in the reaction. The fluorescence of the probe increases
100-fold when it binds to its target. Molecular beacons have some advantages over other
chemistries. They are highly specific, can be used for multiplexing, and if the target sequence does
not match the beacon sequence exactly, hybridization and fluorescence will not occur, which is
especially desirable for allelic discrimination experiments. Unlike TaqMan assays, molecular
beacons are displaced but not destroyed during amplification, because a DNA polymerase lacking
5' exonuclease activity is used. The main disadvantage of molecular beacons is that they are
difficult to design. The stem of the hairpin must be strong enough that the molecule will not
spontaneously fold into non-hairpin conformations that result in unintended fluorescence. At the
same time, the stem of the hairpin must not be too strong; otherwise the beacon may not properly
hybridize to the target.
b. Scorpions primers
These assays use two primers. Scorpions combine the detection probe with the upstream PCR
primer and consist of a fluorophore on the 5' end, followed by a complementary stem-loop structure
(also containing the specific probe sequence), quencher dye, DNA polymerase blocker, and finally a
PCR primer on the 3' end (Figure 4B). During the first amplification cycle, the Scorpions primer is
extended, and the sequence complementary to the loop sequence is generated on the same strand.
After subsequent denaturation and annealing, the loop of the Scorpions probe hybridizes to the
internal target sequence, and the reporter is separated from the quencher. The resulting fluorescent
signal is proportional to the amount of amplified product in the sample. The Scorpions probe
contains a PCR blocker just 3' of the quencher to prevent read-through during the extension of the
opposite strand.
c. LUX™ fluorogenic primers
Light upon extension (LUX) primers (Invitrogen, Carlsbad, CA, USA) are self-quenched
single-fluorophore-labeled primers. These assays employ two primers, one of which is a hairpin-shaped
primer with a fluorescent reporter attached near the 3' end (Figure 4C). The reporter is quenched by
the secondary structure of the hairpin. During amplification, the LUX primer is incorporated into
the product, eliminating the quenching hairpin structure, so fluorescence is emitted. LUX primers
are designed to have a G or C 3'-terminal nucleotide and a fluorophore attached to the second or
third base (a thymine nucleotide) from the 3' end. The primer also has a 5' tail of five to seven nucleotides that is
complementary to the 3' end of the primer. Such a design allows the molecule to form
a blunt-end hairpin structure with low fluorescence at temperatures below its Tm.
d. Sunrise™ primers
The Sunrise primer-probes, originally created by Oncor (Gaithersburg, MD, USA), are
bifunctional molecules similar to Scorpions primer-probes, which combine both the PCR primer and
detection mechanism in the same molecule. The Sunrise primer-probes have a dual-labeled (reporter
and quencher fluorophores) hairpin loop on the 5' end, with the 3' end acting as the PCR primer
(Figure 4D). Unbound intact hairpin causes reporter quenching via FRET. Upon integration into the
newly formed PCR product, the reporter and quencher are held far enough apart to allow reporter
emission.
relative quantification requires less set up time and is easier to perform than absolute quantification
because a standard curve is not essential (Livak, 2001; Fraga et al., 2008). Furthermore, it is
commonly not necessary to know the absolute amount of mRNA in biological applications
examining gene expression (Bustin, 2002; Huggett et al., 2005).
Absolute Quantification: Absolute quantification requires a standard calibration curve using
serially diluted standards of known concentrations for highly specific, sensitive and reproducible
results. The linear relationship between Ct and the initial amount of total RNA or cDNA in the standard
curve allows the concentration of unknowns to be determined from their Ct values. In this method, all
standards and samples are assumed to have equal amplification efficiency. It is necessary to control
the efficiency of the real-time PCR reaction to quantify mRNA levels (Fraga et al., 2008). The
calibration curve and the target cDNA must have identical reverse transcription and real-time PCR
amplification efficiencies to provide a valid standard for mRNA quantification (Pfaffl and Hageleit,
2001). The amplification efficiencies of the standard and unknown target sequence should be
approximately equal, and the concentrations of the serial dilutions should span the range of the
unknown(s) in order to ensure correct results. The standard and target sequence should have the
same primer binding sites and produce a product of approximately the same size and sequence
(Fraga et al., 2008). The standard can be based on known concentrations of double-stranded DNA
(dsDNA), single-stranded DNA (ssDNA), a commercially synthesized long oligonucleotide or
complementary RNA (cRNA) bearing the target sequence.
DNA standards can be prepared by cloning the target sequence into a plasmid, by purifying a
conventional PCR product, or by direct chemical synthesis. These standards offer
a larger quantification range, greater sensitivity, better reproducibility and higher stability
than RNA standards. However, DNA standards generally cannot be used for
absolute quantitation of RNA because there is no control for the efficiency of the reverse
transcription step (Livak, 2001; Wong and Medrano, 2005). Therefore, RNA molecules are strongly
recommended as standards for quantification of RNA.
For RNA standard preparation, an in vitro-transcribed sense RNA transcript is generated,
followed by digestion with RNase-free DNase to eliminate DNA contamination. A recombinant
RNA (recRNA) can be synthesized in vitro by cloning the DNA of the gene of interest (GOI) into a
suitable vector, typically containing SP6, T3, or T7 phage RNA polymerase promoters. Several
commercial kits are available that facilitate the production of RNA from these vectors. After the in vitro
transcribed RNA (standard RNA) is synthesized, its concentration is measured on a
spectrophotometer and the absorbance is converted to a target copy number per µg RNA (Bustin,
2000). Once the standard has been accurately quantified, it is serially diluted in increments of 5- to
10-fold, and each dilution should be run in triplicate (Fraga et al., 2008). The dilutions should be
made over a range of copy numbers that includes the likely amount of target mRNA expected to
be present in the experimental samples to maximize accuracy (Bustin, 2000; Fraga et al., 2008). The
average Ct values from each dilution are then plotted versus the absolute amount of standard
present in the sample to generate a standard curve (Figure 5). Comparison of experimental Ct
values to this standard curve produces an estimate of the amount of target present in the initial
sample (Bustin, 2000; Fraga et al., 2008).
Figure 5. Standard curve for absolute quantification and estimation of the concentration of an unknown sample
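The standard-curve procedure described above amounts to a straight-line fit of Ct against log10 of the standard amount, followed by reading unknowns off that line. The sketch below uses an ordinary least-squares fit written out by hand; the copy numbers and Ct values are hypothetical, and the function names are illustrative.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Estimate the starting quantity of an unknown from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series (copies per reaction) and the
# mean Ct of triplicate wells at each dilution.
standard_copies = [1e7, 1e6, 1e5, 1e4, 1e3]
standard_ct = [14.1, 17.4, 20.8, 24.1, 27.4]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
unknown_copies = quantity_from_ct(22.5, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"unknown at Ct 22.5: about {unknown_copies:.0f} copies")
```

The estimate is only meaningful when the unknown's Ct falls within the range covered by the standards, as noted in the text.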
ii) Normalized Expression (ΔΔCt) method: In this approach, loading differences are eliminated.
Moreover, the Ct values of both the control and the samples for the target gene are normalized to an
appropriate housekeeping or reference gene. This method is also known as the
2^-ΔΔCt method. The formulas are given below in eq. 3 and 4.
$$R = 2^{-\Delta\Delta Ct} \tag{3}$$
$$R = 2^{-[\Delta Ct\,(\mathrm{sample}) - \Delta Ct\,(\mathrm{control})]} \tag{4}$$
where
$$\Delta Ct\,(\mathrm{sample}) = Ct_{\mathrm{target\ gene}} - Ct_{\mathrm{reference\ gene}}$$
$$\Delta Ct\,(\mathrm{control}) = Ct_{\mathrm{target\ gene}} - Ct_{\mathrm{reference\ gene}}$$
$$\Delta\Delta Ct = \Delta Ct\,(\mathrm{sample}) - \Delta Ct\,(\mathrm{control})$$
The reaction should be rigorously optimized and the PCR product size should be kept small (less than 150
bp). The comparative Ct method can be chosen when assaying a large number of samples because a
standard curve is unnecessary. This model is acceptable for a first approximation of the crude
expression ratio. However, efficiency (E) corrected models are useful to obtain reliable relative
expression data (Pfaffl et al., 2009).
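The calculation behind equations 3 and 4 can be followed step by step in a few lines. This is a minimal sketch assuming 100% efficiency for both assays, which is the premise of the 2^-ΔΔCt method; the Ct values are hypothetical.

```python
def delta_delta_ct(ct_target_sample, ct_ref_sample,
                   ct_target_control, ct_ref_control):
    """ddCt = dCt(sample) - dCt(control), each dCt = Ct target - Ct reference."""
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    return d_ct_sample - d_ct_control


def relative_expression(dd_ct):
    """R = 2^(-ddCt); assumes both assays double the product every cycle."""
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target gene crosses threshold two cycles
# earlier in the sample, while the reference gene is unchanged.
dd_ct = delta_delta_ct(ct_target_sample=24.0, ct_ref_sample=18.0,
                       ct_target_control=26.0, ct_ref_control=18.0)
print(f"ddCt = {dd_ct}, R = {relative_expression(dd_ct)}")
```

With these inputs ΔCt (sample) is 6 and ΔCt (control) is 8, so ΔΔCt is -2 and the target is estimated to be expressed four-fold higher in the sample than in the control.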
(B) Relative quantification with efficiency correction: Pfaffl Method
The 2^-ΔΔCt method for calculating relative gene expression is only valid when the amplification
efficiencies of the target and reference genes are similar. If the amplification efficiencies of the two
amplicons are not the same, an alternative formula must be used to determine the relative expression
of the target gene in different samples. To determine the expression ratio between the sample and
calibrator, use the following formula:
$$\mathrm{Ratio} = \frac{(E_{\mathrm{target}})^{\Delta Ct_{\mathrm{target}}\,(\mathrm{calibrator} - \mathrm{sample})}}{(E_{\mathrm{reference}})^{\Delta Ct_{\mathrm{reference}}\,(\mathrm{calibrator} - \mathrm{sample})}}$$
where E is the amplification efficiency of each assay expressed as the per-cycle amplification factor
(E = 2 at 100% efficiency), and each ΔCt is the Ct of the calibrator minus the Ct of the sample for that gene.
This Pfaffl model combines gene quantification and normalization into a single calculation. The
model incorporates the amplification efficiencies of the target and reference (normalization) genes
to correct for differences between the two assays. The relative expression software tool (REST),
which runs in Microsoft Excel, automates data analysis using this model (Pfaffl et al., 2002; Pfaffl et
al., 2009). REST uses the Pairwise Fixed Reallocation Randomization Test to calculate the significance
of results and will indicate whether the reference gene used is suitable for normalization.
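The efficiency-corrected ratio can be sketched as follows, using the commonly cited form of the Pfaffl (2001) model in which E is the per-cycle amplification factor (2.0 at 100% efficiency) and each ΔCt is calibrator Ct minus sample Ct. The efficiencies and Ct values are hypothetical, and this sketch does not reproduce the randomization test performed by REST.

```python
def pfaffl_ratio(e_target, e_ref,
                 ct_target_calibrator, ct_target_sample,
                 ct_ref_calibrator, ct_ref_sample):
    """Efficiency-corrected expression ratio of sample versus calibrator.

    e_target and e_ref are amplification factors per cycle
    (2.0 corresponds to 100% efficiency).
    """
    d_ct_target = ct_target_calibrator - ct_target_sample
    d_ct_ref = ct_ref_calibrator - ct_ref_sample
    return (e_target ** d_ct_target) / (e_ref ** d_ct_ref)


# Hypothetical assays with unequal efficiencies
ratio = pfaffl_ratio(e_target=1.9, e_ref=2.0,
                     ct_target_calibrator=26.0, ct_target_sample=24.0,
                     ct_ref_calibrator=18.0, ct_ref_sample=18.5)
print(f"efficiency-corrected ratio = {ratio:.2f}")

# With both factors equal to 2 and an unchanged reference gene, the
# model reduces to the 2^-ddCt result for the same Ct values.
print(pfaffl_ratio(2.0, 2.0, 26.0, 24.0, 18.0, 18.0))
```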
(C) Relative quantification by standard curve method: This method is used when the amplification
efficiencies of the target and the endogenous control are not the same. In this method, standard curves
are prepared for both the target and the endogenous control. For each experimental sample, the
amount of target and endogenous control is determined from the appropriate standard curve.
Then, the target amount is divided by the endogenous control amount to obtain a normalized target
value. One of the experimental samples is designated as the calibrator or 1x sample. The calibrator
is usually the expression level at baseline and the experimental samples are those collected after
treatment or some intervention. Each normalized target value is divided by the calibrator
normalized target value to generate the relative expression levels.
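The two divisions described above can be written out directly. In this sketch the target and endogenous control amounts are assumed to have already been read from their respective standard curves; the sample names and amounts are hypothetical.

```python
def normalized_target(target_amount, control_amount):
    """Target amount divided by endogenous control amount."""
    return target_amount / control_amount


def relative_levels(samples, calibrator):
    """Relative expression of every sample against the calibrator (1x) sample.

    samples maps a sample name to a (target amount, control amount) pair,
    both taken from the appropriate standard curve.
    """
    calibrator_value = normalized_target(*samples[calibrator])
    return {name: normalized_target(target, control) / calibrator_value
            for name, (target, control) in samples.items()}


# Hypothetical amounts (arbitrary units from the standard curves)
samples = {
    "baseline": (500.0, 1000.0),
    "treated": (2400.0, 1200.0),
}
levels = relative_levels(samples, calibrator="baseline")
print(levels)
```

The calibrator always comes out as 1.0 by construction, and the other samples are expressed as fold changes relative to it.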
Technical and biological replicates
Depending on the applications, the use of technical and biological replicates or both has to be
considered. Often, the same cDNA sample is analyzed in triplicate in one RT-qPCR run. This type
of technical replicate only tells something about the pipetting skills of the operator and the accuracy
of the PCR instrument. Biological replicates refer to the application of the same treatment to two or
more samples. From each of the samples, the RNA isolation and cDNA synthesis are performed
independently but under identical conditions. Each of the obtained cDNA samples can be analyzed
once by RT-qPCR. Both types of replicates (technical or biological) provide information about the
experimental variation and allow statistics to be applied to identify differences in expression levels
between samples. For a beginner, it is good practice to include technical replicates to test
pipetting skills. When testing the amplification efficiency of a new primer set, it is advisable to
include at least a triplicate of each dilution point. When investigating the effects of a treatment, the
use of biological replicates is, in our view, of greater value. For example, in an in vitro experiment, cells
are incubated in the presence or absence of a stimulus. The treatment is repeated in at least three
replicate wells. Each of the three wells is a biological replicate; however, the cells are derived from a
single individual. More relevant would be to repeat the same in vitro experiment on cells isolated
from three different individuals, each of them being a biological replicate.
Table 1: Useful web links and some commercial programs for primer and probe design
Software                   Description                                URL
Primer3                    Picking primer and hybridization probes    http://frodo.wi.mit.edu/primer3/input.htm
Primer-BLAST               -                                          http://www.ncbi.nlm.nih.gov/tools/primer-blast/
RTPrimerDB                 -                                          http://www.rtprimerdb.org/
PrimerBank                 -                                          http://pga.mgh.harvard.edu/primerbank/
OligoCalc                  -                                          http://www.basic.northwestern.edu/biotools/oligocalc.html
Universal Probe Library    -                                          www.universalprobelibrary.com or http://www.roche-appliedscience.com
Primer Express             -                                          www.appliedbiosystems.com
Beacon Designer            -                                          http://www.premierbiosoft.com
Primer Premier             Primer design                              http://www.premierbiosoft.com
References
Broude, N. E. 2005. Molecular beacons and other hairpin probes. In: Encyclopedia of Diagnostic Genomics and
Proteomics, pp. 846-850, Marcel Dekker, Inc., New York.
Bustin, S. A. 2000. Absolute quantification of mRNA using real-time reverse transcription polymerase chain
reaction assays. J. Mol. Endocrinol., 25: 169.
Bustin, S. A. 2002. Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and
problems. J. Mol. Endocrinol., 29: 23.
Espy, M. J., Uhl, J. R., Sloan, L. M., Buckwalter, S. P., Jones, M. F., Vetter, E. A., Yao, J. D., Wengenack, N. L.,
Rosenblatt, J. E., Cockerill, F. R. 3rd, and Smith, T. F. 2006. Real-time PCR in clinical microbiology:
applications for routine laboratory testing. Clin. Microbiol. Rev., 19: 165.
Fraga, D., Meulia, T., and Fenster, S. 2008. Real-Time PCR. In: Current Protocols Essential Laboratory
Techniques, Gallagher, S. R., and Wiley, E. A. (Eds), 10.3.1-10.3.34, John Wiley and Sons, Inc. Retrieved
from http://onlinelibrary.wiley.com/doi/10.1002/9780470089941.et1003s00/full
Huggett, J., Dheda, K., Bustin, S., and Zumla, A. 2005. Real-time RT-PCR normalization; strategies and
considerations. Genes Immun., 6: 279.
Livak, K. J., and Schmittgen, T. D. 2001. Analysis of relative gene expression data using real-time quantitative
PCR and the 2(-Delta Delta C(T)) Method. Methods, 25: 402.
Livak, K. J. 2001. Relative quantification of gene expression, ABI Prism 7700 Sequence Detection System User
Bulletin #2. http://docs.appliedbiosystems.com/pebiodocs/04303859.pdf
Peirson, S. N., Butler, J. N., and Foster, R. G. 2003. Experimental validation of novel and conventional
approaches to quantitative real-time PCR data analysis. Nucleic Acids Res., 31: e73.
Pfaffl, M. W. 2001a. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids
Res., 29(9): e45.
Pfaffl, M. W., and Hageleit, M. 2001. Validities of mRNA quantification using recombinant RNA and
recombinant DNA external calibration curves in real-time RT-PCR. Biotechnology Letters, 23(4): 275-282.
Pfaffl, M. W., Horgan, G. W., and Dempfle, L. 2002. Relative expression software tool (REST) for group-wise
comparison and statistical analysis of relative expression results in real-time PCR. Nucleic Acids Res.,
30: e36.
Pfaffl, M. W., Vandesompele, J., and Kubista, M. 2009. Data analysis software. In: Real-time PCR: Current
Technology and Applications, Logan, J., Edwards, K., and Saunders, N. (Eds), pp. 65-83, Caister Academic Press,
Norfolk, UK. ISBN 978-1-90-44-55-39-4.
Souazé, F., Ntodou-Thomé, A., Tran, C. Y., Rostène, W., and Forgez, P. 1996. Quantitative RT-PCR: limits and
accuracy. Biotechniques, 21(2): 280-285.
Tyagi, S., and Kramer, F. R. 1996. Molecular beacons: probes that fluoresce upon hybridization. Nat.
Biotechnol., 14: 303-308.
Wang, X., and Seed, B. 2006. High-throughput primer and probe design. In: Real-time PCR, Dorak, T. M. (Ed.), pp.
93-106, International University Line, New York, USA. ISBN 0-4153-7734-X.
Wittwer, C. T., Herrmann, M. G., Gundry, C. N., and Elenitoba-Johnson, K. S. 2001. Real-time multiplex PCR
assays. Methods, 25(4): 430-442.
Wong, M. L., and Medrano, J. F. 2005. Real-time PCR for mRNA quantitation. Biotechniques, 39(1): 75-85.
14
High Throughput Techniques for Transcriptome Analysis in Farm
Animals with Special Reference to Expression Microarrays
Manishi Mukesh and Monika Sodhi
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
Gene expression analysis is becoming increasingly important in many fields of biological
research, including livestock research. Understanding the pattern of expressed genes is critical to provide
insights into complex regulatory networks and to identify genes relevant to new
biological processes. Developments in bioinformatics and molecular biology have added
several tools to the arsenal of molecular biologists for studying gene expression and discovering
novel genes. The techniques for the evaluation of gene expression have progressed from methods
developed for the analysis of single, specific genes, such as Northern blotting, to techniques
aimed at identifying all genes that differ in expression between or among experimental
samples, such as subtractive hybridization, expressed sequence tags (ESTs), serial analysis of gene
expression (SAGE) and microarrays.
High throughput gene expression profiling has emerged over the last decade as one of the most
important and powerful approaches in livestock genomic research. The advancement in this area
has largely been driven by the microarray technology, wherein mRNA expression level of
potentially the entire genome in particular tissue/cells can be assessed simultaneously. The rapidly
increasing popularity of this technology for dissecting the entire transcriptome of livestock genomes is
evidenced by the number of publications in recent years. Microarray technology has revolutionized
the study of gene expression and has given rise to an unprecedented increase in the rate of data
acquisition in the analysis of gene transcript regulation in complex eukaryotic genomes, enabling large
numbers of genes, up to the order of tens of thousands, to be evaluated simultaneously. The
objective of a microarray experiment might be to investigate genes which are differentially up- or
down-regulated between cells of a control group and cells which have undergone some treatment,
or between cells of animals of different genetic background (e.g. control mice compared to knockout
mice), or between cells in healthy and diseased tissues, or between cells at different time
points (e.g. developmental biology). Because the expression pattern of a gene is tied to its biological
role, microarray studies of global gene expression can provide detailed insights into the regulation
of specific sets of genes linked by function. In the past decade or so, there has been a rapid progress
in the development of new methods to quantify the gene expression at genome wide level.
Expression microarrays and RNA-seq are currently the two most widely used genome-wide gene
expression quantification methods. High throughput techniques like DNA microarrays have
proved to be a revolutionary tool for ultimately linking entire genome expression and whole organism
function by allowing the study of the expression of vast numbers of genes under a range of
experimental conditions.
Numerous studies have been published addressing the critical issues of microarray experimental
design, data analyses, and application of microarray technology to investigate normal physiology
and disease pathogenesis. The method is based on the phenomenon of preferential complementary
base pairing, known as hybridization, and produces its signal by parallel hybridization of labeled
targets to specific probes that have been immobilized on a solid surface in an ordered array. The
core principle behind microarrays is the hybridization between two DNA strands, the property of
complementary nucleic acid sequences to specifically pair with each other by forming hydrogen
bonds between complementary nucleotide base pairs. Thus, DNA microarrays are an orderly array
of target DNA material immobilized onto a substrate, normally a coated glass microscope slide in
a precise, well-known pattern. Each probe corresponds to either a complete transcript or part of a
transcribed sequence which is tethered onto the array and the target is a labeled pool of DNA that is
complementary to mRNA. Two of the major requirements of any good microarray platform are
system reproducibility, which provides the means for high confidence experiments and accurate
comparison across multiple samples; and high sensitivity, for the detection of significant gene
expression changes, including small fold changes across multiple gene sets. All components of
microarray workflow (such as probe design, printing process, RNA sample quality, labeling,
microarray processing, scanning of the images and feature extraction algorithms) can affect the
quality of the data acquired.
Types of microarray platforms
There are two principal DNA microarray methods based upon the nature of the target arrayed
DNA material (cDNA or oligonucleotide microarrays) and method of spotting DNA (mechanical
microspotting or photolithography). The number of target genes that make up an array can range
from a small number of specific well-characterized genes to a pool of thousands of genes that may
comprise entire genomes. For certain model organisms including Arabidopsis, yeast, mouse, and
human, both cDNA and oligonucleotide arrays are commercially available and are suited to
medical diagnostics and drug discovery applications. For many non-model organisms used in
physiological studies, custom arrays can be constructed from a number of different target DNA
sources including: cDNA clones obtained from normalized libraries, ESTs, oligonucleotides,
genomic clones or genomic DNA. Obtaining this target DNA material remains a costly barrier to
employing microarray technology for a large number of non-model physiologically interesting
organisms. These days, oligo arrays and whole genome arrays have superseded the cDNA arrays in
terms of quality, reliability and spot uniformity and avoid some of the technical pitfalls of cDNA
arrays. The oligos representing transcripts/genes are physically spotted or printed onto a solid
surface. There are various types of microarray platforms that are commercially available for
different species. Arrays can be tissue specific (mammary, immune response genes specific) or
whole genome (representing all genes expressed in an organism). Two of the major requirements of
any microarray platform are system reproducibility, which provides the means for high confidence
experiments and accurate comparison across multiple samples; and high sensitivity to detect even
small fold changes across multiple gene sets. The Agilent whole genome bovine 44K chip harboring
60-mer oligos is one such very popular platform for detecting accurate differential expression. Bovine
whole genome platforms from Affymetrix come with shorter oligos (25-35 mer) built using
photolithographic masks. Microarray platforms from Illumina are also available for bovine and
other species. The bead chip from Illumina consists of 50-mer oligos attached to randomly arranged beads.
Generally the cost of spotted arrays is lower than that of Affymetrix or Illumina arrays.
Strategies to utilize gene expression microarrays
Usually, microarrays allow for the direct comparison of expression patterns of all the target
genes spotted on an array between samples taken under two conditions or treatments. Different
fluorophores are used to label cDNA prepared from either total RNA or messenger RNA, typically
representing control and experimental conditions. Many types of fluorescent dyes are available for
microarray experiments. However, the most common dyes used for microarray studies are Cy3 and
Cy5. The fluorescently labeled cDNAs are mixed, and the probe is hybridized to target DNA
samples on the array, where labeled messenger sequences will quantitatively anneal to target
DNA sequences. However, the two dyes have non-linear sample labeling and hybridization
kinetics, which means that they do not provide equal sensitivity across the whole range of
transcripts in a sample. More specifically, they have differential labeling and scanning efficiencies
and also exhibit gene-specific bias. To combat this, the roles of the dyes are often exchanged and the
procedures of hybridization and scanning repeated; this is known as a dye-swap, i.e. exchanging the
dye labels across samples. Taking a suitable average of both dye-swap pair ratios removes dye-bias,
giving more reliable results. If a dye-swap has not been performed, gene-specific dye-bias cannot
easily be removed. The cause of gene-specific dye-bias and its contribution to the underlying variation
have not been properly characterized; however, there has been recent research in this area aimed at
modeling this effect.
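The averaging step of a dye-swap can be illustrated with a minimal sketch. It assumes, purely for illustration, that the dye bias acts as a constant multiplicative factor on the Cy5 channel for a given gene, so that it becomes an additive term on the log2 scale and cancels when the two swapped log ratios are combined. The function name and the intensity values are hypothetical.

```python
import math

def dye_swap_log_ratio(cy5_fwd, cy3_fwd, cy5_rev, cy3_rev):
    """Average log2(experimental/control) over a dye-swap pair.

    Forward array: experimental sample labeled with Cy5, control with Cy3.
    Reverse array: labels exchanged (experimental = Cy3, control = Cy5).
    """
    m_fwd = math.log2(cy5_fwd / cy3_fwd)   # true log ratio + dye bias
    m_rev = math.log2(cy5_rev / cy3_rev)   # -(true log ratio) + dye bias
    return (m_fwd - m_rev) / 2.0           # the dye bias term cancels

# Hypothetical gene: true intensities 800 (experimental) and 400 (control),
# with Cy5 reading 1.5 times too bright for this gene (assumed bias model).
true_exp, true_ctl, cy5_bias = 800.0, 400.0, 1.5
m = dye_swap_log_ratio(
    cy5_fwd=true_exp * cy5_bias, cy3_fwd=true_ctl,
    cy5_rev=true_ctl * cy5_bias, cy3_rev=true_exp,
)
print(round(m, 6))  # log2 fold change with the dye bias removed
```

Under this assumed bias model the averaged value equals the true log2 fold change (here 1, i.e. two-fold); real dye effects can be intensity dependent and may not cancel this cleanly.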
Recently, because of the availability of high quality microarrays and robust workflows, several
groups have started utilizing one colour (intensity based) microarrays, which are much simpler to
perform than the traditionally more common two colour (ratio based) microarrays. In one colour
intensity based microarrays, researchers simply hybridize each available sample on one microarray.
A one colour microarray therefore makes it possible to compare the measured gene expression
output of one microarray directly with that of other microarrays, generating new and multiple ratio-metric
measurements. This differs from the two colour approach, where all gene
expression ratios are generated only from the two samples compared on the same microarray. For one
colour microarray experiments, Cy3 is mostly chosen because it is less susceptible than Cy5 to degradation
by environmental factors such as ozone, pH and organic solvents. In
general, between and within slide replication, as well as well-characterized control genes,
are used to ensure accuracy. Automated processes calculate a relative measure of gene expression
within the two samples for each of the target DNA samples present on the array. The overall
expression pattern of all genes collectively is known as an expression profile, in which genes that
are up-regulated or down-regulated can easily be identified. Detailed descriptions of DNA
microarray protocols are available from several web-based sources
(e.g. http://www.gene-chips.com; http://cmgm.stanford.edu/pbrown/mguide/index.html).
Analysis of microarray gene expression data
With the generation of large amounts of microarray data, it has become increasingly important to
address the challenges of data quality and standardization related to this technology. The major
concern of data quality control is to detect problematic raw probe-level data (array with spatial
artefacts or with poor RNA quality for example) to facilitate the decision of whether to remove this
array from further analysis. Data QC is followed by two other pre-processing steps. The first is
data normalization, a fundamental step which aims at removing systematic bias and noise
caused by technical and experimental artifacts. The second is data
filtering, which discards the probe sets with very low expression across the samples (which
provide no biological information) in order to reduce noise in the data and to avoid wrong
interpretations of the final results. One of the most common normalization procedures is to choose a
gene set consisting of genes whose expression levels should not change under the
conditions studied (housekeeping genes); that is, the expression ratio for all genes in the gene set is
expected to be 1. From that set, a normalization factor, a number that accounts for the
variability seen in the gene set, is calculated. It is then applied to the other genes in the microarray
experiment. The normalization procedure is carried out only on the background-corrected values
for each spot.
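The housekeeping-gene normalization described above can be sketched as follows. The text does not say how the normalization factor is computed from the gene set; this sketch assumes the geometric mean of the housekeeping ratios, which is one common choice. Gene names and ratio values are hypothetical, and the ratios are taken to be already background corrected.

```python
import math

def normalization_factor(ratios, housekeeping_genes):
    """Geometric mean of the expression ratios of the housekeeping genes
    (assumed definition of the factor; expected to be 1 if no bias)."""
    logs = [math.log(ratios[g]) for g in housekeeping_genes]
    return math.exp(sum(logs) / len(logs))

def normalize(ratios, housekeeping_genes):
    """Divide every ratio by the factor so housekeeping genes centre on 1."""
    factor = normalization_factor(ratios, housekeeping_genes)
    return {gene: r / factor for gene, r in ratios.items()}

# Hypothetical background-corrected experimental/control ratios.
raw = {"GAPDH": 2.0, "ACTB": 2.0, "GENE_X": 4.0, "GENE_Y": 1.0}
normalized = normalize(raw, ["GAPDH", "ACTB"])
for gene, ratio in normalized.items():
    print(gene, round(ratio, 3))
```

Here both housekeeping genes show an apparent two-fold change, so the factor is 2 and every ratio is halved: GENE_X remains two-fold up, while GENE_Y turns out to be two-fold down.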
Computational data analysis tasks such as data mining, which includes classification and
clustering, are used to extract useful knowledge from microarray data. In addition, relating gene
expression data to other biological information enables further biological discoveries through analyses such
as transcription factor binding site analysis, pathway analysis, and protein-protein interaction
network analysis. Identification of differential gene expression is the first task of an in-depth
microarray analysis. Differentially expressed genes are genes whose expression levels are
significantly different between two groups of experiments. There are two common methods for
in-depth microarray data analysis: clustering and classification. Clustering is an
unsupervised approach that sorts data into groups of genes or samples with similar patterns that
are characteristic of the group. Classification, in general, is a process of learning from examples:
given a set of pre-classified examples, the classifier learns to assign an unseen test case to one of the
classes. Clustering is the most popular method currently used in the analysis of gene expression data
matrices. It is used for finding co-regulated and functionally related groups. There are three
common types of clustering methods: hierarchical clustering, k-means clustering and self-organizing
maps. Classification is also known as class prediction, discriminant analysis, or
supervised learning.
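As an illustration of one of the clustering methods named above, the following is a minimal k-means sketch in plain Python. It is a simplification under stated assumptions: Euclidean distance, the first k profiles used as starting centroids (real analyses usually use random or smarter initialization and several restarts), and hypothetical expression profiles.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, max_iter=100):
    """Group profiles into k clusters; returns (labels, centroids)."""
    centroids = [list(p) for p in profiles[:k]]  # assumed deterministic start
    labels = [None] * len(profiles)
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:      # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical log-expression profiles of four genes over three samples.
profiles = [
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.9],
    [1.2, 0.9, 1.0],
    [4.8, 5.1, 5.0],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Genes 1 and 3 (low expression) fall into one cluster and genes 2 and 4 (high expression) into the other.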
In the early days, a simple fold change approach was used to find differences, under the
assumption that changes above some threshold (for example, two-fold) were biologically
significant. Several univariate statistical methods were later used to determine either the expression
or relative expression of a gene from normalized microarray data, including t-tests, the modified t-test
known as SAM, two-sample t-tests, F-statistics and Bayesian models. For more complex datasets
with multiple classes, Analysis of Variance (ANOVA) techniques were also used. Because of the large
number of genes represented on a microarray, testing each gene separately may lead to a large number of
false positive calls (Type I errors). This problem is addressed by the concept of the False Discovery Rate (FDR).
Factors determining the FDR are the proportion of truly differentially expressed genes, the distribution
of the true differences, measurement variability and sample size. Benjamini and Hochberg
described a procedure to control the FDR under the assumption that the test statistics arising from
the true null hypotheses are independent. The number of false discoveries must be small relative to the number of real
differences that one finds, which in turn depends on the size of the differences and the variability of
the measured expression values.
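The Benjamini and Hochberg step-up procedure mentioned above can be sketched in a few lines. The p-values below are hypothetical; the function name and the default level of 0.05 are choices made for this illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is
    rejected while controlling the FDR at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical per-gene p-values from a differential expression test.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(pvals, alpha=0.05)
print(sum(calls), "genes declared differentially expressed")
```

With these values only the two smallest p-values pass their rank-dependent thresholds (0.005 and 0.010), although five of them are below the unadjusted 0.05 cut-off.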
Classification, clustering and identification of differentially expressed genes can be considered basic
microarray data analysis tasks that use gene expression profiles alone. However, gene expression
profiles can be linked to other external resources to make new discoveries and generate new knowledge. The
identification of functional elements such as transcription factor binding sites (TFBS) at the whole-genome
level can be one of the more challenging tasks. Transcription factors play a prominent role in
transcription regulation; identifying their binding sites is central to annotating genomic
regulatory regions and understanding gene-regulatory networks. Protein-protein interactions (PPI)
are also useful for investigating the cellular functions of genes; they are at the core of the entire
interactomics system of any living cell. Several databases have been developed to store protein
interactions, such as the Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins
(DIP), IntAct, STRING and the Molecular INTeraction database (MINT). By combining coexpressed as
well as interacting genes in the same cluster, several meaningful predictions related to gene
functions, evolutionary relationships and pathways can be made.
The next promising method for analysing microarray data is pathway analysis, as it involves the
cascade of network interactions. Analysing the microarray data from a pathway perspective could
lead to a higher level of understanding of the system. This integrates the normalized array data with
their annotations, such as metabolic pathways, gene ontology and functional classifications.
Metabolic pathway analysis can identify more subtle changes in expression than the gene lists that
result from univariate statistical analysis. Gene Set Enrichment Analysis (GSEA) is a computational
method that determines whether a set of genes shows statistically significant and concordant
differences between two biological states. The gene sets are defined based on prior biological
knowledge, e.g. published information about biochemical pathways, location in the same
cytogenetic band, membership of the same Gene Ontology category, or any user-defined set. The goal of
GSEA is to determine whether members of a gene set tend to occur toward the top (or bottom) of
the ranked gene list, in which case the gene set is correlated with the phenotypic class distinction.
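The idea of testing whether a gene set clusters at the top or bottom of a ranked list can be sketched with a simplified, unweighted running-sum statistic. This is not the full GSEA method: it omits the correlation weighting and the permutation-based significance test, and the gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up for genes in the set and down
    for genes outside it; return the largest deviation from zero.
    Positive = set concentrated at the top, negative = at the bottom.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking from most up-regulated (g1) to most down-regulated (g8).
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
es_top = enrichment_score(ranked, {"g1", "g2", "g3"})
es_bottom = enrichment_score(ranked, {"g6", "g7", "g8"})
es_mixed = enrichment_score(ranked, {"g1", "g4", "g8"})
print(round(es_top, 3), round(es_bottom, 3), round(es_mixed, 3))
```

A set sitting entirely at the top scores close to +1, one entirely at the bottom close to -1, and a set scattered through the list gives a smaller absolute score.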
lactation cycle by generating whole genome expression patterns coupled with metabolic/hormonal
pathways has become a high priority area of research in animal genomics that can yield a wealth of
information on as yet unknown molecular adaptations in response to the physiological stage of the
animal. Such inputs can relate the functional development of the mammary gland of dairy animals to
coordinated changes in the global expression pattern, helping to understand the basic biology of mammary
gland development, knowledge of which is far from complete.
RNA sequencing (RNA Seq)
Recent advances in high throughput sequencing technologies (Next or Second Generation
Sequencing) have introduced a new alternative to microarrays, namely RNA-seq. After years of
extensive investigations based on the characterization of genome-wide gene expression through
oligonucleotide-based array technologies, transcriptomics has gained new momentum, thanks to
the advent of Next Generation Sequencing (NGS). This tool quantifies gene expression by
sequencing short strands of cDNA, aligning sequences obtained back to the genome or
transcriptome, and counting the aligned reads for each gene.
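The counting step described above can be sketched as follows, assuming that alignment has already assigned each read to a gene (or to no gene). The scaling to counts per million (CPM) is an added illustration of a common way to make libraries of different depth comparable; it is not prescribed by the text. Gene names and read assignments are hypothetical.

```python
from collections import Counter

def count_reads(read_assignments):
    """Count aligned reads per gene; None marks a read that hit no gene."""
    return Counter(g for g in read_assignments if g is not None)

def counts_per_million(counts):
    """Scale raw counts by library size (assumed normalization: CPM)."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Hypothetical gene assignment for six sequenced reads.
aligned = ["CSN2", "CSN2", "LALBA", None, "CSN2", "GAPDH"]
counts = count_reads(aligned)
cpm = counts_per_million(counts)
for gene in sorted(counts):
    print(gene, counts[gene], round(cpm[gene]))
```

Real pipelines obtain the read-to-gene assignment from an aligner and an annotation file and usually also correct for transcript length; this sketch only shows the digital, count-based nature of the measurement.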
Until the advent of RNA-Seq, microarrays were the standard tool for gene expression
quantification. But with the development of new sequencing technologies and bioinformatic tools,
RNA-Seq has emerged as an appealing alternative to classical microarrays for measuring global
gene expression (Wolf et al. 2010). RNA-seq, also called whole-transcriptome shotgun
sequencing, refers to the use of high-throughput sequencing technologies for characterizing the
RNA content and composition of a given sample. RNA-seq technology, unlike microarrays, does not
depend on prior knowledge of the reference transcriptome. Further, compared to microarrays, RNA-seq data
have very low background signal and a higher dynamic range of expression levels, and
a relatively small amount of total RNA is required for quantification.
Gene detection in RNA-Seq, unlike microarrays, is therefore not dependent on probe design; rather,
it relies on mapping short nucleotide reads, which can attain exceedingly high resolution.
Although RNA-seq and microarrays are generally in good agreement when it comes to
relative gene expression quantification (Nookaew et al. 2012), RNA-seq has clear advantages: given
sufficient coverage, it captures a wider range of expression values. It is able to identify
transcripts that have not been previously annotated, and it can quantify both very lowly expressed transcripts
(unlike microarrays, where background noise interferes) and very highly expressed ones. As a digital
measure (count data), it scales linearly even at extreme values, whereas microarrays show saturation of
analog-type fluorescent signals (Marioni et al. 2008). RNA-seq further provides
information on RNA splice events; these are not readily detected by standard microarrays
(Mortazavi et al. 2008). However, microarray technology is still widely used because of lower costs
and wider availability. While RNA-seq will most likely take the lead role in transcriptome analysis
in the near future, one should not forget that RNA-seq data collection and statistical analysis are
still under development. Several other problems, related to read errors, the overwhelming amount of
ribosomal RNA (rRNA) in the data, short reads, and variation of read density along the length of
the transcript, pose a challenge for this high-throughput method. Additionally, the cost of NGS and the
necessary computing, data storage facilities and bioinformatics expertise associated with these
technologies are still quite demanding compared to microarrays. Thus, microarrays should not be
dismissed by default, and it is worth considering which application is best suited for addressing the
question at hand before engaging in a large RNA-seq experiment.
The continuous development of bioinformatics and genomic approaches for improved
annotation, combined with new data analysis tools that enable cross-species comparisons, will
greatly enhance the extraction of biological information from species-specific microarrays/RNA-seq
and advance our understanding of livestock biology. From the economic point of view, the
importance and impact of genome-wide tools in modern agriculture are likely to increase in the coming
years. Over the longer term, these high-throughput technologies should reshape livestock
biology in terms of functional annotation and discovery of new genes regulating traits of economic
importance, complete description and understanding of cellular pathways (e.g., metabolism,
proliferation, cell-cell interaction), and understanding of genome-environment interactions (e.g.,
developmental pathways, abiotic stress, nutritional genomics and infectious diseases). This would
further help in identifying target molecules for improvement and in selecting better
performing livestock to ensure food security and meet the challenges of an increasing global
population.
References
Bernard C et al. (2007). New indicators of beef sensory quality revealed by expression of specific genes. J Agric
Food Chem, 55: 5229-5237
Byrne KA, Wang YH, Lehnert SA, Harper GS, McWilliam SM, Bruce HL and Reverter A (2005). Gene
expression profiling of muscle tissue in Brahman steers during nutritional restriction. J. Anim. Sci., 83: 1-12
Caetano AR, Johnson RK, Ford JJ, Pomp D (2004). Microarray profiling for differential gene expression in
ovaries and ovarian follicles of pigs selected for increased ovulation rate. Genetics, 168: 1529-1537
Hayashi KG, Ushizawa K, Hosoe M and Takahashi T (2010). Differential genome-wide gene expression
profiling of bovine largest and second-largest follicles: identification of genes associated with growth of
dominant follicles. Reproductive Biology and Endocrinology, 8: 11
Loor JJ, Everts RE, Bionaz M, Dann HM, Morin DE, Oliveira R, Rodriguez-Zas SL, Drackley JK and Lewin
HA (2007). Nutrition-induced ketosis alters metabolic and signaling gene networks in liver of
periparturient dairy cows. Physiol. Genomics, 32: 105-116
Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (2008). RNA-seq: an assessment of technical
reproducibility and comparison with gene expression arrays. Genome Research, 18: 1509-1517
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008). Mapping and quantifying mammalian
transcriptomes by RNA-Seq. Nature Methods, 5: 621-628
Moyes KM, Drackley JK, Morin DE, Rodriguez-Zas SL, Everts RE, Lewin HA and Loor JJ (2010). Mammary
gene expression profiles during an intramammary challenge reveal potential mechanisms linking
negative energy balance with impaired immune response. Physiol Genomics, 41(2): 161-170
Nookaew I, Papini M, Pornputtpong N et al. (2012). A comprehensive comparison of RNA-Seq-based
transcriptome analysis from reads to differential gene expression and cross-comparison with
microarrays: a case study in Saccharomyces cerevisiae. Nucleic Acids Research, 40: 10084-10097
Reecy JM, Moody SD and CH Stah (2006). Gene expression profiling: Insights into skeletal muscle growth
and development. Journal of Animal Sciences, 84: E150-E154
Suchyta SP, Sipkovsky S, Halgren RG, Kruska R, Elftman M, Weber-Nielsen M, Vandehaar MJ, Xiao L,
Tempelman RJ, Coussens PM (2003). Bovine mammary gene expression profiling using a cDNA
microarray enhanced for mammary-specific transcripts. Physiol Genomics, 16: 8-18
Sudre K, Cassar-Malek I, Listrat A, Ueda Y, Loroux C, Jurie C, Auffrag C, Renand G, Martin P and
Hocquette JF (2005). Biochemical and transcriptomic analyses of two bovine skeletal muscles in
Charolais bulls divergently selected for muscle growth. Meat Sci., 70: 267-277
Ushizawa K, Herath CB, Kaneyama K, Shiojima S, Hirasawa A, Takahashi T, Imai K, Ochiai K, Tokunaga T,
Tsunoda Y, Tsujimoto G, Hashizume K (2004). cDNA microarray analysis of bovine embryo gene
expression profiles during the pre-implantation period. Reprod Biol Endocrinol., 2: 77
Wang YH, Byrne KA, Reverter A, Harper GS, Taniguchi M, McWilliam SM, Mannen H, Oyama K and
Lehnert SA (2005). Transcriptional profiling of skeletal muscle tissue from two breeds of cattle. Mamm.
Genome, 16: 201-210
Wolf JBW, Bayer T, Haubold B et al. (2010). Nucleotide divergence versus gene expression differentiation:
comparative transcriptome sequencing in natural isolates from the carrion crow and its hybrid zone with
the hooded crow. Molecular Ecology, 19: 162-175
15
________________________________________________________________________________________
Genetic variation
Genetic variability is a measure of the tendency of an individual to differ from the average of the
population to which it belongs. Genetic variability also underlies the differential susceptibility of organisms
to diseases and sensitivity to toxins or drugs, a fact that has driven increased interest in
personalized medicine, given the rise of the human genome project and efforts to map the extent of
human genetic variation. Genetic recombination is one of the sources of variability. During
meiosis, homologous chromosomes of paternal and maternal origin cross over at random
positions and exchange segments of DNA. The recombined chromosomes then separate and are
passed on to the offspring. This process is itself governed by genes that determine where crossovers can
occur and the mechanism by which DNA segments are exchanged. Recombination also varies in frequency and
location and can therefore be acted on by natural selection to increase fitness. More recombination results in more
variability, which makes it easier for a population to adapt to a changing environment.
Genetic polymorphism
Genetic polymorphism refers to the difference in DNA sequence among individuals, groups, or
populations, and can be caused by mutations ranging from a single nucleotide base change to
variations in several hundred bases. The progress of molecular genetic technology during the last two
decades has generated many advances, including the discovery of DNA based markers, which
have contributed immensely to the development of gene mapping. This facilitates the identification of
genes that control part of the variability of phenotypic traits. Broadly, two experimental
strategies have been developed for this purpose: linkage studies and the candidate gene approach.
Linkage studies rely on knowledge of the genetic map, searching for quantitative trait loci (QTL) by using
family information and comparing the segregation patterns of genetic markers and the traits being
analyzed. Markers that tend to co-segregate with the analyzed trait provide the approximate
chromosomal location of the underlying genes.
SNP genotyping methods
Many techniques are available for SNP genotyping. One key feature of most SNP genotyping
techniques, apart from those based on direct hybridization, is that they proceed in two steps: 1)
generation of allele-specific molecular reaction products; 2) separation and detection of the
allele-specific products for their identification.
Direct Hybridization Techniques:
i. Dot blot
ii. Reverse dot blot technique
Techniques Involving Generation and Separation of an Allele-Specific Product:
i. Restriction fragment length polymorphism (RFLP)
ii. Single strand conformation polymorphism (SSCP)
iii. Primer extension
iv. Oligonucleotide ligation assay (OLA)
v. Invasive cleavage of oligonucleotide probes (Invader assay)
vi. Pyrosequencing
vii. Array based high throughput genotyping
Species  BeadChip name    No. of SNPs    No. of mapped SNPs    Average interval    Average MAF across   Release status
                          (approximate)  (assembly)            between SNPs (kb)   tested populations
Chicken  Multiple chips*  -              -                     -                   -                    Open with restriction
Dog      CanineSNP20      22,362         22,000 (CanFam2.0)    125                 0.27                 Commercially available
Dog      CanineHD         170,000        170,000 (CanFam2.0)   14.3                0.23                 Commercially available
Cattle   BovineSNP50      54,001         52,255 (Btau4.0)      51.5                0.25                 Commercially available
Cattle   BovineHD         777,962        >90% (Btau)           <3                  >0.05                Commercially available
Cattle   Bovine3K**       3,000          3,000 (Btau4.0)       -                   -                    Commercially available
Horse    EquineSNP50      54,602         54,602 (EquCab2.0)    43.2                0.21                 Commercially available
Pig      PorcineSNP60     64,232         55,446 (Sscrofa9)     40.7                0.27                 Commercially available
Sheep    OvineSNP50       54,241         -                     46                  ~0.3                 Commercially available
traits in humans. Similarly, SNP markers in livestock allow association analyses of
economically important traits. Advances in the field of polymorphism studies have led to scaling up
from a few SNPs to genome-wide SNPs. In most studies concerning the associations between
genotype and production traits, the results are highly dependent on the breed, the animal population,
and even the herd. The question now is: what are the appropriate methods and models to use on these
data?
Binary traits: A complex binary trait is a character that has a dichotomous expression but a
polygenic genetic background. Traits such as disease fall into this category. Association
between genotype at a particular locus and disease can arise in three ways: 1) the locus
may be causally related to the disease, with different alleles carrying different risks; 2) the locus may not
itself be causal but may be sufficiently close to a causal locus to be in linkage disequilibrium with
it; 3) the association may be due to confounding by population stratification or admixture. If the
association is due to confounding, it is of little interest and should be excluded. Association
for disease traits can be measured in the form of relative risks or odds ratios.
Genotype   Disease: Yes   Disease: No
AA         X_AA           1 - X_AA
Aa         X_Aa           1 - X_Aa
aa         X_aa           1 - X_aa
If there are three genotypes AA, Aa and aa with penetrances as shown in the table above, and
allele a is the more common form, then the aa genotype is taken as the reference and the relative risks are calculated
as $RR_{AA} = X_{AA} / X_{aa}$ and $RR_{Aa} = X_{Aa} / X_{aa}$. The odds ratios are calculated as
$OR_{AA} = \frac{X_{AA}(1 - X_{aa})}{(1 - X_{AA}) X_{aa}}$ and $OR_{Aa} = \frac{X_{Aa}(1 - X_{aa})}{(1 - X_{Aa}) X_{aa}}$.
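The two measures can be computed directly from the genotype penetrances. In the sketch below the penetrance values are hypothetical, and aa is used as the reference genotype as in the text.

```python
def relative_risk(x_genotype, x_reference):
    """Risk of disease for a genotype relative to the reference genotype."""
    return x_genotype / x_reference

def odds_ratio(x_genotype, x_reference):
    """Odds of disease for a genotype relative to the reference genotype."""
    return (x_genotype * (1 - x_reference)) / ((1 - x_genotype) * x_reference)

# Hypothetical penetrances (probability of disease given genotype).
penetrance = {"AA": 0.30, "Aa": 0.15, "aa": 0.10}
reference = penetrance["aa"]

rr = {g: relative_risk(x, reference) for g, x in penetrance.items()}
odds = {g: odds_ratio(x, reference) for g, x in penetrance.items()}
for g in ("AA", "Aa", "aa"):
    print(g, round(rr[g], 3), round(odds[g], 3))
```

For the reference genotype both measures equal 1 by construction. The odds ratio is further from 1 than the relative risk, and the two agree closely only when the disease is rare.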
The test of association is done using the log likelihood ratio test or the score test. Both
statistics are asymptotically distributed as chi-squared with two degrees of freedom, and both can be
expressed as simple functions of the observed frequencies and the corresponding expected frequencies. The
chi-squared test can also be used to compare the hypothesis that all three genotypes have an equal rate of
disease against the alternative that each genotype has a different disease rate. A 2 x 2 (allele) or 3 x 2 (genotype) table is
analyzed. Mixed models are used if fixed effects are thought to affect the incidence of the condition.
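A minimal sketch of the 3 x 2 genotype test follows, computing both the Pearson chi-squared statistic and the likelihood ratio statistic from observed and expected counts. The counts are hypothetical. The p-value uses the fact that, for exactly two degrees of freedom, the upper tail of the chi-squared distribution is exp(-x/2); for other table sizes a statistics library would be needed.

```python
import math

def expected_counts(table):
    """Expected counts under no association: row total * column total / N."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]

def pearson_chi2(table):
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for row, erow in zip(table, exp) for o, e in zip(row, erow))

def likelihood_ratio_g(table):
    exp = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for row, erow in zip(table, exp)
                     for o, e in zip(row, erow) if o > 0)  # empty cells add 0

def p_value_df2(statistic):
    """Upper-tail probability of a chi-squared variable with 2 d.f."""
    return math.exp(-statistic / 2.0)

# Hypothetical counts: rows = genotypes AA, Aa, aa; columns = cases, controls.
table = [[30, 10], [40, 40], [30, 50]]
chi2 = pearson_chi2(table)
g = likelihood_ratio_g(table)
print(round(chi2, 3), round(g, 3), p_value_df2(chi2))
```

The two statistics are close to each other, as expected for reasonably large cell counts, and both would be referred to the chi-squared distribution with two degrees of freedom.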
Continuous traits: For association studies, the traits of interest can be analyzed using the
General Linear Model (GLM) procedure of software packages like SAS, SPSS or SYSTAT, and
the least squares means of the genotypes can be compared by the Tukey test. The linear model used
to fit the quantitative variables can include, in addition to the genotype effect, other factors which
affect the trait. A simple model involving the genotype effect can be represented as
$Y_{ijk} = A + G_i + C_j + e_{ijk}$, where $Y_{ijk}$ = production trait, $A$ = overall mean, $G_i$ = fixed effect of the ith
genotype, $C_j$ = any other fixed effect and $e_{ijk}$ = random error. To exclude from the analysis
genotypes with a small number of animals, and to avoid confounding between genetic groups and
genotype effects on the traits of interest, genotypes with very low frequency in the total animal sample
and genetic groups showing a single genotype are not analyzed. The $G_i$ estimates will contain the
additive effects of each of the two gene regions constituting the genotype, the general and specific
dominance interactions between the two gene regions and the average epistatic effects between
each of the two gene regions and the remaining genotype of each of the individuals within each $G_i$
group. Genotype can be fitted as a random effect when the aim is to identify the best genotype for
selection and to obtain Best Linear Unbiased Predictions (BLUPs). The most common use of a
mixed model to test the association between a genetic marker and a phenotype is to fit the marker
as a fixed effect and a polygenic component as a random effect. The random effect is the
individual taxon (strains, inbred lines or varieties). A likelihood ratio test against the chi-square
distribution (when using maximum likelihood), or a Wald test against either the chi-square or
normal distribution (when using restricted maximum likelihood, REML), is performed to assess the
significance of the effect of a polymorphic marker.
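The fixed-effect genotype model for a continuous trait can be illustrated in its simplest form, with genotype as the only effect (no other fixed effects, no random polygenic term and no Tukey comparison). In that reduced case the least squares genotype means are the plain group means, and the genotype effect is tested with a one-way ANOVA F statistic. The trait values are hypothetical, and the p-value, which requires the F distribution, is left to a statistics package.

```python
def genotype_means(groups):
    """Least squares means for a model with genotype as the only effect."""
    return {g: sum(v) / len(v) for g, v in groups.items()}

def anova_f(groups):
    """One-way ANOVA: returns (F statistic, df between, df within)."""
    all_values = [y for v in groups.values() for y in v]
    grand_mean = sum(all_values) / len(all_values)
    means = genotype_means(groups)
    # Variation explained by genotype.
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2
                     for g, v in groups.items())
    # Residual variation within genotypes.
    ss_within = sum((y - means[g]) ** 2
                    for g, v in groups.items() for y in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical production records grouped by marker genotype.
records = {
    "AA": [10.0, 12.0, 11.0],
    "Aa": [14.0, 15.0, 13.0],
    "aa": [18.0, 17.0, 19.0],
}
means = genotype_means(records)
f_stat, df_b, df_w = anova_f(records)
print(means, round(f_stat, 3), df_b, df_w)
```

A large F statistic relative to the F distribution with (2, 6) degrees of freedom indicates that mean performance differs between genotypes in this hypothetical sample.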
Genome Wide Association studies (GWAS): GWAS involves correlating allele frequencies at each
of several hundred thousand markers spaced throughout the genome with trait variation in a
population-based sample. GWAS is based on the premise that a causal variant is located on a
haplotype, and therefore a marker allele in Linkage Disequilibrium with the causal variant should
show an association with a trait of interest. One of the advantages of the GWAS approach is that it
is unbiased with respect to genomic structure and previous knowledge of the trait etiology, in
contrast to candidate gene studies, where knowledge of the trait is used to identify candidate loci
contributing to the trait of interest. Therefore, GWAS results hold the promise to reveal causal
genes not previously suspected in disease etiology and to estimate relatively complete genetic
effects (additive and non-additive) and pleiotropy in an unbiased way.
In a GWAS, allele frequencies at thousands if not millions of loci are compared in individuals of
varying phenotype. Defining the phenotype is an important consideration because phenotypic
heterogeneity can reduce power. Other complexities, including data quality per individual and per
SNP, batch effects, relatedness among samples and genetic outliers, must be accounted for
to avoid systematic bias. GWAS analysis tests for association of each SNP with a qualitative or
quantitative trait value in hundreds to tens of thousands of individuals. For quantitative traits,
linear regression or Spearman's rank correlation is used to test each SNP for association between
trait values and genotype. For categorical traits (e.g., case-control status or phenotypic extremes),
chi-square or contingency table-based tests can be used in addition to logistic regression tests. The
statistical power of a GWAS is a function of sample size, effect size, causal allele frequency, and
marker allele frequency and its correlation with the causal variant. Population stratification must be
addressed in these analyses. Stratified analysis (e.g., using a Cochran-Mantel-Haenszel test) and
population structure covariates (e.g., inferred population assignments or principal component
analysis (PCA) eigenvectors) are approaches for dealing with cryptic population structure. Various
strategies exist for testing associations between markers and traits. The most common methods of
association analysis involve fitting one marker at a time. An iterative, stepwise regression proceeds
by fitting the marker with the strongest association first, then retesting the remaining markers for
significance. Additional markers are added in a similar fashion until a stopping criterion is
met. A different strategy is to fit all the markers simultaneously as random effects. The distribution
of the marker effects can then be modeled within a Bayesian framework. EMMA/R, TASSEL,
ASREML, SAS Proc Mixed and WOMBAT are some of the software packages suitable for association analysis.
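The one-marker-at-a-time strategy for a quantitative trait can be sketched as a simple linear regression of the trait on allele dosage (0, 1 or 2 copies of one allele), repeated for every SNP. This sketch makes several simplifying assumptions: no correction for population structure or relatedness, no p-values or multiple-testing adjustment (markers are only ranked by the absolute t statistic), and hypothetical marker names and data.

```python
import math

def regress_on_dosage(dosage, trait):
    """Simple linear regression of trait on dosage.
    Returns (slope, t statistic), or None for a monomorphic marker."""
    n = len(trait)
    x_mean = sum(dosage) / n
    y_mean = sum(trait) / n
    sxx = sum((x - x_mean) ** 2 for x in dosage)
    if sxx == 0:                      # no genotype variation, nothing to test
        return None
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(dosage, trait))
    slope = sxy / sxx                 # estimated allele substitution effect
    intercept = y_mean - slope * x_mean
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, trait))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err

def scan(markers, trait):
    """Test every marker separately; skip monomorphic ones."""
    results = {}
    for name, dosage in markers.items():
        fit = regress_on_dosage(dosage, trait)
        if fit is not None:
            results[name] = fit
    return results

# Hypothetical data: six animals, three SNPs coded as allele dosage.
trait = [1.0, 2.0, 3.0, 2.1, 0.9, 3.1]
markers = {
    "snp1": [0, 1, 2, 1, 0, 2],   # tracks the trait closely
    "snp2": [1, 0, 1, 2, 1, 1],   # essentially unrelated
    "snp3": [2, 2, 2, 2, 2, 2],   # monomorphic
}
results = scan(markers, trait)
top = max(results, key=lambda name: abs(results[name][1]))
print(top, results[top])
```

In a real analysis each t statistic would be converted to a p-value and compared with a genome-wide significance threshold, and covariates such as PCA eigenvectors would be added to the model.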
Genomic breeding value and genomic selection
Genomic selection is the ultimate application of markers in animal breeding. Genomic evaluation
has been developed to predict breeding values using dense marker maps. The introduction of high-throughput
single nucleotide polymorphism (SNP) genotyping methods has cleared the way for
implementation of genomic selection. Several studies have shown that genomic selection is
significantly more accurate than traditional selection of young animals, especially for low-heritability
traits. This has led to a great need for flexible and efficient software for
genomic evaluation in livestock. Methods commonly used to estimate genomic breeding values
(GEBV) are best linear unbiased prediction from mixed model analysis using a genomically
estimated relationship matrix (G-BLUP), random regression BLUP (R-BLUP) and different nonlinear
methods. For most of the economically important traits in livestock, linear
models were shown to be as accurate as nonlinear methods, or even more accurate. Only for traits that
are lowly heritable and controlled by a few large QTL were the nonlinear methods more accurate.
The model to predict GEBVs, considering only additive genetic effects, is described as:
$Y_i = \text{fixed effects} + \text{animal}_i + \sum_{j}\sum_{k} \text{SNP}_{ijk} + e_i$ (summed over the marker indices j and k)
Regardless of the model used to estimate marker effects, the accuracy of GEBVs strongly depends on the linkage
disequilibrium between marker and QTL loci being consistent between the reference population
and the animals for which GEBVs are predicted. The accuracy of the estimated marker effects
depends on the characteristics of the reference population, such as the number of included
phenotypes, the sampling of animals from the population and the heritability of the trait.
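Once marker effects have been estimated in a reference population, the genomic breeding value of a genotyped candidate is obtained by summing the effects of the alleles it carries. The sketch below assumes an additive coding in which each SNP genotype is the number of copies (0, 1 or 2) of the allele whose effect was estimated. The effect estimates, animal names and genotypes are hypothetical, and the estimation step itself (e.g. G-BLUP or R-BLUP) is not shown.

```python
def gebv(dosages, marker_effects):
    """Genomic estimated breeding value: sum over SNPs of dosage * effect."""
    if len(dosages) != len(marker_effects):
        raise ValueError("one dosage is required per marker effect")
    return sum(x * a for x, a in zip(dosages, marker_effects))

# Hypothetical additive effects for three SNPs, estimated in a reference
# population that has both genotypes and phenotypes.
effects = [0.5, -0.2, 0.1]

# Hypothetical young selection candidates with genotypes but no records.
candidates = {
    "animal_1": [2, 1, 0],
    "animal_2": [0, 2, 2],
    "animal_3": [1, 0, 2],
}
values = {name: gebv(geno, effects) for name, geno in candidates.items()}
ranking = sorted(values, key=values.get, reverse=True)
for name in ranking:
    print(name, round(values[name], 3))
```

Candidates can be ranked on these values before any phenotype is available, which is why the accuracy of the estimated marker effects, and the persistence of marker-QTL linkage disequilibrium between reference and candidate animals, matter so much.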
References
Barbara E. Stranger, Eli A. Stahl and Towfique Raj 2011. Progress and Promise of Genome-Wide Association
Studies for Human Complex Trait Genetics. Genetics, 187: 367-383.
Brookes, A.J. 1999. The essence of SNPs. Gene, 234: 177.
Calus M. P. L. 2010. Genomic breeding value prediction: methods and procedures. Animal, 157.
Curi R.A., de Oliveira H.N., Silveira A.C., Lopes C.R. 2005 Association between IGF-I, IGF-IR and GHRH
gene polymorphisms and growth and carcass traits in beef cattle. Livestock Production Science, 94: 159.
Duncan, B.K. and Miller, J.H. 1980. Mutagenic deamination of cytosine residues in DNA. Nature, 287: 560.
Jayakumar, S. and Ved Prakash. Phenomic and genomic tools for analysis of livestock genome.146.
Lander E.S. et al. 2001. Initial sequencing and analysis of the human genome. Nature, 409: 860.
M. Ota, H. Fukushima, J. K. Kulski, and H. Inoko, 2007. Single nucleotide polymorphism detection by
polymerase chain reaction-restriction fragment length polymorphism. Nat Protoc., 2857.
Meuwissen, T.H.E., Hayes, B.J. and Goddard, M. E. 2001. Prediction of total genetic value using genome-wide
dense marker maps. Genetics 157: 1819.
Meyer K. 2007. WOMBAT: A tool for mixed model analyses in quantitative genetics by restricted maximum
likelihood (REML). J Zhejiang Univ Sci B, 8: 815.
Yan, H., Kinzler, K.W. and Vogelstein, B. 2000. Genetic testing, present and future. Science, 289: 1890.
Zhiwu Zhang, Edward S. Buckler, Terry M.Casstevens and Peter J. Bradbury. 2009. Software engineering the
mixed model for genome-wide association studies on large samples Briefings in Bioinformatics. 10: 664.
16
________________________________________________________________________________________
Data measurements
Some data measures are listed as Mega: 2^20 ≈ 10^6; Giga: 2^30 ≈ 10^9; Tera: 2^40 ≈ 10^12; Peta: 2^50 ≈ 10^15;
Exa: 2^60 ≈ 10^18. Some data sizes are listed as: 1,000 bytes (1 KB) is the size of an email; the size of human
chromosome 1 is 250 MB; 4 GB is the size of a DVD; 1,000,000,000,000 bytes (1 terabyte) is 1/15th of the US
Library of Congress (256 DVDs); 5 TB is the size of the primary data from an Illumina HiSeq 2000. Main memory
sizes are listed as: personal computer: 1 GByte; top supercomputer: 10 TByte. Disk space (Byte): single disk
(2004): 200 GByte; top supercomputer: 700 TByte (Source: slides by Thomas Ludwig).
Computational performance (floating point operations per second = Flops) is listed as: modern
processor: 3 Giga Flops; top supercomputer: 33 Peta Flops. Network performance (Byte/s):
personal computer: 10/100 MByte/s; supercomputer networks: gigabytes/s.
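The binary and decimal readings of each prefix are close but not equal, and the gap widens with each step. The short sketch below tabulates the difference; the dictionary and function names are illustrative only.

```python
# Binary exponent and decimal exponent associated with each prefix
PREFIXES = {"Mega": (20, 6), "Giga": (30, 9), "Tera": (40, 12), "Peta": (50, 15), "Exa": (60, 18)}


def relative_gap(binary_exp, decimal_exp):
    """Fraction by which 2**binary_exp exceeds 10**decimal_exp."""
    return 2 ** binary_exp / 10 ** decimal_exp - 1


for name, (b, d) in PREFIXES.items():
    print(f"{name}: 2^{b} = {2 ** b:,} bytes, {relative_gap(b, d):.1%} above 10^{d}")
```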
Applications in bioinformatics
As the number of sequenced genomes has considerably increased, inadequate computational power
has become a bottleneck for research in evolutionary bioinformatics. Finding all common
genes between any two different species that come from a single gene of the last common ancestor,
referred to as orthologs, will take more than 60 years of computation with a modern personal
computer. If a researcher aims to finish the ortholog computation in a week, it requires at least
60 × 52 = 3,120 computing nodes for a whole week without interruptions (Kim et al., 2012). Although
many research institutions now provide cluster computing services that enable users to execute
multiple computing jobs in parallel, computing resources of this size may not be available
because the scalability of a system is limited by the total hardware capacity of the hosting institution,
which is shared by a number of users. A few bioinformatics applications of high performance
computing from the literature are listed in the following subsections.
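The node estimate is simple arithmetic, sketched below under the assumption of perfect parallel scaling (no communication overhead, no idle nodes). The function name and parameters are illustrative.

```python
import math


def nodes_needed(serial_years, deadline_weeks, weeks_per_year=52):
    """Nodes required to finish a job of `serial_years` single-node years
    within `deadline_weeks`, assuming the work divides evenly across nodes."""
    return math.ceil(serial_years * weeks_per_year / deadline_weeks)


print(nodes_needed(60, 1))   # 60 years of work squeezed into one week
print(nodes_needed(60, 4))   # the same work with a four-week deadline
```

Real jobs rarely scale perfectly, so the figure is better read as a lower bound.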
Next Generation Sequencing
Next Generation Sequencing (NGS) data analysis is one of the most demanding applications in
bioinformatics (Perez-Sanchez et al., 2014). A major limitation of NGS data analysis is
the requirement for large data storage and high performance computing facilities (Kadarmideen,
2014). From procedures such as alignment and variant calling to more complex challenges such as
genome-wide annotation and the correlation of biomarkers with diseases, NGS analyses are time
consuming. High performance computing provides advantages in this field of genomics applied to
medicine and healthcare.
Global aligners are very fast when they use particular data representation approaches, such as the
Burrows-Wheeler transform (BWT). However, they are quite slow to reach the optimal result through the
backtracking approach, despite the use of reliable data representations. The problem becomes more complex
when local alignments are needed. GPU-based solutions are available, such as CUSHAW, a
CUDA (Compute Unified Device Architecture) compatible short read alignment algorithm for
multiple GPUs sharing a single host. It provides support for un-gapped alignment, and its results are
comparable with those of BWT-based aligners such as Bowtie and SOAP2. Another aligner, BarraCUDA, is
directly based on BWA; it delivers a high level of alignment fidelity and is comparable to other
mainstream alignment programs. It can perform alignments with gap extensions in order
to minimize the number of false variant calls in re-sequencing studies.
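To make the Burrows-Wheeler transform concrete, the sketch below builds it naively by sorting all rotations of a short string and then inverts it. This is a teaching version only: production aligners construct the BWT through suffix arrays and compressed indexes, and the `$` terminator and function names here are assumptions of the example.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (quadratic memory; small inputs only)."""
    if terminator in text:
        raise ValueError("text must not contain the terminator")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Recover the original text by repeatedly prepending and re-sorting columns."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


read = "GATTACA"
encoded = bwt(read)
print(encoded, inverse_bwt(encoded) == read)
```

The transform groups identical characters that share a following context, which is what makes the compact, searchable indexes used by BWT-based aligners possible.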
Genetic Algorithms based docking
Genetic Algorithm (GA) has been used to find the optimal docking conformation of a ligand with
respect to a protein. All the data relative to the GA state is maintained on the GPU memory,
avoiding data movement through the PCI Express bus. The GA generates the random numbers on
the CPU instead of on the GPU for two reasons: (i) it enables one-to-one comparisons of
CPU and GPU results, and (ii) it reduces the design, coding and validation effort of generating random
numbers on the GPU (Perez-Sanchez et al., 2014).
An enhanced version of the PLANTS approach for protein-ligand docking using GPUs is also
available. The GPU implementation exhibits speedup factors of up to 50x compared to an
optimized CPU-based implementation for the evaluation of interaction potentials in the context of a
rigid protein. The GPU implementation has been carried out using OpenGL to access the GPU's
pipeline and Nvidia's Cg language to implement the shader programs. The speedup factors
observed are limited by several factors. First, only the generation of the ligand-protein
conformations and the scoring function evaluation are carried out on the GPU, whereas the
optimization algorithm is run on the CPU. This algorithmic decomposition implies time-consuming
data transfers through the PCI Express bus. The optimization algorithm used in PLANTS is the Ant
Colony Optimization (ACO) algorithm. A parallel scheme for this algorithm on a CPU cluster has been
proposed, which uses multiple ant colonies in parallel that occasionally exchange information
with each other.
DAIRRy-BLUP
DAIRRy-BLUP is a parallel, distributed-memory RR-BLUP implementation based on single-trait
observations (y). It uses the Average Information algorithm for restricted maximum-likelihood
estimation of the variance components (De Coninck et al., 2014). DAIRRy-BLUP enables the analysis
of large-scale data sets to provide more accurate estimates of marker effects and breeding values. A
distributed-memory framework is required because the dimensionality of the problem, determined by
the number of SNP markers, becomes too large to be analyzed by a single computer. DAIRRy-BLUP
enables the analysis of very large-scale data sets of up to 1,000,000 individuals and 360,000 SNPs.
Increasing the number of phenotypic and genotypic records has a more significant effect on the
prediction accuracy than increasing the density of SNP arrays. The Gengar cluster on Stevin, the high
performance computing (HPC) infrastructure of Ghent University, has been used; it consists of
194 computing nodes (IBM HS 21 XM blades) interconnected with a 4X DDR Infiniband network (20
Gbit/sec). Each node contains a dual-socket quad-core Intel Xeon L5420 2.5-GHz CPU (eight cores)
with 16 GB RAM. A one-to-one mapping of processes to CPU cores has been applied to achieve
high performance.
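A back-of-the-envelope calculation shows why a single node cannot hold a problem of this size. The sketch assumes the genotype matrix is stored densely with 8-byte values; that storage format is an assumption made for illustration, not a statement about how DAIRRy-BLUP lays out its data.

```python
NODES = 194             # computing nodes in the cluster described above
CORES_PER_NODE = 8      # dual-socket quad-core
RAM_PER_NODE_GB = 16


def dense_matrix_gib(rows, cols, bytes_per_value=8):
    """Size in GiB of a dense rows x cols matrix (8-byte values assumed)."""
    return rows * cols * bytes_per_value / 2 ** 30


genotypes_gib = dense_matrix_gib(1_000_000, 360_000)
print(f"total cores: {NODES * CORES_PER_NODE}")
print(f"aggregate RAM: {NODES * RAM_PER_NODE_GB} GB")
print(f"dense genotype matrix: {genotypes_gib:,.0f} GiB")
```

Under these assumptions the matrix alone exceeds the memory of one node by two orders of magnitude, while it fits within the aggregate memory of the cluster.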
Computational analysis of large-scale proteome data sets
Computational analysis of shotgun proteomics data can be performed in an automated
and statistically rigorous way with the freely available MaxQuant environment. The sophisticated
algorithms and the amount of data impose very high computational demands. Parallelization
and memory optimization of the MaxQuant software, with the aim of executing it on a large
computer cluster, has been implemented (Neuhauser et al., 2013). Analysis of the bottlenecks
in overall performance showed that the most time-consuming algorithms are those detecting
peptide features in the mass spectrometry (MS) data and the fragment spectrum search. These
tasks scale with the number of raw files and can readily be distributed over many CPUs. The
performance of a parallelized version of MaxQuant running on a standard desktop has been
compared with that on an I/O-performance-optimized desktop computer (game computer) and in a cluster
environment. The modified gaming computer and the cluster vastly outperform a standard desktop
computer when analyzing more than 1000 raw files. The resulting MaxQuant version is highly
parallelized and memory optimized. The high performance platform has been applied to investigate the
incremental coverage of the human proteome by high resolution MS data originating from in-depth
cell line and cancer tissue proteome measurements.
Close to 1000 raw files can be efficiently processed in the standard workflow in a matter of a few
days. In the future, both the power of computational hardware and the size of the data acquired
in proteomic investigations will increase. For instance, the number of MS and MS/MS scans used in
standard acquisitions could increase several fold over the next few years, just as it has over the last
several years. To counter this additional computational load, desktop chips with 12 virtual
cores already exist. Based on these trends, it is expected that the computational demands of the
standard workflow for in-depth shotgun proteomics can be comfortably handled for the
foreseeable future. Specialized tasks, however, such as searches in six-frame translations of large genomes
and other extremely computing-intensive tasks, may benefit from large clusters.
Analysis of SNPs Interaction in Genome Wide Association Studies
Genome-wide association studies (GWAS) lead to the systematic discovery of single
nucleotide polymorphisms (SNPs) that are associated with a given disease. Univariate analysis
approaches may miss important SNP associations that only appear through multivariate analysis in
complex diseases. However, multivariate SNP analysis is currently limited by its inherent
computational complexity. Goudey et al. (2015) present a computational framework that harnesses
supercomputers. They estimate that a three-way interaction analysis on GWAS data with 1.1 million SNPs
would require over 5.8 years on the full Avoca IBM Blue Gene/Q installation at the Victorian Life
Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based
methods and four times faster than runtimes estimated for GPU methods. It is becoming feasible to
carry out exhaustive analysis of higher order interaction studies on large modern GWAS. Near-linear
scalability of runtime with the number of threads on a parallel, distributed-memory
supercomputer allows a reduction in analysis runtime that has not been achieved previously.
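The computational burden comes from the number of SNP combinations that an exhaustive search must test. A minimal sketch of that count, using only the standard library (the function name is illustrative):

```python
import math


def interaction_tests(n_snps, order):
    """Number of distinct SNP combinations of the given order (n choose k)."""
    return math.comb(n_snps, order)


n = 1_100_000
for order in (1, 2, 3):
    print(f"order {order}: {interaction_tests(n, order):.3e} combinations")
```

Each step up in interaction order multiplies the work by roughly the number of SNPs divided by the order, which is why three-way analysis is measured in years even on a supercomputer.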
Summary
High throughput data analysis in the field of Bioinformatics and Computational Biology can take
advantage of improvements in high performance computing systems to overcome computational
limitations. These applications provide the opportunity to create exciting new therapeutic strategies
for more productive and healthier lifestyles that were not feasible until recently.
Cloud computing services have emerged as a cost-effective alternative to locally installed
computing clusters. They provide computing resources and data storage that are virtually
without limit, are not interrupted by other users' applications or system maintenance, and are charged by
usage only (Kim et al., 2012).
References
De Coninck, A., Fostier, J., Maenhout, S. and De Baets, B. 2014. DAIRRy-BLUP: A high-performance computing
approach to genomic prediction. Genetics, 197: 813.
Goudey, B., Abedini, M., Hopper, J.L., Inouye, M., Makalic, E., Schmidt, D.F., Wagner, J., Zhou, Z., Zobel, J.
and Reumann, M. 2015. High performance computing enabling exhaustive analysis of higher order single
nucleotide polymorphism interaction in Genome Wide Association Studies. Health Information Science
and Systems, 3(Suppl 1): S3.
Kadarmideen, H.N. 2014. Genomics to systems biology in animal and veterinary sciences: progress, lessons
and opportunities. Livestock Science, 166: 232-248.
Kim, I., Jung, J.Y., DeLuca, T.F., Nelson, T.H. and Wall, D.P. 2012. Cloud computing for comparative genomics
with Windows Azure Platform. Evolutionary Bioinformatics, 8: 527.
Korte, T. 2014. Supercomputing vs. distributed computing: A government primer.
URL: http://www.datainnovation.org/2014/01/supercomputing-vs-distributed-computing-agovernment-primer/
Neuhauser, N., Nagaraj, N., McHardy, P., Zanivan, S., Scheltema, R., Cox, J. and Mann, M. 2013. High
performance computational analysis of large-scale proteome data sets to assess incremental contribution to
coverage of the human genome. Journal of Proteome Research, 12: 2858.
Perez-Sanchez, H., Cecilia, J.M. and Merelli, I. 2014. The role of high performance computing in
bioinformatics. Proceedings IWBBIO. Granada, 7-9 April, 2014.
________________________________________________________________________________________
Procedure
i. Goat testes are collected from a slaughterhouse and brought to the laboratory in cold condition
within 2-3 hours of slaughter.
ii. The caudal region of the epididymis is sliced away from the testis, adhering tissues are
removed and it is washed in normal saline at room temperature.
iii. The cauda is given three to four longitudinal cuts and kept in buffer solution for about 30
minutes so that the sperm can swim out of the tubules.
iv. The buffer containing the sperm is given a suitable spin in a centrifuge to concentrate them.
v. The sperm are then extended in a buffer containing cryoprotectant, sugar, buffering salt,
antibiotics etc. at a suitable pH and diluted so that their concentration is about 100 million/ml.
vi. The extended sperm are then filled into straws and the straws are sealed.
vii. The straws are then stacked in a programmable freezer, cooled to 5°C at 0.25°C/min and
stabilized at this temperature for 30 minutes.
viii. The straws are then cooled to -20°C at 5°C/min and then to -100°C at 20°C/min.
ix. The straws are then plunged directly into liquid nitrogen.
x. After a suitable period of storage, the straws are thawed at 37°C and sperm motility is
evaluated.
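The freezing programme can be checked by computing how long each ramp lasts. The sketch reads the programme as cooling to 5 °C at 0.25 °C/min, holding for 30 minutes, then cooling to -20 °C at 5 °C/min and to -100 °C at 20 °C/min. The starting temperature of 25 °C is an assumption, since the procedure does not state it.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Duration of a linear temperature ramp, in minutes."""
    return abs(end_c - start_c) / rate_c_per_min


ASSUMED_START_C = 25  # not given in the procedure; illustrative only

programme = [
    ("cool to 5 C", ramp_minutes(ASSUMED_START_C, 5, 0.25)),
    ("hold at 5 C", 30),
    ("cool to -20 C", ramp_minutes(5, -20, 5)),
    ("cool to -100 C", ramp_minutes(-20, -100, 20)),
]

for step, minutes in programme:
    print(f"{step}: {minutes:g} min")
print(f"total before plunging: {sum(m for _, m in programme):g} min")
```

The slow first ramp dominates the run time; the two fast ramps together take only a few minutes.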
2
Cytogenetic and Molecular Screening of Genetic Defects in Livestock
S. K. Niranjan and R. S. Kataria
National Bureau of Animal Genetic Resources, Karnal, Haryana, India
Cytogenetic screening
Cytogenetic screening involves these main steps: inducing the cells to divide,
arresting the cells at metaphase, treating the cells with a hypotonic solution, making the spread and finally
staining the chromosomes. Nearly all chromosome banding methods rely on these steps, most
importantly on the first two. Peripheral blood is the most convenient tissue and the common source for
karyotype preparation in livestock species. Since lymphocytes are capable of division, they
are induced to undergo mitosis with a suitable mitogen and allowed to propagate in a suitable culture
medium supplemented with essential ingredients, at an appropriate incubation temperature and period.
Chromosomes are harvested using inhibitors such as colchicine or colcemid, which
bind tubulin, depolymerize the mitotic spindle and ultimately arrest cell division at
metaphase. A metaphase chromosome spread makes all the chromosomes lie in the same
plane on the slide. Spreads that do not overlap are selected and the individual chromosomes are
identified.
About 8-10 ml of blood should be collected under strictly sterile conditions in heparin-coated
vacutainer tubes (green top). The Animal ID should be clearly marked on the collection
tube. A description of each sample, with Animal ID, breed, sex and age, must be provided separately.
For cytological screening, information about the fertility status of the animal should also be
provided along with the other requisites. Blood samples must reach the lab as
soon as possible (not beyond 48 hours after collection for cytological analysis) under cooled conditions
(at about 4°C).
Preparation of reagents
Media
8.1 gm RPMI media
0.8 gm NaHCO3
5.0 ml antibiotics (Antibiotic-Antimycotic: Anti-Anti)
1 ml (2.5 mg/ml) lectin-phytohaemagglutinin (PHA)
1 ml (1 mg/ml) lectin (pokeweed)
1 ml (5 mg/ml) concanavalin A
500 ml autoclaved distilled water
Mix all the contents properly and filter (Nalgene Filter units MF75 Series SFCA
membrane, 90 mm diameter, pore size 0.45 µm)
Add 100 ml fetal bovine serum
Mix and store at -20°C
Hypotonic solution
1.667 gm KCl dissolved in 300 ml distilled water. Keep at 37°C (about 7 ml needed per sample).
Fixative solution
3 methanol : 1 acetic acid (keep in freezer) (about 15 ml needed per sample)
Staining solution (2% Giemsa stain)
49 ml GURR buffer
1 ml Giemsa stain
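The recipe quantities can be sanity-checked with a little arithmetic: the molarity of the hypotonic KCl solution (1.667 g in 300 ml), the number of samples a batch serves at about 7 ml each, and the stain percentage (1 ml stain in 50 ml total). The molar mass of KCl, 74.55 g/mol, is a standard value supplied here rather than taken from the protocol; the function names are illustrative.

```python
KCL_MOLAR_MASS = 74.55  # g/mol, standard value (not stated in the protocol)


def molarity(grams, molar_mass, volume_ml):
    """Molar concentration of a solute dissolved to the given volume."""
    return grams / molar_mass / (volume_ml / 1000)


def percent_v_v(part_ml, total_ml):
    """Volume/volume percentage."""
    return 100 * part_ml / total_ml


print(f"KCl: {molarity(1.667, KCL_MOLAR_MASS, 300):.3f} M")
print(f"samples per 300 ml batch at 7 ml each: {300 // 7}")
print(f"stain: {percent_v_v(1, 49 + 1):g}% v/v")
```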
Culture setting:
Take 5 ml of media in a 15 ml sterilised tube and add 0.7-1.0 ml of whole blood. Mix the contents properly.
This step should be done under strictly sterile conditions, using laminar flow, to avoid any
contamination during culture. Incubate the culture at 37°C for 72 hours. Mix the
contents of the tube about every 12 hours. Normally the blood cells settle in the culture
media after a few hours. If the culture is contaminated, the contents generally do not settle and
turn black.
After 72 hours of culture, take the culture out of the incubator, add colchicine
(colcemid) @ 28 µl per sample and incubate again at 37°C for 1 hour.
Remove the tubes and centrifuge at 2,000 rpm for 20 minutes. After centrifugation, discard the
supernatant cautiously.
Add 7 ml of hypotonic solution and mix with a glass/plastic pipette. Incubate the contents
at 37°C for 20 minutes.
Add 1 ml of chilled fixative to the contents and mix properly with a glass/plastic
pipette (the colour changes to blackish). Centrifuge the tube at 2,000 rpm for 20 minutes.
Discard the supernatant cautiously and add 5 ml of fixative solution. Mix the contents properly.
Centrifuge at 2,000 rpm for 20 minutes.
Discard the supernatant very cautiously, as the sedimented content turns almost colourless, and add
4 ml of fixative solution. Mix the contents properly. Centrifuge at 2,000 rpm for 20 minutes.
Discard the supernatant very cautiously and add 3 ml of fixative solution. Mix the contents properly.
Centrifuge at 2,000 rpm for 20 minutes.
Discard half of the supernatant, keeping 1.5 ml, and re-suspend the contents for slide
preparation. The contents may also be preserved at -20°C.
Preparation of slide
Wash the glass slides. Keep them in ice-cold water before making the spread.
Take about 0.5 ml of the culture contents into the pipette and drop 4-5 droplets on the slide from a
height of about 1 meter. While dropping, the slide should be slightly tilted
towards the ground, so that the cells burst and the chromosomes spread
evenly.
Air-dry the slide overnight and mark it after drying.
Staining and mounting
Dip the slide in Giemsa staining solution (2%) for 15-20 min (the slide should be completely dry before
staining). After staining, wash the slides in running tap water and rinse with distilled water.
Dry the slide overnight in an incubator (dry completely).
Dip the dried slides in xylene for 15-20 min.
Put 2-3 drops of mountant (DPX or Eukitt quick-hardening substance) on the slide.
Mount the coverslip (dipped in xylene) on the slide and fix properly. Remove air bubbles by applying
slight pressure to the coverslip. Air-dry the slide overnight.
Clean the slide using xylene.
Microscopic examination
Once stained slides are prepared, they are scanned to identify "good" chromosome spreads (i.e. the
chromosomes are not too long or too compact and are not overlapping), which are photographed.
The images of each chromosome then are cut out and pasted to a backing sheet in an orderly
manner. Alternatively, a digital image of the chromosomes can be cut and pasted using a computer.
If standard staining was used, the orderly arrangement is limited to grouping like-sized
chromosomes together in pairs, whereas if the chromosomes were banded, they can be
unambiguously paired and numbered.
Generally, several metaphases are processed because it is not uncommon for a single spread to
artifactually have extra chromosomes or be missing chromosomes. This is particularly important if
one is to diagnose an abnormality in an individual. It also allows one to diagnose cases of
mosaicism, in which an individual has multiple, cytogenetically distinct populations of cells.
One final point, the discussion above has focused on initial evaluation of an individual's
cytogenetic status. If abnormalities are found in peripheral blood, it is sometimes desirable to
determine whether that abnormality is present throughout the individual, and further studies with
tissues other than blood can be performed. Also, analysis of diseased tissues can often provide
useful information. A prime example of this is the cytogenetic evaluation of cancers, which is not
only used diagnostically, but has provided valuable understanding of the pathogenesis of certain
types of neoplasia.
Cytogenetic nomenclature
Nomenclature of chromosomes and chromosomal abnormalities follows the guidelines of the
International System for Human Cytogenetic Nomenclature (ISCN) 2009. A few examples of
cytogenetic nomenclature are as follows.
Karyotypes are presented in a standard form. First, the total number of chromosomes is given,
followed by a comma and the sex chromosome constitution. This shorthand description is followed
by coding of any autosomal abnormalities. A few (simple) examples of this format are:
A normal male cattle: 60, XY
Horse with three X chromosomes (trisomy X): 65, XXX
Female sheep with increased length of the short (p) arm of chromosome 2: 54, XX, 2p+
Male pig with a deletion from the long arm (q) of chromosome 10: 38, XY, 10q-
Normal female karyotype of human: 46, XX
Normal male karyotype of human: 46, XY
Symbols used:
p: short arm of chromosome
q: long arm of chromosome
cen: centromere
+: gain of; e.g. 47,XX,+21 Female with trisomy 21
-: loss of; e.g. 45,XX,-14,-21,+t(14q21q) Normal female carrier of a Robertsonian
translocation between the long arms of chromosomes 14 and 21; the karyotype is
missing a normal 14 and a normal 21. 4p- Chromosome 4 with part of the short arm deleted
: break; e.g. 5qter-->5p15: Deleted chromosome 5 in a patient with cri du chat syndrome,
with a deletion breakpoint in band p15
:: break and join; e.g. 2pter-->2q21::8p13-->8pter Description of the der(2) portion of t(2;8)
/: mosaicism; e.g. 46,XX/47,XX,+8 Female with two populations of cells, one with a normal
karyotype and one with trisomy 8
del: deletion; e.g. 46,XX,del(5p) Female with deletion of part of the short arm of one
chromosome 5
der: derivative chromosome; e.g. der(1) Translocation chromosome derived from
chromosome 1 and containing the centromere of chromosome 1
dic: dicentric chromosome; e.g. dic(X;Y) Translocation chromosome containing
centromeres from both the X and the Y chromosomes
dup: duplication
fra: fragile site; e.g. 46,Y,fra(X)(q27.3) Male with fragile X chromosome
i: isochromosome; e.g. 46,X,i(Xq) Female with an isochromosome for the long arm of the X
chromosome
ins: insertion
inv: inversion
mar: marker chromosome
r: ring chromosome
rcp: reciprocal translocation
rob: Robertsonian translocation
t: translocation
ter: terminal (end of a chromosome arm)
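The "count, sex chromosomes, abnormalities" layout of a karyotype string lends itself to a small parser. The sketch below handles only simple strings of the kind listed above: it splits mosaic cell lines on "/" and fields on commas, so it would mis-split any notation that carries a comma inside parentheses. The function name and dictionary keys are illustrative.

```python
def parse_karyotype(text):
    """Split a simple karyotype string into cell lines and their fields."""
    cell_lines = []
    for line in text.split("/"):                 # "/" separates mosaic cell lines
        fields = [field.strip() for field in line.split(",")]
        cell_lines.append({
            "chromosome_count": int(fields[0]),  # total number of chromosomes
            "sex_chromosomes": fields[1],        # e.g. "XY", "XXX", "X"
            "abnormalities": fields[2:],         # remaining codes, if any
        })
    return cell_lines


for example in ("60, XY", "47,XX,+21", "46,XX/47,XX,+8"):
    print(example, "->", parse_karyotype(example))
```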
Molecular screening
Blood collection: Ten ml of venous blood is collected in a sterile centrifuge tube containing 0.5 ml of
2.7% EDTA as an anticoagulant and immediately transferred to the laboratory in an ice bucket. The
Animal ID should be clearly marked on the collection tube. Blood samples must reach
the lab as soon as possible (not beyond 48 hours after collection for cytological analysis) under cooled
conditions (at about 4°C).
Isolation of genomic DNA: Genomic DNA is isolated from 10 ml of blood by the standard phenol-chloroform
extraction protocol (Sambrook and Russell, 2001) with slight modifications.
PCR amplification: To amplify the genomic region of a particular gene, the following sets of primers are used.
List of primers
Locus/Allele            Name of primer   Primer sequence
BLAD                    Forward          5'-CCTGCATCATATCCACCAG-3'
                        Reverse          5'-GTTTCAGGGGAAGATGGAG-3'
Citrullinemia           Forward          5'-GGCCAGGGACCGTGTTCATTGAGGACATC-3'
                        Reverse          5'-TTCCTGGGACCCCGTGAGACACATACTTG-3'
DUMPS                   Forward          5'-GCAAATGGCTGAAGAACATTCTG-3'
                        Reverse          5'-GCTTCTAACTGAACTCCTCGAGT-3'
Factor XI deficiency    Forward          5'-CCCACTGGCTAGGAATCGTT-3'
                        Reverse          5'-CAAGGCAATGTCATATCCAC-3'
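When standardizing the annealing temperature for a primer pair, a quick first estimate can come from GC content and the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C. The rule is a rough guide intended for short oligonucleotides and is not part of the protocol above; the longer citrullinemia primers fall outside its usual range, so its values for them should be treated with particular caution. Function names are illustrative.

```python
PRIMERS = {
    "BLAD forward": "CCTGCATCATATCCACCAG",
    "BLAD reverse": "GTTTCAGGGGAAGATGGAG",
    "Factor XI forward": "CCCACTGGCTAGGAATCGTT",
    "Factor XI reverse": "CAAGGCAATGTCATATCCAC",
}


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace rule estimate of melting temperature (deg C) for short oligos."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm about {wallace_tm(seq)} C")
```

Estimates of this kind only narrow the range to test; the working annealing temperature still has to be found empirically.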
PCR conditions: The following reaction mixture can be used for the amplification of these alleles.
S.N.  Reaction component                     Concentration   Amount
1.    Template (genomic DNA)                 140 ng          2.00 µl
2.    Forward primer                         30 pmole        1.00 µl
3.    Reverse primer                         30 pmole        1.00 µl
4.    10X PCR buffer (with 1.5 mM MgCl2)     1X              5.00 µl
5.    dNTPs mix (2 mM)                       200 µM          5.00 µl
6.    Autoclaved triple distilled water                      35.75 µl
7.    Taq DNA polymerase (5 U/µl)            1.25 U          0.25 µl
      Total                                                  50.00 µl
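The reaction mixture can be cross-checked by treating water as the make-up volume, that is, whatever brings the total to 50 µl, and by deriving the final concentrations from stock concentration and volume. The sketch below does this for a single reaction and for a master mix; the 10% overage in the master mix is an assumption for pipetting loss, not part of the protocol.

```python
TOTAL_UL = 50.0

# Volumes per reaction (µl) for everything except water
FIXED_UL = {
    "template DNA": 2.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "10X PCR buffer": 5.0,
    "dNTP mix (2 mM)": 5.0,
    "Taq polymerase (5 U/ul)": 0.25,
}


def water_ul(total=TOTAL_UL, fixed=FIXED_UL):
    """Water volume that brings the reaction to the total volume."""
    return total - sum(fixed.values())


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Concentration after dilution, in the same unit as `stock`."""
    return stock * volume_ul / total_ul


def master_mix(n_reactions, overage=0.10):
    """Volumes for n reactions (template added separately), with assumed overage."""
    factor = n_reactions * (1 + overage)
    mix = {name: vol * factor for name, vol in FIXED_UL.items() if name != "template DNA"}
    mix["water"] = water_ul() * factor
    return mix


print(f"water per reaction: {water_ul()} ul")
print(f"final dNTP: {final_concentration(2000, 5.0):g} uM")   # 2 mM stock = 2000 uM
print(f"Taq units per reaction: {5 * 0.25} U")
```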
PCR amplification conditions: General PCR conditions are given below; they need to be
standardized for each test.
S.N.  Step                   Temperature            Time
1.    Initial denaturation   94°C                   3 min
2.    Denaturation           94°C                   30 sec
3.    Annealing              specified (52-58°C)    30 sec
4.    Extension              72°C                   1 min
      Repeat steps 2 to 4 for 35 cycles
5.    Final extension        72°C                   10 min
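The programmed hold times give a lower bound on the run time of the thermal cycler. The sketch below adds them up; time spent ramping between temperatures depends on the instrument and is not included.

```python
INITIAL_DENATURATION_S = 3 * 60
CYCLE_STEPS_S = {"denaturation": 30, "annealing": 30, "extension": 60}
CYCLES = 35
FINAL_EXTENSION_S = 10 * 60


def programme_seconds(cycles=CYCLES):
    """Total programmed hold time, excluding temperature ramping."""
    return INITIAL_DENATURATION_S + cycles * sum(CYCLE_STEPS_S.values()) + FINAL_EXTENSION_S


total = programme_seconds()
print(f"{total} s = {total / 60:g} min of programmed hold time")
```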
Checking of amplified products: After completion of the PCR programme, the PCR products are checked
by 1.5% agarose gel electrophoresis at 60 V for 1 hour. After electrophoresis, the products are
visualized and documented under a gel documentation system.
Restriction enzyme digestion of PCR products: Six restriction enzymes, namely AvaII, HinfI, TaqI,
Tru1I, HaeIII and MspI, were used for the PCR-RFLP as well as haplotype analysis of DQA and DQB
genes.
Enzymes used for PCR-RFLP of BuLA-DQA genes
Restriction enzyme   Incubation temp   Recognition sequence
HinfI                37°C              5'-GANTC-3'
AvaII                37°C              5'-GG(A/T)CC-3'
HaeIII               37°C              5'-GGCC-3'
The reaction mix for digestion
The digestion was carried out in a 0.2 ml PCR tube in a total volume of 15 µl reaction mix at the specified
temperature.
Reaction component             Amount
Restriction enzyme (10 U/µl)   1.0 µl
10X assay buffer for RE        1.5 µl
PCR product                    10.0 µl
Autoclaved distilled water     2.5 µl
Total volume                   15 µl
The samples were incubated overnight to ensure complete digestion. The digested PCR
products were checked by 2.0% agarose gel electrophoresis at 70 V for 1 hr 30 min. After
electrophoresis, the products were visualized and documented under a gel documentation system.
Bovine Leukocyte Adhesion Deficiency (BLAD)
This genetic defect can be identified by the PCR-RFLP method. BLAD-free and BLAD-carrier
cows are distinguished by restriction digestion with TaqI. The PCR
is performed using the primers 5'-CCTGCATCATATCCACCAG-3' and
5'-GTTTCAGGGGAAGATGGAG-3', which amplify a fragment of 343 bp. The fragment is cut with the TaqI
restriction enzyme. After digestion, two bands of 152 and 191 bp indicate a homozygous normal
individual, a single band of 343 bp indicates a homozygous affected individual, and three bands of 152,
191 and 343 bp indicate a heterozygous carrier.
Citrullinemia
The PCR-RFLP method permits diagnosis of the genotypes for bovine citrullinemia. The
ASAS locus is amplified and the mutation at codon 86 is detected by PCR followed by restriction
digestion of the amplified products. The sense primer (5'-GGCCAGGGACCGTGTTCATTGAGGACATC-3')
and the antisense primer (5'-TTCCTGGGACCCCGTGAGACACATACTTG-3') are used. PCR-RFLP of the
ASAS locus with the AvaII enzyme gives fragments of 103 bp and 82 bp for normal animals; 185 bp, 103 bp
and 82 bp for carrier animals; and only 185 bp for diseased (homozygous recessive) animals.
Factor XI Deficiency
Detection of Factor XI deficiency is based on PCR amplification of the target gene fragment. The
primers 5'-CCCACTGGCTAGGAATCGTT-3' and 5'-CAAGGCAATGTCATATCCAC-3' can be
used to amplify the region containing the mutation in exon 12. For the normal FXI allele, PCR amplifies a
244 bp fragment, whereas the mutated FXI allele in the homozygous condition gives a single
320 bp fragment. Heterozygous (carrier) individuals exhibit both the 244 bp and the 320 bp
fragments.
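The band patterns described for BLAD, citrullinemia and Factor XI deficiency can be collected into a lookup table that turns a set of observed fragment sizes into a genotype call. The sketch matches exact sizes only; on a real gel the sizes are estimated against a ladder, so some tolerance would be needed. The table layout, labels and function name are illustrative.

```python
# Expected fragment sizes (bp) for each genotype, as described in the text
BAND_PATTERNS = {
    "BLAD": {
        frozenset({152, 191}): "normal",
        frozenset({152, 191, 343}): "carrier",
        frozenset({343}): "affected",
    },
    "citrullinemia": {
        frozenset({82, 103}): "normal",
        frozenset({82, 103, 185}): "carrier",
        frozenset({185}): "affected",
    },
    "factor XI deficiency": {
        frozenset({244}): "normal",
        frozenset({244, 320}): "carrier",
        frozenset({320}): "affected",
    },
}


def call_genotype(locus, bands):
    """Return the genotype for the observed bands, or 'unscorable' if no pattern matches."""
    return BAND_PATTERNS[locus].get(frozenset(bands), "unscorable")


print(call_genotype("BLAD", [152, 191, 343]))
print(call_genotype("citrullinemia", [185]))
print(call_genotype("factor XI deficiency", [244]))
```

An "unscorable" result is a prompt to check for partial digestion or a failed amplification rather than a genotype in its own right.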
Reference
Grupe, S., Dietl, G. and Schwerin, M. 1996. Population survey of Citrullinemia on German Holsteins. Livest.
Prod. Sci., Amsterdam, 45: 35.
Marron, B. M., J. L. Robinson, P. A. Gentry and J. E. Beever 2004. Identification of a mutation associated with
factor XI deficiency in Holstein cattle. Animal Genetics, 35:454.
Prakash, B., Balain, D.S., Lathwal, S.S. and Malik, R.K. 1995. Infertility associated with monosomy-X in a
crossbred cattle heifer. Veterinary-Record. 137: 17: 436.
Shuster, D.E., Kehrli, M.E., Ackerman, M.R. and Gilbert, R.O. 1992. Identification and prevalence of genetic
defect that Causes Leucocyte Adhesion Deficiency Diseases in Holstein Cattle. Proceedings of the
National Academy of Sciences of the United States of America 89, 9225.
http://www.radford.edu/~rsheehy/cytogenetics/Cytogenetic_Nomeclature.html
________________________________________________________________________________________
Nucleic acid extraction is a key step in the laboratory procedures required to perform further molecular
analysis. Downstream applications benefit from DNA of high quantity and high quality.
Therefore, it is imperative that the DNA extracted for subsequent
use is devoid of proteins and other inhibitors. DNA can be extracted from fresh or frozen whole
blood, blood stains, sperm cells, cultured cells/tissue, amniotic fluid and hair roots. The basic extraction
procedure remains the same except for minor modifications for each type of material. Phenol-chloroform
extraction followed by ethanol precipitation is the most routinely used method for DNA
isolation. Phenol-chloroform extraction is a liquid-liquid extraction method that separates mixtures
of molecules based on the differential solubility of the individual molecules in two
immiscible liquids.
DNA extraction methods follow some common procedures aimed at effective disruption
of cells, denaturation of nucleoprotein complexes, inactivation of nucleases and other enzymes,
removal of biological and chemical contaminants, and finally DNA precipitation. Most
methods follow similar basic steps and include the use of organic and inorganic reagents and
centrifugation.
Since the requirements of genomic DNA extraction depend on the intended applications of the DNA
after isolation, purity, source, quantity and quality of DNA are all issues that need to be
addressed before extraction. A plethora of methods, technologies and kits
is now available to researchers for isolating genomic DNA from cells. Selecting the most suitable
method/technology/kit for DNA extraction depends on the following factors.
Quantity of DNA needed
Molecular weight and size of DNA
Purity of DNA required
Downstream applications of DNA
Time available
Ease of DNA extraction technique or method
Expense or money available
Almost all protocols for isolation of DNA from blood and tissues involve four major steps:
Lysis of cells using a detergent such as sodium dodecyl sulphate (SDS)
Digestion of proteins released from cell lysis with proteinase K
Extraction of DNA with phenol
Precipitation of DNA with alcohol
Principle of DNA extraction
The basic principle of extraction is that all other components of the chromatin are removed, leaving
behind the DNA. The proteins are digested by the enzyme proteinase K in the presence of SDS, which
aids the digestion and also helps in lysis of cells (WBC, bacterial cells etc.). Phenol-chloroform
extraction is used to remove the proteins from the aqueous phase. Chloroform eliminates any traces
of phenol, which can cause phosphodiester breakage. The pH of phenol should be maintained above
7.8 to prevent DNA from becoming trapped at the inter-phase between the organic and aqueous
phases. DNA is finally precipitated with ethanol or isopropanol from a salt solution with a moderate
concentration of monovalent cations. On average, about 400 to 500 µg of DNA can be extracted
from 10 ml of blood.
Materials required
Plastic (Oakridge tubes)
Borosil tubes (autoclaved)
Crushed ice
Weighing balance
Pipettes (sterile)
Sterile plasticware and glassware
High speed centrifuge
Vortex mixer
Stock solutions
1 M Tris (pH 8.0)
1 M NH4Cl
1 M KHCO3
0.5 M EDTA (pH 8.0)
0.5 M NaCl
20% SDS
Proteinase- K (20 mg/ml)
3 M sodium acetate (pH 5.2)
Phenol (equilibrated with Tris at pH > 7.8)
Chloroform
Isoamyl alcohol
Absolute alcohol
70% alcohol
RBC lysis buffer
Ammonium chloride        155 mM
Potassium bicarbonate    10 mM
EDTA (pH 8)              0.1 mM
DNA extraction buffer
NaCl                     400 mM
Tris (pH 8)              10 mM
EDTA (pH 8)              2 mM
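Working buffers are made by diluting the stock solutions, using C1 × V1 = C2 × V2. The sketch below computes the stock volumes for the RBC lysis buffer from the 1 M NH4Cl, 1 M KHCO3 and 0.5 M EDTA stocks; the 1000 ml batch size and the function name are assumptions for illustration.

```python
def stock_volume_ml(stock_molar, final_millimolar, final_volume_ml):
    """Volume of stock needed so that the final solution has the target concentration."""
    return final_millimolar * final_volume_ml / (stock_molar * 1000)


BATCH_ML = 1000  # illustrative batch size

# component: (stock concentration in M, final concentration in mM)
RBC_LYSIS_BUFFER = {
    "NH4Cl": (1.0, 155),
    "KHCO3": (1.0, 10),
    "EDTA (pH 8)": (0.5, 0.1),
}

used = 0.0
for name, (stock_m, final_mm) in RBC_LYSIS_BUFFER.items():
    volume = stock_volume_ml(stock_m, final_mm, BATCH_ML)
    used += volume
    print(f"{name}: {volume:g} ml of {stock_m:g} M stock")
print(f"water to make up the volume: {BATCH_ML - used:g} ml")
```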
3. Re-suspend the pellet in one volume of RBC lysis buffer and keep on ice for 10 minutes.
Centrifuge as in step 2.
4. Repeat steps 2 and 3 until most of the red blood cells are lysed and clear pellets of white
blood cells are obtained.
5. Add an equal volume of DNA extraction buffer to the WBC pellet and mix well by vortexing.
6. Add 20% sodium dodecyl sulfate (SDS) @ 200 µl per 10 ml of whole blood and mix gently.
(SDS is a popular detergent used to solubilize cell membranes)
7. Add proteinase K (20 mg/ml stock solution) @ 40 µl per 10 ml of blood and incubate at 56°C
overnight.
(Enzymes are combined with detergents to target cell surface or cytosolic components. Proteinase K
cleaves glycoproteins and inactivates RNases and DNases)
Phenol extraction
1. After overnight incubation, add an equal volume of Tris (pH 8.0) saturated phenol to the above
mixture and mix gently by inverting the tube for 5-10 minutes to form a uniform suspension.
2. Centrifuge the mixture at 10000 rpm for 15 minutes at 25°C.
3. Aspirate the upper aqueous phase gently using a wide-bore sterile Pasteur pipette without
disturbing the interphase of protein and transfer to a fresh Oakridge tube.
(The nucleic acid will tend to partition into the organic phase if the phenol has not been adequately
equilibrated to a pH of 7.8-8.0.)
4. Add an equal volume of phenol: chloroform: isoamyl alcohol (25:24:1) and mix gently by inverting
the tube until a uniform suspension is formed.
(Isoamyl alcohol prevents frothing during mixing.)
5. Centrifuge the mixture at 10000 rpm for 15 minutes at 25°C and again aspirate the upper
aqueous phase gently using a wide-bore sterile Pasteur pipette and transfer into a fresh
Oakridge tube.
6. Add an equal volume of chloroform: isoamyl alcohol (24:1) and mix properly by inverting the
tube several times.
7. Centrifuge the mixture again at 10000 rpm and transfer the aqueous phase into a fresh
sterile glass tube without disturbing the interphase.
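The centrifugation steps above are given in rpm, which corresponds to different forces on different rotors. As a hedged aid for transferring the protocol between centrifuges, the sketch below uses the standard conversion RCF = 1.118 x 10^-5 x r x rpm^2 (r in cm). The 8 cm rotor radius is an assumption for illustration only; the manual does not state the rotor used.

```python
# Convert rotor speed (rpm) to relative centrifugal force (x g).
# Standard relation: RCF = 1.118e-5 * radius_cm * rpm**2.

def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force for a given speed and rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed needed to reach a given RCF on a rotor of the given radius."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


if __name__ == "__main__":
    ASSUMED_RADIUS_CM = 8.0  # hypothetical rotor radius, not from the protocol
    print(f"10000 rpm at {ASSUMED_RADIUS_CM} cm = "
          f"{rcf_from_rpm(10000, ASSUMED_RADIUS_CM):.0f} x g")
```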
DNA precipitation
1. To the separated aqueous phase, add 0.1 volume of sodium acetate (3 M, pH 5.2) and mix gently.
(DNA precipitation is achieved by adding high concentrations of salt to DNA-containing solutions, as
cations from salts counteract the repulsion caused by the negative charge of the phosphate backbone.)
2. Add 2 to 2.5 volumes of chilled ethanol and mix gently by inverting the tube to precipitate the
DNA.
(A mixture of DNA and salts in the presence of solvents like ethanol at final concentrations of 70%-80%
causes nucleic acids to precipitate.)
3. Spool out the precipitated DNA with the help of a Pasteur pipette and transfer to an Eppendorf
tube.
4. Wash the DNA with 1 ml of 70% ethanol by mixing well and centrifuging at 10000 rpm for 5
minutes at 4°C. Repeat this step once.
(The washing step with 70% ethanol removes excess salts from the DNA.)
5. Keep the Eppendorf tubes open in a sterile incubator at 37°C to dry the pellet by evaporating the
alcohol.
6. Dissolve the DNA in 500 µl TE buffer (pH 8.0) and incubate at 65°C for 30 minutes.
7. Store the DNA sample at 4°C for 4-5 days to ensure complete dissolution of the DNA in the buffer.
The dissolved DNA can then be stored at -20°C as stock solution for future use.
14. DNA Elution - Add 200 µl of Elution Buffer (ET) (DS0040) and vortex for 1 minute to dissolve the
DNA pellet properly. Incubate the tube at 65°C for 1 hour and at room temperature overnight
to rehydrate the DNA. Gently shake the tube several times intermittently during the incubation to
dissolve the DNA completely.
15. Storage of the eluate with purified DNA - The eluate contains pure genomic DNA. For short-term
storage of the DNA, 2-8°C and for long-term storage, -20°C is recommended. Avoid
repeated freezing and thawing of the sample, which may cause denaturing of DNA. The Elution
Buffer will help to stabilize the DNA at these temperatures.
Estimation of quality and quantity of DNA
The most commonly used methods to estimate DNA concentration and purity are UV
spectrophotometric measurements and agarose gel electrophoresis.
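For the UV spectrophotometric estimate, a common convention (not spelled out in the text above, so treated here as an assumption) is that an A260 of 1.0 corresponds to about 50 µg/ml of double-stranded DNA, and that an A260/A280 ratio near 1.8 indicates reasonably pure DNA. A minimal sketch of the arithmetic, with hypothetical readings:

```python
# Estimate dsDNA concentration and purity from UV absorbance readings.
# Assumption: 1 A260 unit = 50 ug/ml dsDNA (widely used convention).
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_concentration_ug_per_ml(a260, dilution_factor):
    """Concentration of the undiluted sample in ug/ml."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are usually taken as pure DNA."""
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 1:50 before measurement.
    a260, a280, dilution = 0.25, 0.135, 50
    conc = dna_concentration_ug_per_ml(a260, dilution)
    print(f"Concentration: {conc:.1f} ug/ml")
    print(f"A260/A280: {purity_ratio(a260, a280):.2f}")
    print(f"Yield in 0.5 ml TE: {conc * 0.5:.1f} ug")
```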
________________________________________________________________________________________
Sampling procedure
Any of the biological materials like fresh blood, tissue, hair, bone etc. may potentially be used for
DNA analysis. However, fresh blood is preferred as a sample material as high quality of DNA can
easily be obtained from peripheral blood. Sample should be collected from unrelated animals by
visiting the breeding tract of the breed in question and not more than 10% of a herd or village
population should be sampled. Whenever possible, pedigree records should be consulted for
identifying unrelated individuals. To achieve clearer differentiation among closely related
populations/ breeds, FAO recommends that per breed 50 unrelated animals (preferably 25 each of
both the sexes) should be assayed.
DNA extraction
The collected blood samples in vacutainer tubes containing anticoagulant such as EDTA are
transported to the laboratory under chilled condition for further processing. Genomic DNA from
total blood is then isolated using proteinase-K digestion followed by standard phenol/ chloroform
extraction. Both the quality and the quantity of the isolated genomic DNA are assessed, and the DNA is
subsequently stored at -20°C/4°C for further analysis with microsatellite markers. Blood samples
can also be collected on FTA cards.
Amplification and resolution of microsatellites
Microsatellites can be amplified by the polymerase chain reaction (PCR) technique with unlabelled or
labelled primers. After amplification with unlabelled primers, the number of repeat units that an
individual has at a given locus can be resolved on polyacrylamide gels by silver staining, which
involves three steps: 1) fixing of the DNA bands on the gel with 10% acetic acid, 2) incubation of the
gel in silver nitrate solution for 30 minutes and 3) developing the DNA bands with the help of a
developer. The resulting stained gels are dried and stored for data recording and data analysis.
On the gels, two bands can be seen for most individuals, as each individual inherits one length of
nucleotide repeats from the mother and the other from the father (individuals with one band have
received the same allele from both the mother and the father). PCR primers labelled
with fluorescent dyes, viz. FAM, HEX, NED and PET, which have different emission spectra, permit
the simultaneous analysis of microsatellites that overlap in size on an automated DNA fragment
analyzer/sequencer. The sequencer allows a much higher resolution of microsatellites than is
possible with the other methods of analysis.
Data processing
A number of software programmes that can be downloaded from the internet are available for analysing
microsatellite data recorded as genotype designations for each individual across the microsatellite
loci, using different analytical methods. The data generated in terms of alleles, as
photographs or preserved gels, then need to be analysed. Two main steps are involved in the
statistical analysis of molecular data in diversity studies:
Genotyping: Each individual can be genotyped manually by scoring the bands (alleles), either as
two-digit codes or as their integer size in base pairs. Heterozygous individuals
yield two bands and homozygous individuals yield one band.
Entry of band/allele information into the computer: It can be done manually, or it can be read from the
gel directly by a computer installed with suitable software.
Control DNA
It is recommended to analyse at least one control DNA sample in every PCR run. The control
DNA serves as a positive control for troubleshooting problems with the PCR amplification, and it also
makes it possible to monitor and compare the fragment sizes obtained in different runs or by different
people.
Preparation of PCR samples
1) Amplify your target product using primer pairs in which the forward primer is labelled with a
capillary-based dye: 6FAM (Blue), PET (Red), VIC (Green) or NED (Yellow)
2) Dilute the PCR product in Milli-Q water (e.g. 1:20; the dilution varies with individual markers)
3) Prepare the internal standard by adding 10 µl of LIZ 500 standard (stored at 4°C)
to 1 ml of Hi-Di formamide (stored in aliquots of 1 ml at -20°C) and mix by pipetting
4) Pipette 1 µl of diluted PCR product into individual wells of the microtitre plate
5) Add 9 µl of the standard/formamide mix into each well and centrifuge briefly
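When a whole plate is loaded, the standard/formamide mix from step 3 can be scaled to the number of wells. The sketch below keeps the 10 µl LIZ 500 to 1 ml Hi-Di formamide ratio and the 9 µl per well from the steps above; the 10% pipetting overage and the function name are assumptions added for illustration.

```python
# Scale the LIZ 500 / Hi-Di formamide mix to a given number of wells.
LIZ_UL_PER_BATCH = 10.0          # from step 3
FORMAMIDE_UL_PER_BATCH = 1000.0  # from step 3 (1 ml)
MIX_UL_PER_WELL = 9.0            # from step 5


def standard_mix_volumes(n_wells, overage=0.10):
    """Return total mix, LIZ and formamide volumes (ul) for n_wells.

    overage is an assumed allowance for pipetting losses.
    """
    total = n_wells * MIX_UL_PER_WELL * (1.0 + overage)
    batch = LIZ_UL_PER_BATCH + FORMAMIDE_UL_PER_BATCH
    return {
        "total_ul": total,
        "liz_ul": total * LIZ_UL_PER_BATCH / batch,
        "formamide_ul": total * FORMAMIDE_UL_PER_BATCH / batch,
    }


if __name__ == "__main__":
    for key, value in standard_mix_volumes(96).items():
        print(f"{key}: {value:.1f}")
```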
Troubleshooting
Too much signal is the most common problem. For optimal results, the fluorescent signal should be
between 150 and 6000 RFUs. Above this range, the instrument cannot measure the true value of the
signal and therefore cannot apply the matrix correctly. This results in artifact pull-up peaks that
can appear in other colours. Artifact peaks can corrupt both the automated size calling and the
analysis of co-loaded samples.
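The 150 to 6000 RFU window described above can be applied as a simple screen on exported peak heights before size calling is trusted. This is an illustrative sketch only; the sample names and peak heights are invented, and the labels returned are not instrument terminology.

```python
# Flag peaks whose height falls outside the recommended signal window.
MIN_RFU = 150
MAX_RFU = 6000


def classify_peak(height_rfu):
    """Return 'too low', 'ok' or 'off-scale' for one peak height."""
    if height_rfu < MIN_RFU:
        return "too low"
    if height_rfu > MAX_RFU:
        return "off-scale"
    return "ok"


if __name__ == "__main__":
    # Hypothetical peak heights per sample.
    peaks = {"AA1": 2350, "AA2": 7200, "AA3": 90}
    for sample, height in peaks.items():
        print(sample, height, classify_peak(height))
```

Samples flagged as off-scale are candidates for further dilution, and those flagged as too low for a lower dilution.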
If you intend to pool PCR products, it is important to pool PCR products together at the correct
ratios in order to get similar fluorescent intensities across all fragments in the pool. The fluorescent
dyes are detected with different efficiencies; therefore the amount of each dye-labelled product in
the pool will require adjustment to ensure even detection.
A good way to proceed is to test a few combinations of pooled PCR reactions to determine the
pooling ratio that will provide similar fluorescent intensities across all the pooled fragments. Then
carry out a series of dilutions on the pooled reactions in order to determine the optimal
fluorescence for running on the 3130xl.
After determining the optimal pooling ratio and/or dilution ratio, you can then use the same
dilutions for subsequent analyses, as PCR yields should be relatively consistent.
Using GENEMAPPER
Creating a new panel
(that contains details of your marker including name, colour, size range of fragment)
Open Genemapper
Tools
Panel manager
Double click root of panel manager
Right click top most cell of panel column
Close the dialog box but panel cell still highlighted
File
New kit
Kit name-Test
Click on test (test highlighted)
File
New Panel-Test Plex 1
Double click on test
Select plex 1
File
New marker
Enter information for marker
Marker Name   Dye   Dye colour   Min size   Max size
A             NED   Yellow       109        149
B             PET   Red          116        150
C             VIC   Green        111        141
D             FAM   Blue         86         160
E             FAM   Blue         251        311
Apply
Okay
Close panel window
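A panel is essentially a lookup table: each marker is identified by its dye and the size range in which its fragments are expected. The sketch below expresses the five example markers (A to E) as data and shows how a fragment is assigned to a marker, taking the smaller value of each pair as the minimum size. It illustrates the idea only and is not how GeneMapper is implemented.

```python
# Example panel: marker name, dye and expected fragment size range (bp).
PANEL = [
    {"marker": "A", "dye": "NED", "min_size": 109, "max_size": 149},
    {"marker": "B", "dye": "PET", "min_size": 116, "max_size": 150},
    {"marker": "C", "dye": "VIC", "min_size": 111, "max_size": 141},
    {"marker": "D", "dye": "FAM", "min_size": 86, "max_size": 160},
    {"marker": "E", "dye": "FAM", "min_size": 251, "max_size": 311},
]


def call_marker(dye, size_bp):
    """Return the marker whose dye and size range match, else None."""
    for entry in PANEL:
        if entry["dye"] == dye and entry["min_size"] <= size_bp <= entry["max_size"]:
            return entry["marker"]
    return None


if __name__ == "__main__":
    # Markers D and E share the FAM dye but are separated by size range.
    print(call_marker("FAM", 120.3))
    print(call_marker("FAM", 300.0))
    print(call_marker("FAM", 200.0))
```

This is why two markers can share one dye in the same plex only if their size ranges do not overlap.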
Data Analysis
Open Genemapper
File
Add samples to project
Browse
Highlight entire folder
Add to list
Add
Ok
Analyse
(choose microsatellite default)
Select column, ctrl D to fill down
select panel of markers (test plex 1)
Select column, ctrl D to fill down
Select internal marker GSLIZ500
Select column, ctrl D to fill down
Run
Open genotypes
Sort by marker /sample no. etc
Export file as text tab delimited
Open table in MS Excel
Edit table to remove the unwanted columns
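The last three steps (export as tab-delimited text, open the table, remove unwanted columns) can also be done with a short script. The sketch below is an assumption-laden illustration: the column names "Sample Name", "Marker", "Allele 1" and "Allele 2" and the example rows are hypothetical and should be replaced by the headers of the actual export.

```python
import csv
import io

# Hypothetical tab-delimited export; real files will have more columns.
EXPORT_TEXT = (
    "Sample Name\tMarker\tAllele 1\tAllele 2\tHeight 1\n"
    "AA1\tA\t113\t113\t2350\n"
    "AA1\tB\t126\t140\t1800\n"
    "AA2\tA\t109\t113\t2100\n"
)


def genotype_table(text):
    """Return {sample: {marker: (allele1, allele2)}}, dropping other columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        alleles = (int(row["Allele 1"]), int(row["Allele 2"]))
        table.setdefault(row["Sample Name"], {})[row["Marker"]] = alleles
    return table


if __name__ == "__main__":
    for sample, markers in genotype_table(EXPORT_TEXT).items():
        print(sample, markers)
```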
Using GenAlEx
(Peakall, R. and Smouse P.E. (2012) GenAlEx 6.5: genetic analysis in Excel. Population genetic
software for teaching and research - an update. Bioinformatics 28, 2537-2539).
Genetic Analysis in Excel (GenAlEx) is a popular cross-platform package for population genetic
analysis that runs within Microsoft Excel. GenAlEx offers analysis of codominant, haploid and binary
genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population
assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial
autocorrelation) analyses are provided. GenAlEx 6.5 introduces new features
including calculation of new estimators of population structure (G'ST, G''ST, Jost's Dest and F'ST
via AMOVA), Shannon Information analysis, linkage disequilibrium analysis for biallelic data, and
heterogeneity tests for spatial autocorrelation analysis. Data export is provided to more than 30
other software packages.
Sample  Pop   locus1    locus2    locus3    locus4    locus5    locus6
AA1     Pop1  113 113   136 136   182 182   126 140   212 218   123 123
AA2     Pop1  113 113   136 136   182 182   122 124   218 218   123 123
AA3     Pop1  113 113   136 136   182 184   126 128   218 224   123 131
AA4     Pop1  109 113   130 136   184 198   124 128   214 214   121 129
AA5     Pop1  109 113   136 136   180 184   122 122   212 216   123 123
AA6     Pop1  109 113   136 144   184 184   124 124   212 220   123 123
AA7     Pop1  113 117   136 136   180 180   120 122   212 216   123 123
AA8     Pop1  109 109   130 130   182 182   120 120   212 216   123 123
AA9     Pop1  109 109   134 134   184 198   120 120   224 224   123 123
AA10    Pop1  113 113   134 134   176 184   120 120   212 216   123 123
AA11    Pop1  109 113   134 136   182 184   120 122   212 214   121 123
AA12    Pop1  113 117   132 132   182 184   124 128   218 218   121 129
AA13    Pop1  109 109   124 124   184 184   122 124   212 218   123 123
AA14    Pop1  113 115   136 136   180 198   124 124   214 218   121 121
AA15    Pop1  113 113   130 130   182 184   120 124   214 218   123 123
BB1     Pop2  111 115   136 144   184 184   120 130   214 222   123 123
BB2     Pop2  109 113   132 136   184 184   120 120   220 220   123 133
BB3     Pop2  109 113   136 144   180 184   120 120   214 218   125 133
BB4     Pop2  111 113   130 136   184 184   120 120   220 220   121 123
BB5     Pop2  113 113   136 136   184 184   120 124   218 218   123 123
BB6     Pop2  111 113   128 128   184 184   120 124   218 218   125 133
BB7     Pop2  105 111   136 138   184 184   120 128   214 218   123 123
BB8     Pop2  109 113   130 144   182 184   120 126   220 220   123 133
BB9     Pop2  113 115   130 136   182 182   120 128   218 218   121 125
BB10    Pop2  105 109   130 144   184 184   120 120   214 218   123 131
BB11    Pop2  111 113   130 136   184 184   126 126   220 220   121 133
BB12    Pop2  111 113   130 136   184 184   120 124   218 218   123 127
BB13    Pop2  105 117   130 136   184 184   120 124   216 218   121 123
BB14    Pop2  113 117   134 144   184 184   120 124   220 220   123 133
BB15    Pop2  105 113   130 136   184 184   124 138   220 222   125 133
CC1     Pop3  111 111   126 132   180 184   124 124   218 220   123 127
CC2     Pop3  105 115   128 136   178 184   124 132   214 220   123 145
CC3     Pop3  111 113   130 136   182 198   122 124   220 220   123 125
CC4     Pop3  111 111   132 132   180 180   124 138   220 220   121 123
CC5     Pop3  111 111   130 130   184 198   128 128   218 220   117 121
CC6     Pop3  111 111   136 146   180 180   118 128   220 220   121 123
CC7     Pop3  111 113   130 136   170 180   118 134   220 224   123 145
CC8     Pop3  111 113   128 132   178 184   124 124   218 220   123 127
CC9     Pop3  111 111   134 136   180 198   124 134   218 220   123 145
CC10    Pop3  111 111   128 132   178 184   116 124   220 220   123 145
CC11    Pop3  111 115   132 136   184 184   118 130   218 220   121 123
CC12    Pop3  107 111   130 136   180 182   124 124   214 214   123 133
CC13    Pop3  111 111   130 130   180 198   124 124   214 214   123 133
CC14    Pop3  111 111   130 136   180 184   118 124   220 222   121 121
CC15    Pop3  111 113   122 136   184 198   120 124   214 222   121 123
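To make clear what the frequency-based statistics summarise, the sketch below computes allele frequencies and observed and expected heterozygosity by hand for locus1 in the first population (animals AA1 to AA15 of the example data). It uses the simple estimator He = 1 - sum(p_i^2); GenAlEx also reports an unbiased version, so its output may differ slightly. The function names are illustrative and are not part of GenAlEx.

```python
from collections import Counter

# Locus1 genotypes (allele sizes in bp) of animals AA1 to AA15.
GENOTYPES = [
    (113, 113), (113, 113), (113, 113), (109, 113), (109, 113),
    (109, 113), (113, 117), (109, 109), (109, 109), (113, 113),
    (109, 113), (113, 117), (109, 109), (113, 115), (113, 113),
]


def allele_frequencies(genotypes):
    """Frequency of each allele among the 2N gene copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in allele_frequencies(genotypes).values())


if __name__ == "__main__":
    for allele, freq in allele_frequencies(GENOTYPES).items():
        print(f"allele {allele}: {freq:.3f}")
    print(f"Ho = {observed_heterozygosity(GENOTYPES):.3f}")
    print(f"He = {expected_heterozygosity(GENOTYPES):.3f}")
```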
Using Popgene32
(Yeh, F.C.; Boyle, T.; Rongcai, Y.; Ye, Z. and Xian, J.M. 1999. POPGENE version 1.31. A Microsoft
Windows-based freeware for population genetic analysis. University of Alberta, Edmonton).
POPGENE is a user-friendly Microsoft Windows-based computer package for the analysis of genetic
variation among and within natural populations using co-dominant and dominant markers and
quantitative traits. It is designed specifically for the analysis of co-dominant and dominant markers
using haploid and diploid data. It performs most types of data analysis encountered in population
genetics and related fields. It can be used to compute summary statistics (e.g., allele frequency, gene
diversity, genetic distance, F-statistics, multilocus structure, etc.) for (1) single-locus, single
populations; (2) single-locus, multiple populations; (3) multilocus, single populations and (4)
multilocus, multiple populations. The modules for co-dominant and dominant markers are
currently limited to a maximum of:
1400 populations;
150 groups;
1000 loci;
10 characters (alphanumeric) for a locus name (automatically truncated to 10 if more than 10
characters are given).
The number of alleles per locus is limited to 9 (1-9) if you use numerals to code your alleles, or to
52 if you use alphabetic letters (capital letters A-Z for alleles 1 to 26 and lower-case
letters a-z for alleles 27 to 52).
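Because POPGENE expects letter (or single-digit) codes rather than fragment sizes, microsatellite allele sizes have to be recoded before the input file is written. The sketch below shows one way to do this for a single locus: alleles are ranked by size and given the codes A-Z, then a-z. Ranking by size is an assumption made here for illustration; any consistent mapping within a locus would serve.

```python
import string

# 52 available codes: A-Z for alleles 1-26, a-z for alleles 27-52.
CODES = string.ascii_uppercase + string.ascii_lowercase


def allele_codes(sizes):
    """Map each distinct allele size at a locus to a letter code."""
    unique = sorted(set(sizes))
    if len(unique) > len(CODES):
        raise ValueError("more than 52 alleles at this locus")
    return {size: CODES[i] for i, size in enumerate(unique)}


def encode_genotypes(genotypes):
    """Convert (size1, size2) genotypes to two-letter codes."""
    codes = allele_codes(a for genotype in genotypes for a in genotype)
    return ["".join(codes[a] for a in genotype) for genotype in genotypes]


if __name__ == "__main__":
    example = [(109, 113), (113, 117), (109, 109), (113, 115)]
    print(encode_genotypes(example))
```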
Input file format for diploid data, co-dominant marker
/* Diploid alphabetic data of 3 populations each with varying records (genotypes) and 21 loci */
Number of populations = 3
Number of loci = 21
Locus name :
AAT-1 AAT-2 AAT-3 ACO ADH DIA-1 DIA-3 EST-2 GDH G6P HA
IDH MDH-1 MDH-2 MDH-3 MDH-4 PEP-1 PEP-2 PGI-2 PGM SPG-2
AA AAAAAAAAAAAA BB AA AAAAAAAAAAAAAAAAAAAAAAAA
AA AAAA AB BB A3 AA AB BB AA AAAAAAAAAAAAAAAA AB AA AA
AA AAAAAA BC AC AA AB ABAB AA AAAAAAAAAAAAAAAAAAAA
AA AAAAAA BB CC AA BB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA AB AC AA BB AA AAAAAAAAAAAA AC AA AAAA AB AA
AA AAAA AB AB AC AA AB AB AA AAAAAAAAAA AB AA AAAAAAAA
AB AA AAAA BC AC AA AB AB AA AAAAAA AB AA AB AA AAAAAAAA
AA AAAAAA BB AA AAAAAAAAAAAAAAAAAAAAAAAA AC AA AA
AA AAAAAAAA BC AA AB AA AAAAAAAAAAAA AB AA AAAAAAAA
AA AAAA AB BC BC AA BB AB AA AAAAAA AB AA AAAAAAAA AC AA
AA AAAA AB AC AB AA BB BB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA AB AC AA BB AA AAAAAAAAAAAAAAAAAAAAAAAA
AA AAAAAAAA BC AB AB AA AAAAAAAAAAAAAAAAAA AC AA AA
AA AAAA AB AA AC AA AB AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA BB BB AA AAAAAAAAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA AC AC AA BB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAA AB BB BC AA BB AA AC AA AAAAAAAAAAAAAA AC AD AA
AA AAAAAA AB BC AA AB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA BB BC AA BB AA AAAAAAAAAAAA AB AA AAAA AC AA
AA AAAAAAAA BC AA AB AB AA AAAAAAAAAA BC AA AAAAAAAA
AA AAAAAA AB BC AA AB AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA BD BC AA AB AB AA AAAAAAAAAAAAAAAA AC AE AA
AA AAAA AB CC BB AA BB AA AAAAAAAAAAAAAAAAAAAA AB AA
AA AAAAAA BC BB AA AB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAA AB CC BB AA AA AB AA AAAAAAAAAAAAAAAAAA AB AA
AA AAAAAA BB BC AA AB AB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA AB BC AA AA BB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAA AB AC AB AA BB AA AAAAAAAAAAAAAAAAAAAA AB AA
AA AAAA AB BC BB AA BB BB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAA AB ABAB AA BB AA AAAAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA BB AB AA AAAAAAAAAAAAAAAA BB AA AAAAAAAA
AA AAAAAA AC BC AA BB AA AAAAAAAAAAAAAAAAAAAAAAAA
AA AAAAAA CC BC AA AB AB AA AAAAAAAAAA AB AA AAAA AC AA
AA AAAAAA BB AC AA BB AA AAAAAAAAAAAAAAAAAA CC AA AA
AA AA AB AA BE BB AA BB AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA AB BC AA AB AA AAAAAAAAAAAA AB AA AAAAAAAA
AA AAAA BB AA CC AA AB AB AA AAAAAAAAAA BC AA AAAAAAAA
AA AAAA AB AC BC AA BB BB AA AAAAAAAAAAAAAAAAAAAAAA
AA AAAA AB AA BB AA AC AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAA AB BB AC AA BB AA AAAAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA AC AC AA AA AB AA AAAA AB AA AA AB AA AAAAAAAA
AA AAAAAAAA AB AA AB AB AA AAAAAAAAAA AB AA AAAAAAAA
AA AAAAAA BC BC AA BB AB AA AAAAAAAAAA AB AA AAAA AC AA
5
Approaches for Analysis of Mitochondrial Sequence Data
Monika Sodhi and Manishi Mukesh
ICAR-National Bureau of Animal Genetic Resources, Karnal, Haryana
________________________________________________________________________________________
2. Add 1 µl of the mastermix to 10 µl of PCR product (50 to 100 ng) and set up the following
incubation protocol in the thermal cycler.
37°C for 120 minutes
85°C for 15 minutes
4°C hold
3. Make the final volume of the PCR product up to 100 µl with Milli-Q water.
4. Add 10 µl of 3 M Na acetate (pH 5.5) and 250 µl of chilled 95% ethanol.
5. Mix the tube well and incubate on ice for 20-30 minutes.
6. Centrifuge at 13,000 rpm for 20 minutes and aspirate the supernatant.
7. Wash the pellet by adding 500 µl of 70% ethanol at room temperature and centrifuge at top speed
for 5 minutes.
8. Aspirate the supernatant and repeat the 70% ethanol wash once more.
9. Air-dry the pellet, resuspend it in a suitable volume of water and check by agarose gel
electrophoresis. For a product size of 1000-2000 bp, 10-40 ng of template is sufficient for
sequencing.
Setting up of cycle sequencing reaction
The ready reaction composition:
PCR product                 10-40 ng
Ready reaction mix          1 µl
5X sequencing buffer        1.5 µl
Primer (forward/reverse)    5 pmol
Milli-Q water               to make up the volume to 10 µl
Mix the contents briefly and place in a thermal cycler set to the following reaction conditions:
96°C for 1 minute (initial denaturation)
30 cycles of:
96°C for 10 seconds
50°C for 5 seconds
60°C for 4 minutes
4°C for final storage
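For planning instrument time, the programmed hold times of the cycle sequencing run above can be summed. The sketch below does only that arithmetic; ramping between temperatures is ignored, so the real run will take somewhat longer.

```python
# Programmed hold time of the cycle sequencing programme (ramping ignored).
INITIAL_DENATURATION_S = 60    # 96 C for 1 minute
CYCLE_STEPS_S = [10, 5, 240]   # 96 C, 50 C and 60 C steps
N_CYCLES = 30


def programmed_minutes():
    """Total hold time in minutes before the final 4 C storage step."""
    total_s = INITIAL_DENATURATION_S + N_CYCLES * sum(CYCLE_STEPS_S)
    return total_s / 60.0


if __name__ == "__main__":
    print(f"Programmed hold time: {programmed_minutes():.1f} minutes")
```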
Purification of the sequencing product
After the sequencing reaction, the products are purified by the following protocol:
Add 2 µl of 125 mM EDTA to stop the reaction and mix well.
Add 2 µl of 3 M sodium acetate (pH 4.6) to each reaction well.
Ensure proper mixing of the contents.
Add 50 µl of 95% ethanol to each well and incubate at room temperature for 15 minutes.
Spin at 1650 x g for 45 minutes at room temperature.
Invert the plate on a paper towel and give a short spin at 180 x g to remove the supernatant.
Add 200 µl of 75% ethanol and spin at 1650 x g for 5 minutes.
Invert the plate slowly on a paper towel and spin at 180 x g for 1 minute.
Denaturation and sequencing
Add 10 µl of Hi-Di formamide, denature the products at 95°C for 5 minutes and chill on ice
immediately for 5 minutes. The samples are then ready for sequencing on an automated DNA
sequencer. The sequences and chromatograms can be visualized and saved using ABI PRISM
DNA Sequencing Analysis Software.
mtDNA sequence data analysis
Individual chromatograms are checked manually and ambiguous bases are disregarded. Sequence
base calling is performed with Phred; poor-quality sequence data (based on signal and spacing) and
the primer sequences are removed. The final sequences obtained from a panel of samples are
aligned using Sequencher, MEGA version 6.0 or any other software, and a contig sequence is
generated for further analysis. A number of software packages are freely available for diversity and
phylogenetic analysis using mtDNA sequence data. These include:
After importing the data, go to the Analysis menu, where various tests for intra-population diversity
and inter-population genetic distance can be performed. The commonly conducted analyses for mtDNA
include DNA polymorphism/divergence; haplotype/nucleotide diversity and divergence; Fu's Fs;
and genetic differentiation and gene flow among populations.
General statistics
Popular population
genetics test
DnaSP can compute several measures of DNA sequence variation within and between populations;
gene flow, gene conversion and linkage disequilibrium parameters. In addition, DnaSP can perform
Fu's Fs statistical tests. It takes advantage of the Microsoft Windows capabilities, and can handle a
large number of sequences of thousands of nucleotides each on a microcomputer. Furthermore,
DnaSP can easily exchange data with other programs, for example, programs to perform multiple
sequence alignments, phylogenetic tree analysis, or statistical analysis.
Steps to create a network
The input file in Network 4.612 consists of a nucleotide multiple sequence alignment (MSA) in RDF
format. DnaSP software can be used to generate this kind of file.
An example of an RDF file is:
1111111111111111111111111166666666666666666666666666011111112222222222233333339112478811245777999122242149725937326048048
9457227
NC1a TCCGCTCCATTCCCGTCCTGTTCTTA 1
NC1b TCCGCTCCATTCCCGTCCTGTTCTTA 1
NC1c TCCGCTCCATTCCCGTCCTGTTCTTA 1
NC2a TCCGCTCTATTCCCGTCCTGTTCTTA 1
NC2b TCCGCTCTATTCCCGTCCTGTTCTTA 1
NC3 TCCGCTCTGTTCCCGTCCTGTTCTTA 1
NC4 TCCGCTCTGTTCCCGTCCTGTTCTCA 1
NC5a TTCACTTTGTTCCCGCTCTATTCTCA 1
NC5b TTCACTTTGTTCCCGCTCTATTCTCA 1
NC6a TTCACTCTGTTCCCGCCCTATTCTCA 1
NC6b TTCACTCTGTTCCCGCCCTATTCTCA 1
NC7 CTCACTCTGTTCCTGCCCTATTCTCA 1
NC8a TTCACTCTGTTCCCGCTCTATTCTCA 1
NC8b TTCACTCTGTTCCCGCTCTATTCTCA 1
NC9a CTCGCTCTGTTCCCGCTCTATTCTCA 1
NC9b CTCGCTCTGTTCCCGCTCTATTCTCA 1
NC10 TTCGCTCTGTTCCCGCTCTATTCTCG 1
NC11a TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC11b TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC11c TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC11d TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC11e TTCGCTCTGTTCCCGCTCTATTCTCA 1
NC12a TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12b TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12c TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12d TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12e TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12f TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12g TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12h TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC12i TTCGCTCTGTCCCCGCTCTATTCTCA 1
NC13 TTCGCTCTATCCCCGCTCTATTCTCA 1
NC14 TTCGCTCTGTTCCCGTTCTATTCTCA 1
NC15a TTCGCTCTGTTCCCGTTCTACTCTCA 1
NC15b TTCGCTCTGTTCCCGTTCTACTCTCA 1
NC16 TCCGCTCTGTTTCCGCCCTGTTTTCA 1
NC17 TCCGTTCTGTTCCCGCCCTGCTCTCA 1
NC18a TCCGTTCTGTTCCCGCCCTGTCCTCA 1
NC18b TCCGTTCTGTTCCCGCCCTGTCCTCA 1
1010101010101010101010101010101010101010101010101010
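Before a network is drawn, identical sequences are collapsed into haplotypes, and haplotype diversity is one of the statistics reported by DnaSP. The sketch below illustrates both ideas on the first six sequences of the example file (NC1a to NC3), using Nei's estimator Hd = n/(n-1) x (1 - sum(p_i^2)). It is a hand calculation for understanding, not a substitute for DnaSP or Network.

```python
from collections import Counter

# First six sequences (variable sites only) from the example RDF file.
SEQUENCES = {
    "NC1a": "TCCGCTCCATTCCCGTCCTGTTCTTA",
    "NC1b": "TCCGCTCCATTCCCGTCCTGTTCTTA",
    "NC1c": "TCCGCTCCATTCCCGTCCTGTTCTTA",
    "NC2a": "TCCGCTCTATTCCCGTCCTGTTCTTA",
    "NC2b": "TCCGCTCTATTCCCGTCCTGTTCTTA",
    "NC3": "TCCGCTCTGTTCCCGTCCTGTTCTTA",
}


def haplotype_counts(sequences):
    """Collapse identical sequences and count each haplotype."""
    return Counter(sequences.values())


def haplotype_diversity(sequences):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    counts = haplotype_counts(sequences)
    n = sum(counts.values())
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def differences(seq_a, seq_b):
    """Number of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    print("haplotypes:", len(haplotype_counts(SEQUENCES)))
    print(f"Hd = {haplotype_diversity(SEQUENCES):.4f}")
    print("NC1a vs NC3 differences:",
          differences(SEQUENCES["NC1a"], SEQUENCES["NC3"]))
```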
6
SNPs detection, Genotyping and Submission
R.S. Kataria, S.K. Niranjan, S.K. Mishra and Karanveer Singh
ICAR-National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
Figure 1: Input file for NEBcutter showing the restriction enzyme site, the 4th nucleotide being polymorphic (C/G).
Figure 2: Output file from NEBcutter showing the restriction enzyme sites at the polymorphic site due to
nucleotide C at the polymorphic site.
Figure 3: Input file for NEBcutter showing the restriction enzyme site, the 4th nucleotide being polymorphic (C/G).
Figure 4: Output file from NEBcutter showing the restriction enzyme sites at the polymorphic site due to
nucleotide G at the polymorphic site. Note the abolition of sites when it is C and the creation of a new site
when it is G.
Figure 6: An output file of the tetra-ARMS PCR primer designing tool, showing the choice of four primers along
with sequences, melting temperatures and expected product size information.
individual, and the experimental method(s), protocols, and conditions used to assay the
variation. For preparing a submission to dbSNP there are online instructions
[http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=how_to_submit]. A short tag or
abbreviation called the Submitter HANDLE uniquely defines each submitting laboratory and groups
the submissions within the database.
Figure 9: Summary of the current release (Build 142) of dbSNP, showing new submissions and build statistics
at http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi.
Searching dbSNP
The SNP database can be explored from the dbSNP homepage [http://www.ncbi.nlm.nih.gov/SNP/] by
using the Entrez SNP search box at the top of the page or by using the links to eight basic dbSNP
search options located just below the Entrez SNP search box. For a single-record query in dbSNP, the
Search by IDs query module is used to select SNPs based on dbSNP record identifiers. These include
reference SNP (refSNP) cluster ID numbers (rs#), submitted SNP accession numbers (ss#), and local
(or submitter) IDs for the same variations. Different options for searching SNPs of interest are
described at [http://www.ncbi.nlm.nih.gov/books/NBK44371/#Search.how_do_i_search_dbsnp].
7
Web Resources and Tools for Genomic Research
S K Niranjan, ManikaSehgal and R S Kataria
ICAR- National Bureau of Animal Genetic Resources, Karnal, Haryana
________________________________________________________________________________________
Before starting molecular work, it is essential to obtain reference data for the nucleotide
sequence(s), gene(s), extragenic region(s), chromosome, genome, rRNA, cDNA, EST or amino acid
sequence(s) of the species of interest, or at least of a common species. A number of databases
available on the internet can be used to search for such data. Worldwide, three public databases bear
the major responsibility for storing and sharing almost all types of nucleotide and protein sequence
data: GenBank at the NCBI, the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology
Laboratory (EMBL) Nucleotide Sequence Database at EBI, England. GenBank, one of the largest databases,
holds 173,353,076 non-WGS, non-CON records containing 161,822,845,643 base pairs of sequence
data. In addition, there are 175,779,064 WGS records containing 719,581,958,743 base pairs of
sequence data (GenBank Release 202.0; June, 2014). From 1982 to the present, the number of bases
in GenBank has doubled approximately every 18 months. Some other specific databases like
Whole-genome shotgun (WGS), Ensembl, Pfam etc. are also available. For candidate gene analysis,
we generally use the NCBI and Ensembl databases to search for genomic data. Here, we list
different databases that can be used to search for reference sequences.
WEB RESOURCES
Databases used in genomic and proteomics research
NCBI
National Center for Biotechnology Information (NCBI). www.ncbi.nlm.nih.gov
GenBank
GenBank is the NIH genetic sequence database, an annotated collection of all
publicly available DNA sequences. www.ncbi.nlm.nih.gov/genbank/
RefSeq
NCBI Reference Sequence Database is a collection of sequences, which provides a
comprehensive, integrated, non-redundant, well-annotated set of sequences,
including genomic DNA, transcripts, and proteins. www.ncbi.nlm.nih.gov/refseq
PubMed
PubMed comprises more than 23 million citations for biomedical literature from
MEDLINE, life science journals, and online books. Citations may include links to
full-text content from PubMed Central and publisher web sites.
www.ncbi.nlm.nih.gov/pubmed
OMIM
OMIM is a comprehensive compendium of human genes and genetic phenotypes. Its
official home is omim.org. www.ncbi.nlm.nih.gov/omim
dbSNP
Database of single nucleotide polymorphisms (SNPs) and multiple small-scale
variations that include insertions/deletions, microsatellites, and non-polymorphic
variants. www.ncbi.nlm.nih.gov/snp/
EST
The EST database is a collection of short single-read transcript sequences from
GenBank. These sequences provide a resource to evaluate gene expression, find
potential variation, and annotate genes. http://www.ncbi.nlm.nih.gov/est
dbEST
dbEST is a division of GenBank that contains sequence data and other information
on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of
organisms. www.ncbi.nlm.nih.gov/genbank/dbest
WGS
Whole Genome Shotgun (WGS) sequencing projects are incomplete genomes or
incomplete chromosomes that are being sequenced by a whole genome shotgun
strategy. www.ncbi.nlm.nih.gov/genbank/wgs
HTG
The High Throughput Genomic (HTG) Sequences division contains unfinished DNA
sequences generated by the high-throughput sequencing centers. Sequence data in
this division are available for BLAST homology searches against either the "htgs"
database or the "month" database, which includes all new submissions for the prior
month. It was created in a coordinated effort among the International Nucleotide
Sequence databases, DDBJ, EMBL, and GenBank. www.ncbi.nlm.nih.gov/genbank/htgs
EMBL
European Bioinformatics Institute; Website: www.embl.org/
1000 Genomes -It is a catalog of shared human genetic variation in population groups worldwide.
www.1000genomes.org/
ArrayExpress -This is a database of functional genomics experiments that can be queried and the
data downloaded. It includes gene expression data from microarray and high
throughput sequencing studies.
www.ebi.ac.uk/arrayexpress/
Database of Genomic Variants archive-The Database of Genomic Variants archive (DGVa) is a
repository that provides archiving, accessioning and distribution of publicly
available genomic structural variants, in all species.
www.ebi.ac.uk/dgva/
PromoterWise-It compares two DNA sequences allowing for inversions and translocations, ideal
for promoters.
EBI Metagenomics- A resource for the analysis and archiving of metagenomic data.
www.ebi.ac.uk/metagenomics/
EMBOSS Tools-Selected EMBOSS tools for sequence analysis, providing: pairwise sequence
alignment, sequence format conversion, sequence translation and back-translation,
and sequence statistics. www.ebi.ac.uk/Tools/emboss/
Ensembl
The Ensembl project produces genome databases for vertebrates and other
eukaryotic species. www.ensembl.org/index.html
European Nucleotide Archive- http://www.ebi.ac.uk/ena/home. The European Nucleotide
Archive (ENA) provides a comprehensive record of the world's nucleotide
sequencing information, covering raw sequencing data, sequence assembly
information and functional annotation.
Immuno Polymorphism Database- http://www.ebi.ac.uk/ipd/ The Immuno Polymorphism
Database (IPD), was developed in 2003 to provide a centralised system for the study
of polymorphism in genes of the immune system. The IPD project was established by
the HLA Informatics Group of the Anthony Nolan Research Institute in close
collaboration with the European Bioinformatics Institute.
Pfam
http://pfam.sanger.ac.uk/ The Pfam database is a large collection of protein families,
each represented by multiple sequence alignments and hidden Markov models
(HMMs). Sanger Centre
Rfam
http://rfam.sanger.ac.uk/ This database is a collection of RNA families, each
represented by multiple sequence alignments, consensus secondary structures and
covariance models (CMs).
GenomeNet www.genome.jp. GenomeNet is a Japanese network of database and computational
services for genome research and related research areas in biomedical sciences,
operated by the Kyoto University Bioinformatics Center.
GenomeNet Database Resources
DBGET: Integrated Database Retrieval System
KEGG: Kyoto Encyclopedia of Genes and Genomes www.genome.jp/kegg/pathway.html
KEGG PATHWAY - Systems information: pathways
KEGG BRITE - Systems information: ontologies
KEGG Organisms - Organism-specific entry points
KEGG GENES - Genomic information
KEGG LIGAND - Chemical information
GenomeNet Bioinformatics Tools
KEGG
A database resource for understanding high-level functions and utilities of the
biological system, such as the cell, the organism and the ecosystem, from molecular-level
information. It is used for mapping molecular datasets in genomics,
transcriptomics, proteomics and metabolomics for biological interpretation.
KEGG PATHWAY A collection of manually drawn pathway maps for various prokaryotes and
eukaryotes representing molecular interaction and reaction networks for Metabolism
(Carbohydrate, Energy, Lipid, Nucleotide, Amino acid, Other amino acid, Glycan,
Cofactor/vitamin, Other secondary metabolite, Xenobiotics, Chemical structure),
Genetic Information Processing, Environmental Information Processing, Cellular
Processes, Organismal Systems and Human Diseases.
DDBJ (DNA Data Bank of Japan) http://www.ddbj.nig.ac.jp/
DNA Data Bank of Japan (DDBJ) is the sole nucleotide sequence data bank in Asia
that is officially certified to collect nucleotide sequences from researchers and to
issue internationally recognized accession numbers to data submitters. Since DDBJ
exchanges the collected data with ENA/EBI (European Bioinformatics Institute) and
NCBI (National Center for Biotechnology Information) on a daily basis, the three data
banks share virtually the same data at any given time. The virtually unified database
is called INSD (International Nucleotide Sequence Database). DDBJ collects
sequence data mainly from Japanese researchers, but also accepts data from, and issues
accession numbers to, researchers in any other country.
GenomeReviews-European Bioinformatics Institute, www.ebi.ac.uk/GenomeReviews
UniProt
European Bioinformatics Institute, www.uniprot.org/. The mission of UniProt is to
provide the scientific community with a comprehensive, high-quality and freely
accessible resource of protein sequence and functional information.
UniProtKB-Protein knowledgebase consists of two sections: Swiss-Prot, which is
manually annotated and reviewed, and TrEMBL, which is automatically annotated and is
not reviewed. It includes complete and reference proteome sets.
UniProt/SwissProt- (Swiss Institute of Bioinformatics) www.ebi.ac.uk/swissprot/
Protein Data Bank in Europe (PDBe) -http://www.ebi.ac.uk/pdbe
PDBe is the European resource for the collection, organisation and dissemination of
data on biological macromolecular structures. In collaboration with the other
worldwide Protein Data Bank (wwPDB) partners - the Research Collaboratory for
Structural Bioinformatics (RCSB) and BioMagResBank (BMRB) in the USA and the
Protein Data Bank of Japan (PDBj) - we work to collate, maintain and provide access
to the global repository of macromolecular structure data.
PDBj
Protein Data Bank Japan; http://pdbj.org/
It maintains a centralized PDB archive of macromolecular structures and provides
integrated tools, in collaboration with the RCSB, the BMRB in USA and the PDBe in
EU. PDBj is supported by JST-NBDC and Osaka University
wwPDB
http://www.wwpdb.org/
The Worldwide Protein Data Bank (wwPDB) consists of organizations that act as
deposition, data processing and distribution centers for PDB data. Members are:
RCSB PDB (USA), PDBe (Europe), PDBj (Japan) and BMRB (USA). The wwPDB's
mission is to maintain a single PDB archive of macromolecular structural data that is
freely and publicly available to the global community.
PDB-Protein Data Bank- www.rcsb.org/ An Information Portal to Biological Macromolecular
Structures.
PROSITE (Swiss Institute of Bioinformatics) prosite.expasy.org/
UniProt/PIR (National Biomedical Research Foundation) http://pir.georgetown.edu/
Primer Designing
A number of primer designing tools are available on the internet. Most of these tools or
programmes are paid, but some are free to use online. Among the programmes freely available
online, Primer3, Primer-BLAST and PerlPrimer are the easiest to use and most user-friendly.
PRIMER3 programme (http://bioinfo.ut.ee/primer3-0.4.0/)
It is a widely used program for designing PCR primers. It can also design
hybridization probes and sequencing primers. Primer3 has many different input
parameters that you control and that tell Primer3 exactly what characteristics make
good primers for your goals. The programme lets you specify the target, such as a
simple sequence repeat site or SNP, a region to exclude, the included region,
product size, 3' stability of the primer, primer size, melting temperature (Tm),
primer GC content, complementarity etc.
Primer-BLAST It was developed at NCBI to help users make primers that are specific to the input
PCR template. It uses Primer3 to design PCR primers and then submits them to
a BLAST search against a user-selected database. The BLAST results are then
automatically analyzed to avoid primer pairs that can cause amplification of targets
other than the input template.
PerlPrimer
It is an open-source GUI application written in Perl that designs primers for
standard PCR, bisulphite PCR, real-time PCR (QPCR) and sequencing. It aims to
automate and simplify the process of primer design.
OLIGO Primer Analysis Software
It is a tool for designing and analyzing sequencing and PCR
primers, synthetic genes, and various kinds of probes including siRNA and
molecular beacons. Based on up-to-date nearest neighbor thermodynamic
data, Oligo's search algorithms find optimal primers for PCR, including TaqMan,
highly multiplexed, consensus or degenerate primers. Multiple file batch processing
is possible. It is also useful for site directed mutagenesis.
ExonPrimer It helps to design intronic primers for the PCR amplification of exons. The script
needs a cDNA and the corresponding genomic sequence as input. It aligns these
sequences using Blat and designs PCR primers to amplify each exon using Primer3.
The positions of the exons are deduced from the alignment of the genomic and the
cDNA sequences. Insertions/deletions up to 6 base pairs are bridged by
postprocessing. Exons with small introns in-between are combined. The user can
define the maximum exon size. Exons larger than this size will be divided into
several parts.
GeneFisher (Interactive PCR Primer Design) is another good site for primer designing. There are
certain programmes which allow primer designing from amino acid sequences;
a few such sites are Reverse Translate a Protein and iCODEHOP.
FASTM/S/F These specialist programs allow searches of databases using a group of short
peptides as the query.
BLAST
NCBI BLAST is the most commonly used sequence similarity search tool. It uses
heuristics to perform fast local alignment searches. Search forms: Protein, Nucleotide, Vectors.
WU-BLAST is similar to NCBI BLAST but combines multiple parameter options into a simpler
'sensitivity' setting. Search forms: Protein, Nucleotide.
PSI-BLAST allows users to construct and perform a BLAST search with a custom, position-specific
scoring matrix, which can help find distant evolutionary relationships. PHI-BLAST
functionality is also available to restrict results using patterns. Search form: Protein.
Statistical Analysis of Protein Sequences (SAPS) http://www.ebi.ac.uk/Tools/seqstats/saps/
SAPS evaluates a wide variety of protein sequence properties using statistics.
Properties considered include compositional biases, clusters and runs of charge and
other amino acid types, different kinds and extents of repetitive structures, locally
periodic motifs, and anomalous spacings between identical residue types.
Sequence Analysis from GenomeNet Database Resources www.genome.jp
BLAST / FASTA - Sequence similarity search
MOTIF - Sequence motif search
CLUSTALW / MAFFT / PRRN - Multiple alignment
Alignment
Clustal Omega is a new multiple sequence alignment program that uses seeded guide trees and
HMM profile-profile techniques to generate alignments.
http://www.ebi.ac.uk/Tools/msa/clustalo/
ClustalW2-Phylogeny Commonly used phylogenetic tree generation methods provided by the
ClustalW2 program.
http://www.ebi.ac.uk/Tools/phylogeny/clustalw2_phylogeny/
DaliLite
Pairwise alignment of protein structures. DaliLite computes optimal and suboptimal
structural alignments between two protein structures. It compares all chains in the
first structure against all chains in the second (unless specific chain IDs are given).
The resulting superimposed coordinate files can be downloaded or viewed
interactively in Jmol. http://www.ebi.ac.uk/Tools/structure/dalilite/
Multiple Sequence Alignment (MSA) http://www.ebi.ac.uk/Tools/msa/
Clustal Omega New MSA tool that uses seeded guide trees and HMM profile-profile techniques to
generate alignments. Suitable for medium-large alignments.
ClustalW2
Popular MSA tool that uses tree-based progressive alignments. Suitable for medium
alignments.
DbClustal
Create a Multiple Sequence Alignment from a protein BLAST result using the
DbClustal program.
Kalign
Very fast MSA tool that concentrates on local regions. Suitable for large alignments.
MAFFT
MSA tool that uses Fast Fourier Transforms. Suitable for medium-large alignments.
MUSCLE
Accurate MSA tool, especially good with proteins. Suitable for medium alignments.
MView
Transform a Sequence Similarity Search result into a Multiple Sequence Alignment
or reformat a Multiple Sequence Alignment using the MView program.
T-Coffee
Consistency-based MSA tool that attempts to mitigate the pitfalls of progressive
alignment methods. It is suitable for small alignments.
WebPRANK The EBI has a new phylogeny-aware multiple sequence alignment program which
makes use of evolutionary information to help place insertions and deletions.
Pairwise Sequence Alignment http://www.ebi.ac.uk/Tools/psa/
Global Alignment: Global alignment tools create an end-to-end alignment of the sequences to be
aligned. There are separate forms for protein or nucleotide sequences.
Needle
EMBOSS Needle creates an optimal global alignment of two sequences using the
Needleman-Wunsch algorithm.
Stretcher
Stretcher uses a modification of the Needleman-Wunsch algorithm that allows larger
sequences to be globally aligned.
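The global alignment idea behind Needle can be illustrated with a minimal Python sketch of the Needleman-Wunsch dynamic programme. It is not the EMBOSS implementation: the function name and the simple match/mismatch/linear-gap scores are assumptions chosen for illustration, whereas EMBOSS uses substitution matrices and separate gap-open and gap-extend penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not EMBOSS defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the matrix: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "ACT"))
```

Because the whole matrix is kept in memory, this sketch needs space proportional to the product of the two sequence lengths, which is the limitation that tools such as Stretcher are designed to relax.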
Local Alignment Local alignment tools find one, or more, alignments describing the most similar
region(s) within the sequences to be aligned. There are separate forms for protein or
nucleotide sequences.
Water
Water uses the Smith-Waterman algorithm (modified for speed enhancements) to
calculate the local alignment of two sequences.
Matcher
Matcher identifies local similarities between two sequences using a rigorous
algorithm based on the LALIGN application.
LALIGN
LALIGN finds internal duplications by calculating non-intersecting local alignments
of protein or DNA sequences.
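For comparison with the global case, the following is a minimal Python sketch of Smith-Waterman local alignment. It is only an illustration of the principle: the function name and scoring values are assumptions, and EMBOSS Water itself uses substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of two sequences; returns (score, aligned_a, aligned_b).

    Scores are illustrative assumptions, not the defaults of any EMBOSS tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Unlike global alignment, a cell is never allowed to go below zero.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))
```

The only changes from the global algorithm are the zero floor in each cell and the traceback that starts at the best cell instead of the corner, which is why the result covers only the most similar region.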
Genomic alignment tools concentrate on DNA (or to DNA) alignments while accounting for
characteristics present in genomic data.
Wise2DBA
Wise2DBA (DNA Block Aligner) aligns two sequences under the assumption that the
sequences share a number of colinear blocks of conservation separated by potentially
large and varied lengths of DNA in the two sequences.
GeneWise
GeneWise compares a protein sequence to a genomic DNA sequence, allowing for
introns and frameshifting errors.
SNP Analysis
HaploBlock SNP Haplotyping and Linkage Disequilibrium Mapping using Models of Haplotype
Block Variation. HaploBlock is a software program which provides an integrated
approach to haplotype block identification, haplotyping SNPs (or haplotype phasing,
resolution or reconstruction) and linkage disequilibrium (LD) mapping (or genetic
association studies). HaploBlock is suitable for high density haplotype or genotype
SNP marker data and is based on a statistical model which takes account of
recombination hotspots, bottlenecks, genetic drift and mutations and has a Markov
Chain at its core.
bioinfo.cs.technion.ac.il/haploblock/
POPGENE
It is user-friendly computer freeware for the analysis of genetic variation among and
within populations using co-dominant and dominant markers. It computes both
comprehensive genetic statistics (e.g., allele frequency, gene diversity, genetic
distance, G-statistics, F-statistics) and complex genetic statistics (e.g., gene flow,
neutrality tests, linkage disequilibria, multi-locus structure).
http://www.ualberta.ca/~fyeh/
ARLEQUIN Population genetic analysis package that includes haplotype estimation by the
expectation maximization (EM) algorithm and LD analysis for locus pairs, with
significance tested by a permutation method - http://anthro.unige.ch/arlequin/
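Some of the basic statistics named above (allele frequency, gene diversity, linkage disequilibrium) can be sketched in a few lines of Python. This is a hedged illustration only, not the code of POPGENE, ARLEQUIN or HaploBlock. The function names are hypothetical, gene diversity is taken as expected heterozygosity without a small-sample correction, and the LD function assumes that phased haplotypes at two biallelic loci are already known (real packages estimate them, for example by the EM algorithm).

```python
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: list of (allele1, allele2) tuples at one co-dominant locus."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs):
    """Expected heterozygosity, He = 1 - sum(p_i^2), without bias correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def linkage_disequilibrium(haplotypes):
    """D and r^2 for two biallelic loci from phased (locus1, locus2) haplotypes."""
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0]  # arbitrary reference allele at each locus
    p1 = sum(1 for a, _ in haplotypes if a == ref1) / n
    p2 = sum(1 for _, b in haplotypes if b == ref2) / n
    p12 = sum(1 for a, b in haplotypes if a == ref1 and b == ref2) / n
    d = p12 - p1 * p2
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a locus is fixed
    return d, r2


genotypes = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
freqs = allele_frequencies(genotypes)
print(freqs, gene_diversity(freqs), observed_heterozygosity(genotypes))
print(linkage_disequilibrium([("A", "B"), ("A", "B"), ("a", "b"), ("a", "b")]))
```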
Protein Analysis Web Resources
Protein Functional Analysis tools described on this page are provided using our new
bioinformatics analysis tools framework. At present a subset of the protein functional
analysis tools available at EBI are available in the new framework.
http://www.ebi.ac.uk/Tools/pfa/
CENSOR
Identify and/or mask repeat sequences in protein sequence data.
WEB TOOLS
Candidate genes can be analysed in various ways using the many bioinformatics tools and
programmes available on the net. However, there is always a chance of misinterpreting the data
when a programme is used without knowing its concept or principle. Therefore, it is essential to
understand a programme before handling it: the theory on which it is based, its
strengths, i.e. its ability to analyse the data, and its weaknesses or constraints. Before analysing
data with any web-based software, read the programme's documentation, particularly its
know-how. These programmes are upgraded frequently, so it is
also essential to stay informed about updated versions.
Primer designing using PRIMER3 programme
The specificity and efficiency of a primer depend on several factors which must be taken into
account while designing primers. The optimal length of general PCR primers ranges between 18 and 24
bases. However, for multiplexing the length may be as long as 30 to 35 bases. If the primer is
too short, specificity is low, which induces non-specific amplification. On the
contrary, very long primers tend to decrease the template-binding efficiency at normal annealing
temperature due to the higher probability of forming secondary structures such as hairpins. Longer
primers also require more time to anneal with the complementary target sequence and to denature
in the next cycle, which reduces the yield of the amplicon. In
general, the optimal G/C content is between 45-55%, with an acceptable range of 40-60%. The G/C
content largely determines the melting temperature (Tm) and hence the annealing temperature. The
permissible Tm difference between the primers is less than 5 °C, preferably within 2 °C. Primer pair
Tm mismatch can lead to poor amplification. The primer with the higher Tm will misprime at lower
temperatures, while the primer with the lower Tm may not work at higher temperatures. The
3'-terminus of the primer is very important, since DNA synthesis proceeds from it in the 5' to 3'
direction. A G/C clamp refers to the presence of G or C within the last 4 bases from the 3'-end of
primers. The G/C clamp prevents mispriming and enhances specific primer-template binding, which
increases the efficiency of the primers.
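The rules of thumb above can be turned into a quick screening script. The following Python sketch is an illustration, not part of Primer3: the function names are hypothetical, and the melting temperature is estimated with the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is an assumption made here for brevity. Primer3 itself relies on nearest-neighbour thermodynamic models, so its Tm values will differ.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough Tm in degrees C by the Wallace rule (assumed here, not Primer3's method)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def has_gc_clamp(primer, window=4):
    """True if a G or C occurs within the last `window` bases at the 3' end."""
    return any(base in "GC" for base in primer.upper()[-window:])


def check_primer_pair(forward, reverse, max_tm_diff=5.0):
    """Return a list of rule violations; an empty list means the pair passes."""
    problems = []
    for name, primer in (("forward", forward), ("reverse", reverse)):
        if not 18 <= len(primer) <= 24:
            problems.append(f"{name}: length {len(primer)} outside 18-24 bases")
        if not 40.0 <= gc_content(primer) <= 60.0:
            problems.append(f"{name}: GC content {gc_content(primer):.1f}% outside 40-60%")
        if not has_gc_clamp(primer):
            problems.append(f"{name}: no G/C in the last 4 bases of the 3' end")
    if abs(wallace_tm(forward) - wallace_tm(reverse)) >= max_tm_diff:
        problems.append("pair: Tm difference is 5 degrees C or more")
    return problems


# Hypothetical primer sequences used only to exercise the checks.
print(check_primer_pair("AGCTGACCTGAGGTCATCAG", "TCAGGTCACTGACCTGATGC"))
```

Such a script only pre-screens candidates; secondary structure, primer-dimer formation and target specificity still need dedicated tools such as Primer3 and Primer-BLAST.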
Steps for Primer Designing:
1. Open the online Primer3 (version 4) software by using the
URL http://www.frodo.wi.mit.edu/.
2. Paste the nucleotide sequence (in FASTA format) in the box for the source sequence on the
Primer3 page.
3. Set the required parameters. Alternatively, the default parameters can be retained.
4. Click on the Pick Primers option to get the primers.
5. The output window will show 3-5 sets of primers. The preferred primer set can be selected
based on product size, the target covered and the other primer parameters.
Sequence Submission
Sequences may be submitted to any of the three major sequence databases, i.e. NCBI-GenBank, DDBJ
and EMBL. However, the preferred route is NCBI GenBank. Sequences can be submitted
using BankIt or Sequin of NCBI.
BankIt: It is the online submission tool at NCBI. BankIt is used for a single sequence or a small batch of
different sequences. It is the preferred method if the feature annotation for your sequences is not
complicated. The submitter needs to open an account. Once a submitter registers to
use BankIt, the submitter's contact information is saved and is automatically displayed each
subsequent time the submitter logs in to submit. BankIt allows submitters to navigate and edit
previously visited pages. Sequence data can be either cut-and-pasted as text or uploaded as a file.
BankIt does not have a direct update option. The GenBank Submissions Handbook [Internet] can be
consulted for GenBank submission using either BankIt or Sequin
(http://www.ncbi.nlm.nih.gov/books/NBK63585/).
The input sequence file for Sequin has to be in FASTA format. If it is a protein-coding gene sequence,
a separate FASTA file of amino acids is also required. There are drop-down menus to select
the required information, such as species name and sequence features like CDS start site,
completeness of sequence, UTRs etc. Once the Sequin file is in order, i.e. error free, a message
appears at the end asking to submit the file to NCBI through email at gb-sub@ncbi.nlm.nih.gov.
The submitter can keep the data out of the public domain until a chosen release date. Once
the sequence file is submitted to GenBank, an accession number is sent by
email to the submitting author within 2-3 days of submission. Later, after the file is processed, a flat
file is sent for checking and approval, and the sequence is released in the public domain on the due date
or immediately after publication of the data, whichever is earlier. Larger submissions should be made
with a command-line program, Tbl2asn. It automates the creation of sequence records for
submission to GenBank and uses many of the same functions as Sequin.
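Preparing the nucleotide FASTA file and the matching amino acid FASTA file can be scripted. The Python sketch below is a minimal illustration under stated assumptions: the function names and the sequence identifiers are hypothetical, the standard genetic code is assumed, translation starts at the first base of the supplied coding sequence, and the header lines are plain identifiers (the exact header conventions expected by Sequin should be checked in the GenBank Submissions Handbook).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence from its first base, stopping at the first stop codon."""
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def to_fasta(identifier, sequence, width=60):
    """Format one record as FASTA text with fixed-width sequence lines."""
    lines = [">" + identifier]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


cds = "ATGGCCAAATGA"  # hypothetical toy coding sequence
print(to_fasta("seq1_nucleotide", cds), end="")
print(to_fasta("seq1_protein", translate(cds)), end="")
```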
PolyPhen-2 (Polymorphism Phenotyping v2): It is a tool which predicts the possible impact of an
amino acid substitution on the structure and function of a human protein using straightforward
physical and comparative considerations. http://genetics.bwh.harvard.edu/pph2/
Phylogenetic Analysis
A phylogenetic tree is a graph composed of branches and nodes. Phylogenetic analysis seeks to deduce
the correct tree from molecular sequence data that define the evolution of genes and protein families. It also
estimates the time of divergence between organisms since they last shared a common ancestor.
Generally, we generate inferred trees from available data based on some model, and the inferred tree
should be very near to the true tree, which reflects the actual events that occurred during evolution.
The number of possible trees grows rapidly: with 10
taxonomic units, about 34 million rooted trees can be generated. An exhaustive search
examines all possible trees and selects the one with the most optimal features, such as the shortest
overall sum of the branch lengths.
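The figure of about 34 million can be checked directly. For n taxa, the number of distinct rooted bifurcating trees with labelled tips is (2n-3)!! and the number of unrooted trees is (2n-5)!!, where !! is the double factorial. The short Python sketch below evaluates both; the function names are chosen here for illustration.

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 or 2; returns 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def count_rooted_trees(n_taxa):
    """Rooted bifurcating trees with labelled tips: (2n - 3)!! for n >= 2."""
    return double_factorial(2 * n_taxa - 3)


def count_unrooted_trees(n_taxa):
    """Unrooted bifurcating trees with labelled tips: (2n - 5)!! for n >= 3."""
    return double_factorial(2 * n_taxa - 5)


for n in (4, 10, 20):
    print(n, count_rooted_trees(n), count_unrooted_trees(n))
```

For 10 taxa this gives 34,459,425 rooted and 2,027,025 unrooted trees, which shows why exhaustive searches become impractical as the number of sequences grows.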
A phylogenetic tree is developed in five stages, viz. selection of sequences for analysis, multiple
sequence alignment, determination of a statistical model of nucleotide/amino acid evolution, tree
building and tree evaluation. Multiple alignment is a critical step in phylogenetic analysis. If
a group of sequences is misaligned, the resulting tree will not reflect
true biological evolution. Make sure that all the sequences are homologous; this can be
tested by performing pairwise alignment and checking that the expect value is significant. Always
remove non-homologous sequence(s) from the group. Proteins which share a domain but not the other
regions should be analysed for the shared domain only. For a larger number of sequences, a heuristic
algorithm is used to identify an optimal tree. Instead of examining every tree, the heuristic
algorithm discards vast numbers of non-useful trees. Heuristic algorithms have an inherent
trade-off between search time and confidence in the search result. One can assume that they provide
an approximation of the best tree.
Phylogenetic trees are built using two kinds of methods, viz. distance based and character based.
Distance based methods work upon the number of DNA or amino acid changes observed in
pairwise comparisons. Commonly used distance based methods are Minimum Evolution (ME),
Fitch-Margoliash (FM), UPGMA and Neighbor Joining (NJ). The NJ method is the fastest.
Maximum Parsimony and Maximum Likelihood are commonly used character based methods.
The UPGMA method assumes that the rate of evolution has remained constant throughout the
evolutionary history of the included sequences/taxa; therefore, it produces a rooted tree. Maximum
Parsimony can be used when there is very high sequence similarity, whereas Maximum Likelihood
may be better when there is very low sequence similarity. Reliability of a tree can be evaluated by
using the bootstrap method. It evaluates the support for each clade, i.e. how often the members of
a clade group together when the tree is rebuilt from resampled data. The higher the score (the closer
to 100), the more reliable the grouping of the branches.
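As an illustration of a distance based method, the following Python sketch builds a UPGMA tree from a small pairwise distance matrix. It is a teaching sketch, not the MEGA implementation: the function name, the nested-tuple tree format (left subtree, right subtree, node height) and the toy distances are all assumptions made here.

```python
def upgma(labels, matrix):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns nested tuples (left, right, height), where height is half the
    distance between the two merged clusters (constant-rate assumption).
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): float(matrix[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del dist[(a, b)]
        # Distance from the new cluster to each remaining one is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Hypothetical distances between four taxa, chosen to be clock-like.
taxa = ["A", "B", "C", "D"]
distances = [[0, 2, 6, 10],
             [2, 0, 6, 10],
             [6, 6, 0, 10],
             [10, 10, 10, 0]]
print(upgma(taxa, distances))
```

Because every merge places the new node at half the cluster distance, the resulting tree is rooted and all tips are equidistant from the root, which reflects the constant-rate assumption described above.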
Phylogenetic Analysis using MEGA 6: A phylogenetic tree can be drawn using various software
programmes. MEGA6 includes many statistical methods for the study of molecular evolution. It
may be downloaded from www.megasoftware.net. It also contains a fully functional web browser
for retrieving sequence(s) directly from the web and allows direct access to NCBI for
sequence alignment and inferring the phylogenetic tree. A phylogenetic tree in MEGA 6
is derived using models of DNA or amino acid substitution. These models are
also used to evaluate the evolutionary distance between sequences and to estimate divergence
time. Commonly used substitution models are Number of differences, p-distance, Jukes-Cantor,
Tajima-Nei, Kimura 2-parameter, Tamura 3-parameter, Tamura-Nei, Maximum Composite
Likelihood and Nei-Gojobori for nucleotide sequences, and Number of differences, p-distance,
Poisson and Dayhoff for amino acid sequences.
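Two of the simplest nucleotide measures listed above can be written out explicitly. The p-distance is the proportion of aligned sites that differ, and the Jukes-Cantor distance corrects it for multiple substitutions at the same site: d = -(3/4) ln(1 - 4p/3). The Python sketch below is a minimal illustration with hypothetical function names; it assumes the two sequences are already aligned and ignores any column containing a gap or ambiguous base, which may differ from the gap handling options selected in MEGA.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]  # skip gaps and ambiguous bases
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance from a p-distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(p, jukes_cantor(p))
```

For small p the two values are nearly equal; the correction grows as the sequences become more divergent.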
Steps for MEGA6 are given in the programme itself. Major steps include assembling data for
analysis, building a sequence alignment using MUSCLE or CLUSTALW, evolutionary analysis
(computing basic statistical quantities for sequences, computing evolutionary distances using
different nucleotide substitution, amino acid substitution and synonymous/non-synonymous
substitution models), constructing a phylogenetic tree using different methods (including
statistical and bootstrap tests for reliability), molecular clock tests (including Tajima's
relative rate test), and tests of selection based on synonymous/non-synonymous
substitutions and Tajima's test of neutrality.
Steps for MEGA6
Download MEGA6 from the web and open MEGA6.
Go to Align at the left, then Edit/Build alignment.
Select Create New alignment and click OK.
Select DNA for DNA sequences or Protein for protein sequence analysis under Datatype for
Alignment.
It will open the next window. Go to Edit and select Insert sequence from file.
Select the file (FASTA/txt file) from the source. (You can also make a txt file by pasting the sequence in
Notepad and saving it, then select this file, e.g. Test.txt.)
The sequence will automatically appear in the MEGA window. You can change the name by selecting
Test at the extreme left, shown under Species/abbrv.
If you don't have a reference sequence, select Web and then Do BLAST Search. It will
automatically link you with NCBI BLAST. If you have a reference sequence, convert it to a FASTA/.txt
file by pasting it in Notepad and follow the same procedure.
Select BLAST as in NCBI BLAST. It will show the same results as NCBI BLAST.
Now select the sequences you want to align as reference sequences. Then click Add To
Alignment.
It will open a window M6: Input Sequence Label. Here you can label each sequence
separately as First word (for naming the first reference sequence, e.g. Buffalo/Bubalus bubalis), Second
Word and so on, or you can skip this by clicking OK directly (i.e. without naming).
You can change the name in the same way as for the Test sequence.
Now click Alignment under the M6: Alignment Explorer window.
Select Align by ClustalW, or select any other method. If the sequence is cDNA/mRNA, you
can opt for the ----(codon) option.
Click OK when asked whether to select all. Then the M6: ClustalW parameters window will
open. Click OK in that window (keep the values at default).
This will align the sequences in the M6: Alignment Explorer window. Now you
can remove the unaligned parts by selecting them and pressing Delete on the keyboard (you can select
as in Excel or select each unaligned part individually); removing them gives better results.
Delete the unaligned parts from both ends.
Now open Data and select Phylogenetic Analysis from the drop-down menu.
It will ask you to confirm whether the data are protein-coding nucleotide sequences. Click Yes if your
sequence is protein coding (select Yes even if it contains introns).
Now go to the MEGA6.06 (6140226) window and select Phylogeny. Select the option
Construct/Test Neighbor-Joining Tree or another method.
It will ask whether you would like to use the currently active data file.
Click Yes. It will open the M6: Analysis Preferences window.
This window lets you choose what kind of tree is to be generated, on what basis, and
which method will be used for tree generation. Keep a note of
these preferences/options, whether left at default or modified. All the yellow rows
in this window can be changed as per the options given (for example, the tree below is being generated
with the Tajima-Nei model under Model/Method).
Go to the Test of phylogeny option. Select Bootstrap method from the drop-down menu.
At the next option, the number of bootstrap replications, you can increase or decrease the
value. However, the default value of 500 will be OK. For the others, you can take the default values.
Then click Compute. It will open M6: Tree Explorer, which will show the tree. For details
about the tree, click Caption in this window. It will give you details about the method of
phylogeny. These are needed for publication, so save them. You can change the tree type by selecting
the option showing trees (no title is given) and choosing radiation or another style. To copy the tree, go
to Image, then copy to clipboard, and paste the tree at the desired place, such as a Word file.
You can copy the content from the caption also.
You can save the tree at the desired place. Next time you can open this file directly, which will
show you the tree as well as the window (MEGA6.06) with the aligned sequence files. You can estimate
divergence by selecting Distance, then Compute pairwise distance.
It will ask whether to use the active file; say Yes. Again it will lead to M6: Analysis preference;
you can keep the default values, but remember these values. Then click Compute.
A new window will open, showing M6: pairwise distances.
References
Breslauer, K.J., Frank, R., Blöcker, H. and Marky, L.A. 1986. Predicting DNA duplex stability from the base
sequence. PNAS (USA), 83: 3746-3750. (http://www.pnas.org/content/83/11/3746).
Markoff, A., Savov, A., Vladimirov, V., Bogdanova, N., Kremensky, I. and Ganev, V. 1997. Optimization of
single-strand conformation polymorphism analysis in the presence of polyethylene glycol. Clin. Chem.,
43(1): 30-3. (http://www.clinchem.org/content/43/1/30.long). (A correction has been published in
http://www.clinchem.org/content/43/4/692).
Rozen, S. and Skaletsky, H.J. 2000. Primer3 on the WWW for general users and for biologist programmers. In:
Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology.
Humana Press, Totowa, NJ, pp 365-386. Source code available at http://fokker.wi.mit.edu/primer3/.
SantaLucia, J., Jr. 1998. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor
thermodynamics. PNAS, 95: 1460-1465. DOI: 10.1073/pnas.95.4.1460
Tamura, K., Stecher, G., Peterson, D., Filipski, A. and Kumar, S. 2013. MEGA6: Molecular Evolutionary Genetics
Analysis version 6.0. Molecular Biology and Evolution, 30: 2725.
Thornton, B. and Basu, C. 2011. Real-Time PCR (qPCR) Primer Design Using Free Online Software.
Biochemistry and Molecular Biology Education, 39: 145-154. DOI: 10.1002/bmb.20461
Ye J., Coulouris G., Zaretskaya I., Cutcutache I., Rozen S. and Madden T. 2012. Primer-BLAST: A tool to
design target-specific primers for polymerase chain reaction. BMC Bioinformatics, 13(1): 134.
doi:10.1186/1471-2105-13.
Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids
Res., 31(13): 3406-3415. doi: 10.1093/nar/gkg595.
8
Statistical Procedures for Identification of Quantitative Trait Loci
Upasna Sharma and R K Vijh
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
A QTL is a region of a genome that is responsible for variation in the quantitative trait of interest.
The goal of identifying all such regions that are associated with a specific complex phenotype might,
at first, seem quite simple, especially with all the genomic and computational tools available to help
us. Detecting a QTL was the motivation for many scientific investigations and was an achievable
goal. Presently, the emphasis is on locating the multiple interacting QTL that are associated with
multiple traits, using continually evolving, sophisticated statistical analyses. As more and more new
technologies and methodologies are developed, we must remember that no single technological
advance or statistical method will unravel the genomic mystery. Instead, it will be the
conglomeration of ideas, techniques and analyses that brings this endeavour to its end.
Unfortunately, the task of detecting QTL and their interactions is difficult
because of the sheer number of QTL and the possible epistasis, or interactions between QTL.
To combat this, QTL experiments can be designed with the aim of limiting the sources of variation
to a small number, so that dissection of a complex phenotype might be possible. In general, a large
sample of individuals has to be collected to represent the total population, to provide an observable
number of recombinants and to allow a thorough assessment of the trait under investigation. Using
this information, coupled with one of several methodologies to detect or locate QTL, associations
between quantitative traits and genetic markers are made as a step towards understanding the
genetic basis of complex traits.
The first step in any QTL-mapping experiment is usually to construct
populations that originate from homozygous, inbred parental lines. The resulting F1 lines will tend to
be heterozygous at all markers and QTL. From the F1 population, crosses are made, and the
segregation of markers and QTL is statistically modeled. In general, experimenters assume that
markers are segregating randomly, but if, in fact, markers are subject to segregation distortion, it is
not possible to anticipate how the resulting estimates of recombination will be affected, as well as
any potential QTL locations. Once the data are collected on each individual, statistical
associations between the markers and the quantitative trait are established through statistical
approaches that range from simple techniques, such as analysis of variance (ANOVA), to models
that include multiple markers and interactions. The simpler statistical approaches tend to be
methods of QTL detection that assess differences in the phenotypic means for single-marker
genotypic classes. The actual location of QTL involves an estimated genetic map with known
distances between markers, and evaluation of a likelihood function that is maximized over the
established parameter space.
Single-marker tests
Simple, single-marker tests (for example, using t-test, ANOVA and simple linear regression
statistics) that assess the segregation of a phenotype with respect to a marker genotype indicate
which markers are associated with the quantitative trait of interest and, therefore, point to the
existence of potential QTL. Typically, the null hypothesis tested is that the mean of the trait value is
independent of the genotype at a particular marker. The null hypothesis is rejected when the test
statistic is larger than a critical value, and the implication is that a QTL is linked to the marker under
investigation. Although the t-test, ANOVA and simple linear regression approaches are all equivalent
to each other when their hypotheses are testing for differences in the phenotypic means, they fail to
provide a closed-form estimate of QTL location, or of the recombination frequency between the marker
and the QTL. This is because the QTL effect and the QTL location are confounded, that is, they
cannot be estimated separately.
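A single-marker test of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker is coded 0/1 for two genotype classes, the trait values are invented, and the function name `single_marker_t` is hypothetical. It computes the pooled-variance t statistic for the null hypothesis that the trait mean does not depend on the marker genotype.

```python
import math
from statistics import mean, variance


def single_marker_t(genotypes, phenotypes):
    """Pooled-variance t statistic comparing trait means of two marker classes.

    genotypes  : list of 0/1 codes (two marker genotype classes; an assumption)
    phenotypes : list of trait values in the same order as genotypes
    Returns (t statistic, degrees of freedom).
    """
    g0 = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    g1 = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    n0, n1 = len(g0), len(g1)
    # Pooled variance, assuming equal within-class variances
    pooled = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / (n0 + n1 - 2)
    t = (mean(g1) - mean(g0)) / math.sqrt(pooled * (1 / n0 + 1 / n1))
    return t, n0 + n1 - 2


# Invented example data: eight animals, one marker, one trait
genotypes = [0, 0, 0, 0, 1, 1, 1, 1]
phenotypes = [10.0, 11.0, 9.0, 10.0, 14.0, 15.0, 13.0, 14.0]
t_stat, df = single_marker_t(genotypes, phenotypes)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A large absolute t value suggests association between marker and trait, but, as noted above, it says nothing about where the QTL lies relative to the marker.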
Interval Mapping
Confounding, in these situations, is addressed by (incrementally) fixing the location of the QTL
and estimating the QTL effect between intervals of markers. These intervals of markers lead
naturally to a method that estimates both QTL effect and location, known as 'interval mapping'.
Detecting QTL by this type of single-marker approach is a simple procedure that can be
accomplished with any standard statistical analysis software package, and has the potential to
identify numerous significant markers. Two important issues should be considered when assessing
these statistical results. The first consideration is sample size. The number of individuals studied
provides information for the estimation of phenotypic means and variances. A large sample of
individuals provides the opportunity to observe recombinant events and to estimate parameters
with greater accuracy and, therefore, a greater ability to detect QTL through a single-marker test.
The second issue concerns the problem of multiple testing, which arises when many markers are
investigated through independent statistical tests. This problem is coupled with the level of
statistical significance that is set by the investigator and can lead to detection of false-positive QTL.
Typically, an investigator is willing to tolerate incorrectly detecting a QTL in, for example, 5% of
cases. Therefore, given a 5% level of significance and 100 independent marker tests with no true
QTL, about five of the 100 markers would be expected to detect a QTL incorrectly. This problem can
be accounted for through a multiple-test adjustment, such as Bonferroni or Tukey, that corrects the
level of significance according to how many independent statistical tests are made.
Single-marker analyses are still used as a means to identify markers that are segregating with a
trait. Most of these applications deal primarily with detecting individual markers, rather than
genomic regions, and are a quick and efficient means to screen large populations for specific traits,
such as disease resistance.
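The multiple-testing arithmetic above can be made concrete with a short Python sketch. The function names and the p-values are illustrative assumptions, not part of any QTL package. The sketch shows the expected number of false positives at a given significance level and the Bonferroni-adjusted per-test level.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of false positives when all n_tests null hypotheses are true."""
    return alpha * n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the overall error rate at about alpha."""
    return alpha / n_tests


def significant_markers(p_values, alpha=0.05):
    """Indices of markers whose p-value passes the Bonferroni-adjusted level."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]


# 100 independent marker tests at the 5% level
print(expected_false_positives(0.05, 100))
print(bonferroni_alpha(0.05, 100))

# Invented p-values for four markers
print(significant_markers([0.0001, 0.01, 0.04, 0.2]))
```

Bonferroni is conservative when markers are linked, because tests on neighbouring markers are not independent.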
Typically, when investigations focus on questions of genomic location, more sophisticated
methods of QTL analysis, which rely on the estimated order of markers, are used. The added
information that is gained from knowing the relationships between markers is essential to QTL
methodologies that aim to locate QTL.
Genetic maps
Single-marker analyses investigate individual markers independently, and without reference to
their position or order. When markers are placed in genetic (linear) map order, so that the
relationships between markers are understood, the additional genetic information gained from
knowing these relationships provides the necessary setting to address confounding between QTL
effect and location. A genetic map also provides a genetic representation of the chromosome on
which the markers and QTL reside. Pairwise information, or recombination, is first estimated for all
markers that are segregating as expected, and then any marker that is linked to any other marker is
placed in the same linkage group. The linear arrangement of markers into linkage groups, or
chromosomes, provides the genetic map for locating QTL relative to intervals of markers (or
statistically related sets of markers). In addition to supplying the structure in which to search for
QTL, the estimated genetic map benefits the estimation of missing marker information by using the
surrounding marker genotypes to infer the missing marker genotypes.
When using genetic maps in this way, it is important to distinguish between recombination
frequency and genetic distance. The essential difference is that genetic distances are additive,
whereas recombination units are not, because they are probabilities and also because of genetic
interference. Recombination units and genetic distance can be interconverted by using a map
function (such as the Haldane and Kosambi map functions). The practical value of a genetic map is
that the QTL can be mapped more easily in an interval of defined genetic distance. The methods for
linearly ordering the molecular markers rely on minimizing the recombination between pairs of
markers. As the estimated genetic distance
QTL are eliminated because the state of the QTL genotype and QTL number are known before the
estimation of their effects and interactions. Multi-trait QTL mapping can also benefit from the
computational framework of Sen and Churchill by simply extending from a single phenotype to
multiple correlated phenotypes, and by dissecting the problem in a similar manner. The additional
information gained from knowing the covariation between multiple traits is the same as the
treatment originally detailed by Jiang and Zeng (1995), but the computational mechanics of the
solution follow the Sen and Churchill approach. Although the Sen and Churchill view has been
shown to benefit QTL mapping, it might have an even larger potential for accommodating other
types of problem and data structure.
Joint trait analysis:
Many data sets for mapping quantitative trait loci (QTL) contain observations on multiple traits, or
on one or several traits in multiple environments. With such data, we can ask questions like the
following: Does a QTL have pleiotropic effects on multiple traits? Does a QTL show
genotype-by-environment interaction? What is the nature of the genetic correlation between
different traits? Is the correlation due to pleiotropy or linkage in certain regions of a genome?
Statistically this involves multiple trait analysis, because the expression of a trait in different
environments can be regarded as different traits or different trait states. Presently the QTL for
various traits are analysed separately. This approach does not take advantage of the correlated
structure of data and has a number of disadvantages for mapping QTL and also for understanding
the nature of genetic correlations. The statistical power of hypothesis tests tends to be lower and the
sampling variances of parameter estimates tend to be higher for separate analysis. Also, it would be
difficult to test a number of biologically interesting questions involving multiple traits by analyzing
different traits separately. Different traits are correlated genetically due to pleiotropy and linkage.
With observations on a number of polymorphic genetic markers and on a number of quantitative
traits, it is possible to dissect a portion of genetic variation and co-variation among traits by
localizing and estimating the responsible QTL. It is also possible to test whether the genetic
correlation is due to pleiotropy or linkage for certain regions of a genome.
Many data sets in QTL studies contain multiple traits. These traits are often correlated genetically
and non-genetically (or environmentally). One way to analyze these data is to map QTL on each
trait separately. Alternatively and preferably, different traits are analyzed together to map QTL
affecting one or more traits by taking the correlated structure of data into account. There are
generally three advantages for this joint analysis. First, the joint analysis may increase statistical
power of detecting QTL. Second, the joint analysis can improve the precision of parameter
estimation. Third and probably most importantly, the joint analysis provides appropriate
procedures to test a number of biologically interesting hypotheses involving multiple traits.
The single-marker regression analysis, interval mapping, composite interval mapping and joint
trait analysis procedures will be run using the software "QTL Cartographer", which has been
provided along with the buffalo test data to run the analysis.
QTL Cartographer (http://statgen.ncsu.edu/qtlcart/index.php)
How to use the software Windows QTL Cartographer (WinQTLCart)
Single-marker analysis
When to use?
For quick scanning of the entire genome (all chromosomes) to find the best possible QTLs and to
identify missing (or incorrectly formatted) data. Use single-marker analysis first to ensure your data
file is clean; then move on to more sophisticated analysis methods, such as Interval Mapping and
Composite Interval Mapping.
How it works?
Single-marker analysis is based on the idea that if there is an association between a marker
genotype and trait value, it is likely that a QTL is close to that marker locus.
Comments
Single-marker analysis can be somewhat useful for a quick look at data, but it has been superseded
by Interval Mapping and Composite Interval Mapping. IM and CIM are more thorough and
accurate indicators of QTL. The prime value of WinQTLCart's single-marker analysis is its
identification of missing data that could affect later analysis.
Running a single-marker analysis
1. Open a mapping source data file (an .MCD file) into the WinQTLCart main window.
2. Select Method>Single-Marker Analysis. WinQTLCart analyzes the data and displays the single
marker analysis controls in the form pane. The information pane on the right includes the
analysis results.
3. Select a trait for display from the Trait Selection pull-down list. All the traits present in the file
will be on the list.
4. For each trait, the information pane on the right displays WinQTLCart's statistical summary of
the file. (You can view this summary in a larger window by clicking the Result button in the
Statistical Summary group box, just to the left of the information pane.)
5. In the Single Marker Analysis group box, click Result to view the analysis result for the selected
trait. You can change the font used by the display window to make the results easier to read.
Click the Save button in this group to save the marker analysis results to a text file.
6. In the Statistical Summary group box, click Result to view the summary in a larger display
window. Click the Save button to save the statistical results to a text file.
The statistical summary includes:
   A basic summary of the data.
   A histogram for the quantitative trait.
   WinQTLCart's summary of missing individuals that should be present, as indicated by the data.
   If markers show 0% data, there was likely an import problem.
   A summary of marker segregation (combines LR map QTL and Q stats).
7. Click the Graphic File button to save the results to a QTL mapping result file (*.QRT). You
can open this .QRT file later to view the results as a graph.
8. Click Close to end the single-marker analysis session and return to the Form View of Source
Data.
Interval Mapping
What it is?
Interval mapping (IM) is an extension of single-marker analysis. In single-marker analysis, only one
marker is used in QTL mapping, so effects are underestimated and the QTL position cannot be
determined. Interval mapping provides a systematic way to scan the whole genome for evidence of
QTL. IM uses two observable flanking markers to construct an interval within which to search for
QTL. A map function (either Haldane or Kosambi) is used to translate from recombination
frequency to distance or vice versa. Then, a LOD score is calculated at each increment (walking step)
in the interval. Finally, the LOD score profile is calculated for the whole genome. When a peak
exceeds the threshold value, we declare that a QTL has been found at that location.
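The two map functions named above have simple closed forms, sketched here in Python. The function names are illustrative and this is not WinQTLCart code. Here r is the recombination frequency (0 <= r < 0.5) and distances are in centimorgans (cM). Haldane's function assumes no crossover interference, whereas Kosambi's allows for some interference.

```python
import math


def haldane_cm(r):
    """Haldane map distance (cM) from recombination frequency r."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance (cM) from recombination frequency r."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def haldane_r(d_cm):
    """Recombination frequency from a Haldane map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def kosambi_r(d_cm):
    """Recombination frequency from a Kosambi map distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):6.2f} cM, Kosambi {kosambi_cm(r):6.2f} cM")
```

For small r both functions give a distance close to 100 x r cM. They diverge as r grows, which is why the same map function should be used throughout an analysis.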
When to use it?
IM is a good general standard to use for all datasets.
Use it in combination with, or as part of a process including, other methods:
you may wish to start with a single-marker analysis and then run IM to further refine the analysis.
High-level process
Here's a quick overview of how to use WinQTLCart's IM implementation. The first few times you
run this analysis, go with the WinQTLCart default values for the form's parameters. The defaults
provide the best all-around parameter settings, especially for initial analysis sessions.
1. Select the IM analysis method.
2. Select the chromosome(s) and trait(s) you want to analyze.
3. Select a threshold level to apply to the selected trait(s). Select either By manual input (the
WinQTLCart default) or By permutations (to have WinQTLCart determine an optimum
threshold). See setting the threshold level for more information on the impact of each of these
choices.
4. Click OK to start the calculations for the threshold level.
5. Following threshold calculation, set the IM form parameters. Select a walk speed in cM. It is
recommended that you use the same walk speed for your entire dataset. Don't reset the walk speed
between runs or your results will not be comparable.
6. Click Start to begin the analysis.
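The "By permutations" option in step 3 rests on a general idea that can be sketched independently of the software. The trait values are shuffled many times against fixed marker genotypes, the largest test statistic across all markers is recorded for each shuffle, and a high quantile of those maxima is taken as the genome-wide threshold. The Python sketch below is an assumption-laden illustration, not WinQTLCart's implementation. It uses a simple difference in class means as the statistic (WinQTLCart works with likelihood-ratio statistics), and all names are hypothetical.

```python
import random
from statistics import mean


def marker_stat(genos, phenos):
    """Absolute difference in trait means between the two marker classes (0/1)."""
    a = [y for g, y in zip(genos, phenos) if g == 0]
    b = [y for g, y in zip(genos, phenos) if g == 1]
    if not a or not b:
        return 0.0
    return abs(mean(a) - mean(b))


def permutation_threshold(markers, phenos, n_perm=1000, alpha=0.05, seed=1):
    """Empirical genome-wide threshold from permuted trait values.

    markers : list of genotype lists, one per marker (0/1 codes)
    phenos  : list of trait values
    """
    rng = random.Random(seed)
    shuffled = list(phenos)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-trait association
        maxima.append(max(marker_stat(m, shuffled) for m in markers))
    maxima.sort()
    # Approximately the (1 - alpha) quantile of the permutation maxima
    idx = min(len(maxima) - 1, int((1 - alpha) * len(maxima)))
    return maxima[idx]


# Invented data: two markers scored on eight animals
markers = [[0, 1, 0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 0, 0, 1, 1]]
phenos = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(permutation_threshold(markers, phenos, n_perm=200))
```

Because each permutation repeats the whole genome scan, run time grows with the number of permutations, which is consistent with the long run times mentioned for threshold calculation.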
Composite Interval Mapping
What it is?
Composite interval mapping (CIM) adds background loci to simple interval mapping (IM). CIM fits
parameters for a target QTL in one interval while simultaneously fitting partial regression
coefficients for "background markers" to account for variance caused by non-target QTL. "In theory,
CIM gives more power and precision than simple IM because the effects of other QTL are not
present as residual variance. Furthermore, CIM can remove the bias that would normally be caused
by QTL that are linked to the position being tested." Background markers are usually 20-40cM
apart.
High-level workflow
Here's a quick overview of how to use WinQTLCart's CIM implementation. The first few times you
run this analysis, go with the WinQTLCart default values for the form's parameters. The defaults
provide the best all-around parameter settings, especially for initial analysis sessions.
1. Select the CIM analysis method.
2. Select the chromosome(s) and trait(s) you want to analyze.
3. Select a threshold level to apply to the selected trait(s). Select either by manual input (the
WinQTLCart default) or By permutations (to have WinQTLCart determine an optimum
threshold). See the Setting the threshold level topic for more information on the impact of each
of these choices.
4. Click OK to start the calculations for the threshold level. This may take from several minutes to
several hours to run.
5. Following threshold calculation, set the CIM form parameters. Select a walk speed in cM. It is
recommended that you use the same walk speed for your entire dataset. Don't reset the walk speed
between runs or your results will not be comparable.
6. Click Start to begin the analysis. The analysis may take from 20 minutes to several hours to run.
Multiple Interval Mapping
What it is?
Multiple interval mapping (MIM) uses multiple marker intervals simultaneously to fit multiple
putative QTL directly in the model for mapping QTL. The MIM model is based on Cockerham's
model for interpreting genetic parameters and the method of maximum likelihood for estimating
genetic parameters. MIM is well suited to the identification and estimation of genetic architecture
parameters, including the number, genomic positions, effects and interactions of significant QTL
and their contribution to the genetic variance.
High-level process
Here's a quick overview of how to use WinQTLCart's MIM implementation:
1. Select the MIM analysis method.
2. Pick a trait you want to work with. (MIM works with only one trait at a time.)
3. Decide if you want to create a model using WinQTLCart's default search procedures or an
alternative (such as Forward, Backward, or CIM).
4. Run the analysis to generate the model.
5. Refine the model as needed by editing individual cells in the model, adding or deleting QTL,
searching and testing QTLs or epistatics, and re-estimating. This part of the analysis can
iterate for as long as you want to search for QTLs.
6. Save the model as a .MDS file (or as a result file using the Refine Model function).
References
Jansen, R. C. and Stam, P. 1994. High resolution of quantitative traits into multiple loci via interval mapping.
Genetics 136, 1447-1455.
Jansen, R. C. 1992. A general mixture model for mapping quantitative trait loci by using molecular
markers. Theor. Appl. Genet. 85, 252.
Jansen, R. C. 1995. Genetic Mapping of Quantitative Trait Loci in Plants: a Novel Statistical Approach. Ph.D.
thesis, CIP-data Koninklijke Bibliotheek, Den Haag, The Netherlands.
Jansen, R. C. 1993. Interval mapping of multiple quantitative trait loci. Genetics 135, 205-211.
Jiang, C. and Zeng, Z.-B. 1995. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics
140, 1111.
Kao, C. H., Zeng, Z.-B. and Teasdale, R. D. 1999. Multiple interval mapping for quantitative trait loci. Genetics
152, 1203.
QTL CARTOGRAPHER: A Reference Manual and Tutorial for QTL Mapping. 1995-2001. Department of
Statistics, North Carolina State University, Raleigh, North Carolina.
Zeng, Z.-B. 1993. Theoretical basis of precision mapping of quantitative trait loci. Proc. Natl Acad. Sci. USA
90, 10972.
9
RNA Isolation and Real time-Quantitative Polymerase Chain Reaction
Manishi Mukesh, Ankita Sharma, Kiran Thakur, Preeti Verma and Indrajit Ganguly
ICAR-National Bureau of Animal Genetic Resources, Karnal, Haryana
________________________________________________________________________________________
RNA isolation
Principle:
RNA (ribonucleic acid) is a polymeric substance present in living cells and many viruses,
consisting of a long single-stranded chain of phosphate and ribose units with the nitrogen bases
adenine, guanine, cytosine, and uracil, which are bonded to the ribose sugar. RNA takes part in all
the steps of protein synthesis in living cells and carries the genetic information for many viruses.
The isolation of high-quality RNA is a crucial step required to perform various molecular
biology experiments. TRIzol Reagent is a ready-to-use reagent for RNA isolation from cells and
tissues. The reagent, a mono-phasic solution of phenol and guanidine isothiocyanate, is an
improvement on the single-step RNA isolation method. During sample homogenization or lysis,
TRIzol Reagent maintains the integrity of the RNA, while disrupting cells and dissolving cell
components. Addition of chloroform, followed by centrifugation, separates the solution into
aqueous and organic phases. RNA remains exclusively in the aqueous phase. After transfer of the
aqueous phase, the RNA is recovered by precipitation with isopropyl alcohol.
The following protocol is used to isolate RNA from peripheral blood mononuclear cells (PBMC):
Note: All the steps should be done on ice and while wearing latex-free gloves.
1. Thaw the frozen cells stored in TRIzol and homogenize properly using a hand-held homogenizer
(Labgen, Cole Parmer, USA).
2. Add 1 µl linear acrylamide (Ambion, USA) per ml of TRIzol, vortex the contents and
centrifuge at 10,000 x g for 10 minutes at 4 °C.
3. Transfer the supernatant into a fresh 1.5 ml tube and add 200 µl chloroform per ml of TRIzol. Mix
vigorously for 30 s and keep at room temperature for 2-3 min. Centrifuge the contents of the
tubes again at 10,000 x g for 10 min at 4 °C.
4. Gently aspirate the upper aqueous phase (containing RNA) without disturbing the
interface, and transfer it to a fresh tube.
5. For denaturation, add 600 µl acid phenol:chloroform (5:1) to the aqueous phase and centrifuge
at 13,000 x g for 15 min at 4 °C.
6. Carefully transfer the separated upper aqueous phase to a fresh tube. Add 500 µl of isopropanol
and keep for 30 minutes at room temperature. Centrifuge the mixture at 15,000 x g for 15 min at 4 °C.
7. Discard the supernatant carefully, add 1 ml of 75% ethanol to the pellet and vortex for 1 min
to wash the RNA. Centrifuge the contents at 15,000 x g for 5 min at 4 °C and discard the supernatant.
8. Air dry the RNA pellet and dissolve it in 30-50 µl RNA storage solution (1 mM Na-citrate). For
quantification, measure the O.D. of the RNA using a NanoVue Plus (GE Healthcare).
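Two reagents in the protocol above are dosed per ml of TRIzol (1 µl linear acrylamide and 200 µl chloroform), so their volumes scale with the TRIzol volume in the sample. A small Python helper, with a hypothetical name and written only as a bench-side convenience sketch, makes the scaling explicit.

```python
def trizol_scaled_volumes(trizol_ml):
    """Volumes (µl) of the reagents that the protocol doses per ml of TRIzol."""
    return {
        "linear acrylamide (µl)": 1.0 * trizol_ml,   # step 2: 1 µl per ml TRIzol
        "chloroform (µl)": 200.0 * trizol_ml,        # step 3: 200 µl per ml TRIzol
    }


# Example: a sample stored in 0.5 ml of TRIzol
for reagent, volume in trizol_scaled_volumes(0.5).items():
    print(f"{reagent}: {volume}")
```

The acid phenol:chloroform, isopropanol and ethanol volumes are given in the protocol as fixed amounts, so they are not scaled here.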
Purification of RNA
To remove traces of genomic DNA, RNeasy Mini kit columns (Qiagen, Germany) are used along with
on-column digestion by RNase-free DNase enzyme (Qiagen, Germany).
Principle:
The RNeasy procedure represents a well-established technology for RNA purification. This
technology combines the selective binding properties of a silica-based membrane with the speed of
micro spin technology. A specialized high-salt buffer system allows up to 100 µg of RNA longer than
200 bases to bind to the RNeasy silica membrane. Ethanol is added to provide appropriate binding
conditions, and the sample is then applied to an RNeasy Mini spin column, where the total RNA
binds to the membrane and contaminants are efficiently washed away. High-quality RNA is then
eluted in 30-100 µl water. With the RNeasy procedure, all RNA molecules longer than 200
nucleotides are purified.
Steps:
1. Adjust each sample volume to 100 µl with RNase-free water. Add 350 µl of buffer RLT and mix
well. Immediately add 250 µl ethanol (96-100%) to the diluted RNA, and mix well again by
pipetting.
2. Transfer the sample (700 µl) to an RNeasy Mini spin column placed in a 2 ml collection tube
(supplied). Close the lid gently, and centrifuge for 15 s at 8000 x g (10,000 rpm). Discard the
flow-through.
3. Add 350 µl buffer RW1 to the RNeasy spin column and centrifuge for 15 s at 10,200 rpm to
wash the spin column membrane. Discard the flow-through carefully.
4. Add 80 µl DNase mix (10 µl DNase I + 70 µl RDD buffer) to the spin column membrane and
leave it on the benchtop for 15 min.
5. Add 350 µl of buffer RW1 to the RNeasy spin column and centrifuge for 15 s at
10,200 rpm. Discard the flow-through.
6. To wash the spin column membrane, add 500 µl buffer RPE to the RNeasy spin column
and centrifuge for 15 s at 10,200 rpm. After discarding the flow-through, add 500 µl buffer
RPE to the RNeasy spin column again and centrifuge for 2 min at 10,200 rpm.
7. Place the RNeasy spin column in a new 2 ml collection tube, discard the old collection tube
with the flow-through, and centrifuge at full speed for 1 min.
8. To elute the RNA, place the spin column in a new 1.5 ml collection tube, add 30 µl of RNase-free
water directly to the spin column membrane and centrifuge for 1 min at 10,200 rpm. Repeat this
step twice to maximize the yield.
Evaluation of RNA quality/integrity
1. Total RNA concentration and purity were measured using a NanoVue Plus (GE Healthcare). The
purity of RNA (A260/A280) for all samples was above 1.9.
2. Denaturing agarose gel electrophoresis was performed to check the integrity of all the extracted RNA.
The extracted RNA was stored at -80 °C till further use.
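The spectrophotometric check above comes down to two small calculations, sketched here in Python. The conversion factor (one A260 unit corresponds to about 40 µg/ml of RNA at a 1 cm path length) is the commonly used approximation and is an assumption here, as are the function names. Instruments such as the NanoVue normally report concentration directly. The purity cut-off of 1.9 is the value quoted above.

```python
def rna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Approximate RNA concentration (ng/µl) from absorbance at 260 nm.

    Assumes 1 A260 unit ~ 40 µg/ml RNA (= 40 ng/µl) at a 1 cm path length.
    """
    return a260 * 40.0 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; low values suggest protein or phenol carry-over."""
    return a260 / a280


def passes_purity(a260, a280, minimum=1.9):
    """True if the A260/A280 ratio is above the cut-off used in this protocol."""
    return purity_ratio(a260, a280) > minimum


# Invented readings for one sample
print(rna_concentration_ng_per_ul(0.5))
print(passes_purity(1.0, 0.5))
```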
Real time Quantitative Polymerase Chain Reaction
The polymerase chain reaction (PCR) is a technique in molecular biology used to amplify a
single copy or a few copies of a piece of DNA across several orders of magnitude, generating
thousands to millions of copies of a particular DNA sequence. The polymerase chain reaction was
developed in 1983 by the American biochemist Kary Mullis. In traditional (endpoint) PCR, detection
and quantitation of the amplified sequence are performed at the end of the reaction after the last
PCR cycle, and involve post-PCR analysis such as gel electrophoresis and image analysis. In
real-time quantitative PCR (qPCR), the amount of PCR product is measured at each cycle. This
ability to monitor the reaction during its exponential phase enables users to determine the initial
amount of target with great precision.
In real-time PCR, the amount of DNA is measured after each cycle by the use of fluorescent
markers that are incorporated into the PCR product. The increase in fluorescent signal is directly
proportional to the number of PCR product molecules (amplicons) generated in the exponential
phase of the reaction. Fluorescent reporters used include double-stranded DNA (dsDNA)-binding
dyes, or dye molecules attached to PCR primers or probes that are incorporated into the product
during amplification. The change in fluorescence over the course of
the reaction is measured by an instrument that combines thermal cycling with scanning capability.
By plotting fluorescence against the cycle number, the real-time PCR instrument generates an
amplification plot that represents the accumulation of product over the duration of the entire PCR
reaction.
Overview of real-time PCR
qPCR steps
There are three major steps that make up a qPCR reaction. Reactions are generally run for 40 cycles.
1. Denaturation - The temperature should be appropriate to the polymerase chosen (usually 95 °C).
The denaturation time can be increased if template GC content is high.
2. Annealing - Use appropriate temperatures based on the calculated melting temperature (Tm)
of the primers (5 °C below the Tm of the primers).
3. Extension - At 70-72 °C, the activity of the DNA polymerase is optimal, and primer extension
occurs at rates of up to 100 bases per second. When an amplicon in qPCR is small, this step
is often combined with the annealing step using 60 °C as the temperature.
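The annealing rule in step 2 (about 5 °C below the primer Tm) can be sketched in Python. The Tm estimate used here is the simple Wallace rule, 2 °C per A/T plus 4 °C per G/C, which is only a rough approximation for short oligonucleotides. Both this rule and taking the lower Tm of the primer pair are assumptions of this sketch, and the primer sequences are invented. Primer design software gives more reliable values.

```python
def wallace_tm(primer):
    """Rough melting temperature (°C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def annealing_temperature(forward, reverse, offset=5):
    """Annealing temperature: offset °C below the lower Tm of the primer pair."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


# Invented primer sequences, for illustration only
forward = "ATGCATGCATGCATGCATGC"
reverse = "ATATATATGCGCGCGCGC"
print(wallace_tm(forward), wallace_tm(reverse))
print(annealing_temperature(forward, reverse))
```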
Real-time PCR fluorescence detection systems:
Several different fluorescence detection technologies can be used for real time PCR, and each has
specific assay design requirements. All are based on the generation of a fluorescent signal that is
proportional to the amount of PCR product formed. The three main fluorescence detection systems
are:
DNA-binding agents (e.g., SYBR Green and SYBR GreenER technologies)
Fluorescent primers (e.g., LUX Fluorogenic Primers and Amplifluor qPCR primers)
Fluorescent probes (e.g., TaqMan probes, Scorpions, Molecular Beacons)
DNA-binding dyes
The most common system for detection of amplified DNA is the use of intercalating dyes that
fluoresce when bound to dsDNA. SYBR Green I and SYBR GreenER technologies use this type
of detection method. The fluorescence of DNA-binding dyes significantly increases when bound to
double-stranded DNA (dsDNA). The intensity of the fluorescent signal depends on the amount of
dsDNA that is present. As dsDNA accumulates, the dye generates a signal that is proportional to
the DNA concentration and can be detected using a real-time PCR instrument.
Probe-based detection systems
Probe-based systems provide highly sensitive and specific detection of DNA and RNA and use the
phenomenon of Fluorescence Resonance Energy Transfer (FRET). TaqMan probes require a pair of
PCR primers in addition to a probe with both a reporter dye (such as FAM (6-carboxyfluorescein))
and a quencher dye (such as TAMRA (6-carboxytetramethylrhodamine)) attached. The probe is
designed to bind to the sequence amplified by the primers. During qPCR, the probe is cleaved by
the 5' nuclease activity of the Taq DNA polymerase; this releases the reporter dye and generates a
fluorescent signal that increases with each cycle.
Primer-based detection systems
Primer-based fluorescence detection technologies can provide highly sensitive and specific
detection of DNA and RNA. In these systems, the fluorophore is attached to a target-specific PCR
primer that increases in fluorescence when incorporated into the PCR product during amplification.
Passive reference dyes (such as ROX dye) are frequently used in real-time PCR to normalize the
fluorescent signal of reporter dyes and correct for fluctuations in fluorescence that are non-PCR
based.
Procedure:
Gene amplification by qPCR is performed using a qPCR system [Applied Biosystems StepOnePlus (ABI,
California) or LightCycler 480 (Roche)]. Each reaction in a 96-well plate comprises 10 µl of mix.
1. Thaw the cDNA samples and vortex briefly; add 4 µl of cDNA to each of the
duplicate wells.
2. Prepare the master mix as given in Table 1, including forward and reverse primers (0.4 µl
each), nuclease-free water (0.2 µl) and the available SYBR Green mix (Roche or Thermo Scientific,
5 µl).
For each gene, samples are run in duplicate (technical replicates) along with a 6-point relative
standard curve plus the non-template control (NTC). The amplification conditions of the reactions
are: 10 min at 95 °C, then 40 cycles of 15 s at 95 °C (denaturation) and 1 min at 60 °C (annealing +
extension). A dissociation protocol with an incremental temperature of 95 °C for 15 s plus 65 °C for
15 s is used to investigate the specificity of the qPCR reaction and the presence of primer dimers.
Table 1: Reaction mixture for qPCR (10 µl reaction)
Sr. No.   Constituents              Volume
1         cDNA                      4.0 µl
          Master Mix:
2         SYBR Green mix            5.0 µl
3         Forward primer            0.4 µl (10 pm)
4         Reverse primer            0.4 µl (10 pm)
5         Nuclease-free water       0.2 µl
          Total                     10 µl
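When many wells are set up, the master mix in Table 1 is usually prepared in bulk. The Python sketch below scales the per-reaction volumes by the number of reactions. The 10% pipetting overage and the function name are assumptions of this sketch and are not taken from the protocol. The cDNA (4 µl per well) is added separately and is therefore not part of the mix.

```python
# Per-reaction master mix volumes in µl, taken from Table 1 (cDNA excluded)
PER_REACTION_UL = {
    "SYBR Green mix": 5.0,
    "forward primer": 0.4,
    "reverse primer": 0.4,
    "nuclease-free water": 0.2,
}


def master_mix(n_reactions, overage=0.10):
    """Total volume (µl) of each component for n_reactions, plus pipetting overage.

    overage is a fraction (0.10 = 10% extra); it is an assumed safety margin.
    """
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in PER_REACTION_UL.items()}


# Example: 20 wells (e.g. samples in duplicate, standard curve points and NTC)
for component, volume in master_mix(20).items():
    print(f"{component}: {volume} µl")
```

Each well then receives 6 µl of master mix plus 4 µl of cDNA, giving the 10 µl reaction volume of Table 1.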
Data Analysis:
Melting curve analysis
The specificity of a real-time PCR assay is determined by the primers and reaction conditions used.
However, there is always the possibility that even well designed primers may form primer-dimers
or amplify a nonspecific product. There is also the possibility when performing qRT-PCR that the
RNA sample contains genomic DNA, which may also be amplified. The specificity of the qPCR or
qRT-PCR reaction can be confirmed using melting curve analysis. When melting curve analysis is
not possible, additional care must be used to establish that differences observed in Ct values
between reactions are valid and not due to the presence of nonspecific products.
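Melting curve analysis is usually read from the negative first derivative of fluorescence with respect to temperature (-dF/dT), where each amplification product appears as a peak at its melting temperature. The Python sketch below finds the main peak from raw readings. The data are invented and the function name is hypothetical. Instrument software performs this step, with smoothing, automatically.

```python
def melt_peak(temperatures, fluorescence):
    """Return (temperature, height) of the largest -dF/dT peak.

    temperatures : increasing list of temperatures (°C)
    fluorescence : fluorescence reading at each temperature
    The peak temperature is reported as the midpoint of the steepest interval.
    """
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temperatures)):
        dt = temperatures[i] - temperatures[i - 1]
        slope = -(fluorescence[i] - fluorescence[i - 1]) / dt
        if slope > best_slope:
            best_slope = slope
            best_t = (temperatures[i] + temperatures[i - 1]) / 2
    return best_t, best_slope


# Invented melt data: fluorescence drops sharply as the product denatures
temps = [70, 72, 74, 76, 78, 80, 82, 84, 86, 88]
fluor = [100, 99, 98, 96, 90, 60, 20, 10, 8, 7]
print(melt_peak(temps, fluor))
```

A single sharp peak is consistent with one specific product. Additional peaks, typically at lower temperatures, point to primer-dimers or nonspecific products.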
Melting curve
Normalization methods
Variation at any stage of the process will impair the ability of researchers to compare data and
will lead to erroneous conclusions if not factored out of the study. Sources of variability include the
nature and amount of starting sample, the RNA isolation process, reverse transcription, and lastly
real-time PCR amplification. Normalization is essentially the act of neutralizing the effects of
variability from these sources. While there are individual normalization strategies at each stage of
real-time PCR, some are more effective than others.
Normalizing to a reference gene
The use of a normalizer gene (also called a reference gene or housekeeping gene) is the most
thorough method of addressing almost every source of variability in real-time PCR. However, for
this method to work, the gene must be present at a consistent level among all samples being
compared. An effective normalizer gene controls for RNA quality and quantity, differences in
reverse transcription efficiency, and real-time PCR amplification efficiency. If the reverse
transcriptase transcribes or the DNA polymerase amplifies a target gene in two samples at different
rates, the normalizer transcript will reflect the variability.
General process
1. Viewing the amplification plots for the entire plate
2. Setting the baseline and threshold values
3. Using the methods detailed in this section to determine results.
Relative Quantification
Relative quantification describes the change in expression of the target gene in a test sample relative
to a calibrator sample. The calibrator sample can be an untreated control or a sample at time zero in
a time-course study (Livak and Schmittgen, 2001). Relative quantification provides accurate
comparison between the initial levels of template in each sample.
Calculation methods for relative quantification
Relative standard curve method: Running the target and endogenous control amplifications in
separate tubes and using the relative standard curve method of analysis requires the least
amount of optimization and validation.
Comparative Ct method (ΔΔCt): To use the comparative Ct method, a validation experiment
must be run to show that the amplification efficiencies of the target and the endogenous control
are approximately equal and close to optimal. This method involves double normalization: first
against the endogenous control and then against the calibrator sample.
Formula used for calculation:
Fold change = 2^(-ΔΔCt) (Livak and Schmittgen, 2001)
Steps:
1. Prepare the plate set-up and start the run.
2. Collect the Ct values and calculate the average of the duplicates for each sample.
3. Determine the ΔCt by subtracting the average Ct of the endogenous control from the average
Ct of the target.
4. Determine the ΔΔCt by subtracting the ΔCt of the calibrator from the ΔCt of the test sample
or treated sample.
5. Then calculate the fold change with the formula 2^(-ΔΔCt).
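The calculation steps above can be sketched in Python. This is a minimal illustration rather than part of the original protocol: the function name `fold_change` and all Ct values are hypothetical, and the arithmetic assumes that the target and the endogenous control amplify with comparable efficiency.

```python
from statistics import mean


def fold_change(target_test, control_test, target_calibrator, control_calibrator):
    """Comparative Ct calculation; each argument is a list of replicate Ct values.

    Returns (delta_ct_test, delta_ct_calibrator, delta_delta_ct, fold_change).
    """
    # Steps 2 and 3: average the replicates, then subtract the endogenous
    # control from the target within each sample.
    delta_ct_test = mean(target_test) - mean(control_test)
    delta_ct_calibrator = mean(target_calibrator) - mean(control_calibrator)
    # Step 4: subtract the calibrator delta Ct from the test-sample delta Ct.
    delta_delta_ct = delta_ct_test - delta_ct_calibrator
    # Step 5: fold change = 2^(-delta delta Ct).
    return delta_ct_test, delta_ct_calibrator, delta_delta_ct, 2 ** (-delta_delta_ct)


# Hypothetical duplicate Ct values, for illustration only.
d_ct_test, d_ct_cal, dd_ct, fc = fold_change(
    target_test=[24.0, 25.0],
    control_test=[18.0, 19.0],
    target_calibrator=[26.0, 27.0],
    control_calibrator=[18.0, 19.0],
)
print(f"dCt(test)={d_ct_test}, dCt(calibrator)={d_ct_cal}, ddCt={dd_ct}, fold change={fc}")
```

With these made-up values the test sample has a ΔCt two cycles lower than the calibrator, which corresponds to a four-fold higher expression of the target.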
References
Rio D.C., Ares M. Jr., Hannon G.J. and Nilsen T.W. 2010. Purification of RNA using TRIzol (TRI reagent). Cold
Spring Harb Protoc. (6). doi: 10.1101/pdb.prot5439.
10
Expression Microarray Methodology Using Agilent Whole Genome Chip
Manishi Mukesh, Ankita Sharma, Monika Sodhi
ICAR- National Bureau of Animal Genetic Resources, Karnal, Haryana
________________________________________________________________________________________
Step-1: Sample preparation
Serial Dilution
First    Second   Third    Fourth   Spike Mix volume (µL)
1:20     1:25     1:20     1:10     2
1:20     1:25     1:20     1:4      2
1:20     1:25     1:20     1:2      2
1:20     1:25     1:20     -        2
1:20     1:25     1:10     -        2
1:20     1:25     1:20     -        2
For example, to prepare the Agilent One-Color Spike Mix dilution appropriate for a starting sample
of 25 ng of total RNA:
1. Create the First Dilution:
a. Label a new sterile 1.5 mL microcentrifuge tube Spike Mix First Dilution.
b. Mix the thawed Spike Mix vigorously on a vortex mixer.
c. Heat at 37°C in a circulating water bath for 5 minutes.
d. Mix the Spike Mix tube vigorously again on a vortex mixer.
e. Spin briefly in a centrifuge to separate contents to the bottom of the tube.
f. Into the First Dilution tube, put 2 µL of Spike Mix stock.
g. Add 38 µL of Dilution Buffer provided in the Spike-In kit (1:20).
h. Mix thoroughly on a vortex mixer and spin down quickly to collect all of the liquid at the
bottom of the tube. This tube contains the First Dilution.
2. Create the Second Dilution:
a. Label a new sterile 1.5 mL microcentrifuge tube Spike Mix Second Dilution.
b. Into the Second Dilution tube, put 2 µL of First Dilution.
c. Add 48 µL of Dilution Buffer (1:25).
d. Mix thoroughly on a vortex mixer and spin down quickly to collect all of the liquid at the
bottom of the tube. This tube contains the Second Dilution.
3. Create the Third Dilution:
a. Label a new sterile 1.5 mL microcentrifuge tube Spike Mix Third Dilution.
b. Into the Third Dilution tube, put 2 µL of Second Dilution.
c. Add 38 µL of Dilution Buffer (1:20).
d. Mix thoroughly on a vortex mixer and spin down quickly to collect all the liquid at the
bottom of the tube. This tube contains the Third Dilution.
4. Create the Fourth Dilution:
a. Label a new sterile 1.5 mL microcentrifuge tube Spike Mix Fourth Dilution.
b. Into the Fourth Dilution tube, add 10 µL of Third Dilution to 30 µL of Dilution Buffer for the
Fourth Dilution (1:4).
c. Mix thoroughly on a vortex mixer and spin down quickly to collect all of the liquid at the
bottom of the tube. This tube contains the Fourth Dilution (now at a 40,000-fold final
dilution).
d. Add 2 µL of Fourth Dilution to 25 ng of sample total RNA as listed in Table 1 and continue
with cyanine 3 labeling using the Agilent Low Input Quick Amp Kit protocol as described in
Step 2.
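The arithmetic behind the 40,000-fold figure can be checked with a short Python sketch. The transfer and buffer volumes are taken from the four dilution steps above; the helper name `dilution_factor` is hypothetical and is used only for illustration.

```python
from fractions import Fraction


def dilution_factor(transfer_ul, buffer_ul):
    """Fold dilution obtained by adding transfer_ul of stock to buffer_ul of buffer."""
    return Fraction(transfer_ul + buffer_ul, transfer_ul)


# (volume transferred in µL, Dilution Buffer volume in µL) for the
# First, Second, Third and Fourth Dilutions of the 25 ng example.
steps = [(2, 38), (2, 48), (2, 38), (10, 30)]

factors = [dilution_factor(transfer, buffer) for transfer, buffer in steps]

cumulative = Fraction(1)
for factor in factors:
    cumulative *= factor

print("Per-step dilutions:", [f"1:{f}" for f in factors])
print("Final dilution: 1:" + str(cumulative))
```

The per-step factors come out as 1:20, 1:25, 1:20 and 1:4, and their product is the 40,000-fold final dilution stated in step 4c.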
Storage of Spike Mix dilutions
Store the Agilent RNA Spike-In Kit, One-Color at -70°C to -80°C in a non-defrosting freezer for up
to 1 year from the date of receipt. The first dilution of the Agilent One-Color Spike Mix positive
controls can be stored for up to 2 months in a non-defrosting freezer at -70°C to -80°C and
freeze/thawed up to eight times. After use, discard the second, third and fourth dilution tubes.
Step-3: Prepare labeling reaction
For each assay, make sure that the volume of the total RNA sample plus diluted RNA spike-in
controls does not exceed 3.5 µL. Because the 1x reaction involves volumes of less than 1 µL, prepare
components in a master mix and divide into the individual assay tubes in volumes >1 µL. When
preparing 4 samples, use the 5x master mix. When preparing 8 samples, use the 10x master mix.
1. Add 200 ng of total RNA to a 1.5-mL microcentrifuge tube in a final volume of 1.5 µL (from
working RNA concentrations of 100 ng/µL).
2. Add 2 µL of diluted Spike Mix to each tube. Each tube now contains a total volume of 3.5 µL.
3. Prepare and add T7 Promoter Primer:
a. Mix the T7 Promoter Primer and water to prepare the T7 Promoter Primer Master Mix as
listed in Table 2.
Table 2. T7 Promoter Primer Mix
Component
T7 Promoter Primer (green cap)
Nuclease-free water (white cap)
Total Volume
b. Add 1.8 µL of T7 Promoter Primer Mix to the tube that contains 3.5 µL of total RNA and
diluted RNA spike-in controls. Each tube now contains a total volume of 5.3 µL.
c. Denature the primer and the template by incubating the reaction at 65°C in a circulating
water bath for 10 minutes.
d. Place the reactions on ice and incubate for 5 minutes.
4. Prewarm the 5X first strand buffer at 80°C for 3 to 4 minutes to ensure adequate resuspension
of the buffer components. For optimal resuspension, briefly mix on a vortex mixer and spin the
tube in a microcentrifuge to drive down the contents from the tube walls. Keep at room
temperature until needed.
b. Briefly spin each sample tube in a microcentrifuge to drive down the contents from the tube
walls and the lid.
c. Add 4.7 µL of cDNA Master Mix to each sample tube and mix by pipetting up and down.
Each tube now contains a total volume of 10 µL.
d. Incubate samples at 40°C in a circulating water bath for 2 hours.
e. Move samples to a 70°C circulating water bath and incubate for 15 minutes.
f. Move samples to ice. Incubate for 5 minutes.
g. Spin samples briefly in a microcentrifuge to drive down tube contents from the tube walls
and lid.
Stopping Point. If you do not immediately continue to the next step, store the samples at -80°C.
6. Prepare and add Transcription Master Mix:
a. Immediately prior to use, gently mix the components listed in Table 4 in the order indicated for
the Transcription Master Mix by pipetting at RT. The T7 RNA polymerase blend is a blend of
enzymes. Keep the T7 RNA polymerase on ice and add to the Transcription master mix.
Table 4. Transcription Master Mix
Component
Nuclease-free water (white cap)
5X Transcription Buffer (blue cap)
0.1 M DTT (white cap)
NTP mix (blue cap)
T7 RNA Polymerase Blend (red cap)
Cyanine 3-CTP
b. Add 6 µL of Transcription Master Mix to each sample tube. Gently mix by pipetting. Each tube
now contains a total volume of 16 µL.
c. Incubate samples in a circulating water bath at 40°C for 2 hours.
Stopping Point. If you do not immediately continue to the next step, store the samples at -80°C.
Yield (µg)
5
3.75
1.65
0.825
Step-6: Hybridization
Prepare the 10X Blocking Agent
1. Add 500 µL of nuclease-free water to the vial containing lyophilized 10X Blocking Agent
supplied with the Agilent Gene Expression Hybridization Kit, or add 1250 µL of nuclease-free
water to the vial containing lyophilized large volume 10X Blocking Agent (Agilent p/n 5188-5281).
2. Mix by gently vortexing. If the pellet does not go into solution completely, heat the mix for 4
to 5 minutes at 37°C.
3. Drive down any material adhering to the tube walls or cap by centrifuging for 5 to 10 seconds.
10X Blocking Agent can be prepared in advance and stored at -20°C for up to 2 months. After
thawing, repeat the vortexing and centrifugation procedures before use.
Prepare hybridization samples
1. Equilibrate water bath to 60°C.
2. For each microarray, add each of the components as indicated in Table 6 below to a 1.5 mL
nuclease-free microfuge tube:
CAUTION: Do not incubate sample in the next step for more than 30 minutes. Cooling on ice and
adding the 2x Hybridization Buffer will stop the fragmentation reaction.
4. Incubate at 60°C for exactly 30 minutes to fragment RNA.
5. Immediately cool on ice for one minute.
6. Add 2x GEx Hybridization Buffer HI-RPM to the 4-pack microarray format at the appropriate
volume to stop the fragmentation reaction as mentioned in Table 7.
Table 7: Hybridization mix
Components
cRNA from Fragmentation Mix
2x GEx Hybridization Buffer HI-RPM
7. Mix well by careful pipetting. Take care to avoid introducing bubbles. Do not mix on a vortex
mixer; mixing on a vortex mixer introduces bubbles.
8. Spin for 1 minute at room temperature at 13,000 rpm in a microcentrifuge to drive the sample
off the walls and lid and to aid in bubble reduction.
9. Use immediately. Do not store.
10. Place sample on ice and load onto the array as soon as possible.
Prepare the hybridization assembly
1. Load a clean gasket slide into the Agilent SureHyb chamber base with the label facing up and
aligned with the rectangular section of the chamber base. Ensure that the gasket slide is flush
with the chamber base and is not ajar.
2. Slowly dispense the volume of hybridization sample (see Table 8) onto the gasket well in a
drag and dispense manner.
Table 8: Hybridization Sample
Components
Volume Prepared
Volume to Hybridize
3. Slowly place an array active side down onto the SureHyb gasket slide, so that the
Agilent-labeled barcode is facing down and the numeric barcode is facing up. Make sure
the sandwich-pair is properly aligned.
4. Place the SureHyb chamber cover onto the sandwiched slides and slide the clamp assembly
onto both pieces.
Step          Dish   Wash Buffer        Temperature            Time
Disassembly          GE Wash Buffer 1   Room temperature
1st wash      2      GE Wash Buffer 1   Room temperature       1 minute
2nd wash      3      GE Wash Buffer 2   Elevated temperature   5 minutes
11
Genotype and Phenotype Association Studies in Livestock
S P Dixit, Anurodh Sharma and Jayakumar Sivalingam
ICAR- National Bureau of Animal Genetic Resources, Karnal (Haryana)
________________________________________________________________________________________
Input file preparation: The genotype information available in an Excel sheet can be imported
directly into the SAS software and readily used as an input file.
Gene and genotype frequencies in animals: The gene and genotype frequencies can be estimated
by using the SAS software.
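As an illustration of what these estimates involve, the following Python sketch counts genotypes and alleles for a single biallelic SNP. It is not the SAS procedure referred to above; the function name and the genotype list are hypothetical, and genotypes are assumed to be coded as two-letter strings.

```python
from collections import Counter


def genotype_and_allele_frequencies(genotypes):
    """Return (genotype_freq, allele_freq) for genotypes coded like 'AG'."""
    n = len(genotypes)
    # Sort the two alleles so that 'AG' and 'GA' count as the same genotype.
    genotype_counts = Counter("".join(sorted(g)) for g in genotypes)
    genotype_freq = {g: count / n for g, count in genotype_counts.items()}
    # Each diploid animal contributes two alleles.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    allele_freq = {a: count / (2 * n) for a, count in allele_counts.items()}
    return genotype_freq, allele_freq


# Hypothetical genotypes of ten animals at one SNP.
genotypes = ["AA", "AG", "GA", "GG", "AA", "AG", "AA", "GG", "AG", "AA"]
genotype_freq, allele_freq = genotype_and_allele_frequencies(genotypes)
print("Genotype frequencies:", genotype_freq)
print("Allele frequencies:", allele_freq)
```

The allele frequency of A equals the frequency of AA plus half the frequency of AG, which gives a quick consistency check on the two sets of estimates.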
Genotype and association: Statistical analysis can be carried out using PROC GLM of SAS version
9.3 to test the association between the genotypes of the polymorphic SNPs of the genes studied
and the traits of interest. Duncan's Multiple Range Test (DMRT), as modified by Kramer (1957), can
be used for testing the differences among least-squares means.
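/* Template: replace datasetlsmeans, fixed_effects and dependent_variables with the actual names */
proc glm data=datasetlsmeans;
  class fixed_effects;             /* e.g. genotype and other fixed effects */
  model dependent_variables = fixed_effects / solution;
  lsmeans fixed_effects;           /* least-squares means for each level */
  manova h=_all_ / printe printh;  /* multivariate tests across the traits */
run;
quit;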
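To show the kind of test that underlies such an analysis, the sketch below computes a one-way analysis of variance for a single trait with genotype as the only fixed effect. It is a simplified stand-in written in Python, not a reproduction of PROC GLM output: the function name and trait values are hypothetical, and no other fixed effects, multivariate tests or multiple-range comparisons are included.

```python
from statistics import mean


def one_way_anova(groups):
    """groups maps genotype -> list of trait values.

    Returns (group_means, f_statistic, df_between, df_within).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    group_means = {g: mean(values) for g, values in groups.items()}
    # Variation explained by genotype.
    ss_between = sum(
        len(values) * (group_means[g] - grand_mean) ** 2
        for g, values in groups.items()
    )
    # Residual variation within genotype classes.
    ss_within = sum(
        (v - group_means[g]) ** 2 for g, values in groups.items() for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return group_means, f_statistic, df_between, df_within


# Hypothetical trait records grouped by genotype.
trait_by_genotype = {
    "AA": [10.0, 12.0, 11.0],
    "AG": [13.0, 15.0, 14.0],
    "GG": [16.0, 18.0, 17.0],
}
group_means, f_statistic, df_between, df_within = one_way_anova(trait_by_genotype)
print(group_means, f_statistic, df_between, df_within)
```

In a model with one fixed effect the genotype means computed here coincide with the least-squares means; with several fixed effects they generally differ, which is why a full linear-model procedure is used in practice.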
Haplotype construction and association: The linkage disequilibrium analysis can be carried out
using the ALLELE procedure (PROC ALLELE) of SAS software, version 9.3 (2011). The SNPs found to
be in linkage disequilibrium are then used to construct haplotypes using Arlequin version 3.0
software (Excoffier et al., 2005). The association between the haplotypes and the trait of
interest can then be studied using SAS version 9.3. The commands used for the genotype
association study may also be used for the haplotype association study, with the constructed
haplotypes in place of the genotypes.
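For readers unfamiliar with the linkage disequilibrium measures, the following Python sketch computes D, D' and r² for two biallelic SNPs. It assumes that phased haplotype counts are already available, which is a simplification: the software named above estimates haplotype frequencies from unphased genotypes. The function name and the counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r_squared) from counts of the four haplotypes.

    A/a are the alleles of the first SNP and B/b those of the second.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both SNPs must be polymorphic")
    # D: observed AB haplotype frequency minus its expectation under independence.
    d = n_AB / n - p_A * p_B
    # D' scales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max
    # r squared: squared correlation between the alleles at the two SNPs.
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r_squared


# Hypothetical haplotype counts from 50 animals (100 chromosomes).
d, d_prime, r_squared = linkage_disequilibrium(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(f"D={d:.3f}, D'={d_prime:.3f}, r2={r_squared:.3f}")
```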
Whole genome association study: The whole genome association study can be carried out using
the genotype information of the SNPs and the trait of interest using the PLINK software.
193