
Assignment

Artificial intelligence


Names:
Muhammad Waqas khan 15479
Arslan Hamid 15456
Topic: 5 Research Reviews on Artificial Intelligence
Date: August 18, 2014
This assignment reports on ongoing research projects in Artificial
Intelligence (AI).
1. Introduction
The post-genomic era has been characterized by two different scenarios:
on the one hand, the huge amount of available biological data sets all
over the world requires suitable tools and methods both for modeling
biological processes and analyzing biological sequences; on the other,
many new computational models and paradigms inspired and developed
as metaphors of biological systems are ready to be applied in the context
of computer science. Hence, the need to either develop new models or
exploit and analyze the available genomes is considered a priority task
for the bioinformatics research community. There are at least 26 billion
base pairs (bp) representing the various genomes available in the server
of the National Center for Biotechnology Information (NCBI). Besides
the human genome with about 3 billion bp, many other species have
their complete genome available there. Cohen (2004) explained
the need of biologists to utilize and interpret the vast amounts of
data that are constantly being gathered in genomic research. He also
pointed out the basic concepts in molecular cell biology, and outlined
the nature of the existing molecules are expressed. (2) Aligning
DNA sequences in order to check for similarities or
differences. The alignment procedure can be performed locally (DNA
fragment level) or globally (genome level).
Review 1

DNA sequence reconstruction
Sequencing scheme
A sequencing-by-hybridization (SBH) chip consists of a fixed number of
features. Each feature can accommodate one probe. A probe is a string of
symbols from the alphabet Σ = {A, C, G, T, -}, where - denotes the blank
symbol. SBH provides information about the k-mers present in the DNA
string, but does not provide information about the positions of the
k-mers. Moreover, SP is said to be the spectrum of sequence SEQ if SP is
a multi-set of all k-long substrings of SEQ, assuming that the number of
occurrences of each k-mer is also known. For example, for k = 3,
SEQ = ATGCAGGTCC and
SP = {ATG, AGG, CAG, GCA, GGT, GTC, TCC, TGC}.
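The definition of a spectrum can be illustrated with a short Python
sketch. The function name `spectrum` is an assumption made for this
illustration only; it simply collects every k-long substring of a
sequence into a multi-set, here for SEQ = ATGCAGGTCC and k = 3.

```python
from collections import Counter

def spectrum(seq: str, k: int) -> Counter:
    """Return the multi-set of all k-long substrings (k-mers) of seq."""
    # A sequence of length n has n - k + 1 k-mers, counted with multiplicity.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

sp = spectrum("ATGCAGGTCC", 3)
print(sorted(sp.elements()))
# -> ['AGG', 'ATG', 'CAG', 'GCA', 'GGT', 'GTC', 'TCC', 'TGC']
```

A `Counter` is used rather than a set so that repeated k-mers keep their
number of occurrences, as the definition requires.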
A sequencing algorithm is an algorithm that, given a multi-set of
k-mers SP = {SP_1, ..., SP_{n-k+1}}, decides whether the spectrum defines
a unique DNA sequence SEQ and, if so, reconstructs the sequence SEQ from
its spectrum.
2.2. Traditional solutions for the SBH problem
The fundamental computational problem in SBH is the reconstruction of a
sequence from its spectrum, the list of all k-mers that are included in the
sequence along with their multiplicities. The traditional solutions for the
SBH problem, which are briefly discussed in this paper, are the
Hamiltonian path approach, which is closely related to the Traveling
Salesman Problem (TSP), the Eulerian path problem (EPP), and the
positional SBH (PSBH).
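
As an illustration of the Eulerian path formulation, the following is a
minimal Python sketch, not the method of any of the cited papers. It
assumes an error-free spectrum given as a list of k-mers; the name
`reconstruct` is hypothetical. Each k-mer becomes an edge from its
(k-1)-long prefix to its (k-1)-long suffix, and an Eulerian path found
with Hierholzer's algorithm spells out one sequence with that spectrum.
The sketch does not check whether the reconstruction is unique.

```python
from collections import Counter, defaultdict

def reconstruct(kmers):
    """Return one sequence whose k-mer multi-set equals kmers, or None."""
    if not kmers:
        return None
    graph = defaultdict(list)   # (k-1)-mer prefix -> list of (k-1)-mer suffixes
    balance = Counter()         # out-degree minus in-degree for each node
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    starts = [v for v, b in balance.items() if b == 1]
    # An Eulerian path needs at most one start node and no larger imbalance.
    if len(starts) > 1 or any(abs(b) > 1 for b in balance.values()):
        return None
    start = starts[0] if starts else kmers[0][:-1]
    # Hierholzer's algorithm: walk unused edges, backtrack when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(kmers) + 1:
        return None             # some k-mers were unreachable from the start
    return path[0] + "".join(node[-1] for node in path[1:])

print(reconstruct(["ATG", "AGG", "CAG", "GCA", "GGT", "GTC", "TCC", "TGC"]))
# -> ATGCAGGTCC
```

The Eulerian formulation is attractive because an Eulerian path can be
found in time linear in the number of k-mers, whereas the Hamiltonian
path problem is NP-complete.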

2. DNA sequencing with artificial intelligence
The field of molecular biology has been described as tailor-made for AI
and machine learning approaches (Shavlik, Hunter, & Searls, 1995). This
is because AI approaches perform well in domains where there is an
immense amount of data but little theory. Since the introduction of AI
to this field, numerous algorithms have been designed and applied to
study different data sets. Most of these studies compare a new method
with the traditional ones and affirm the effectiveness and efficiency of
the new method on particular data sets. Sequencing of DNA is among the
most important tasks in molecular biology. DNA chips are considered to
be a more rapid alternative to the more common gel-based methods. The
set of all possible probes eight nucleotides in length (octamers)
gives 4^8 = 65,536 unique probes, spaced on a 1.6 cm² array (Fodor,
Read, Pirrung, Stryer, Lu and Solas, 1991). For example, consider the
DNA target sequence ATTGATTCG, with length N = 9, and a DNA chip
with all possible probes of length n =


A DNA chip with probe length n will have 4^n positions in the grid on
the DNA chip. Thus, for a probe length of 4 there exist 4^4 = 256 grid
positions, each associated with a unique probe sequence. All possible
4-nucleotide probes would exist in the set:
{AAAA, AAAT, ..., TTTT}
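
The grid size can be checked with a small Python sketch. The function
names `all_probes` and `hybridizing_probes` are assumptions made for this
illustration. The sketch is idealized: it treats a probe as "lighting up"
when it occurs as a substring of the target ATTGATTCG, ignoring strand
complementarity and hybridization errors, and it uses the probe length of
4 from the 256-position example.

```python
from itertools import product

def all_probes(n, alphabet="ACGT"):
    """Enumerate every probe of length n over the nucleotide alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=n)]

def hybridizing_probes(target, n):
    """Return the probes of length n that occur in target (idealized)."""
    present = {target[i:i + n] for i in range(len(target) - n + 1)}
    return [probe for probe in all_probes(n) if probe in present]

probes = all_probes(4)
print(len(probes))                         # -> 256
print(hybridizing_probes("ATTGATTCG", 4))
# -> ['ATTC', 'ATTG', 'GATT', 'TGAT', 'TTCG', 'TTGA']
```

Only 6 of the 256 grid positions respond for this target, which is the
spectrum that a sequencing algorithm then has to reassemble.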

Applications of artificial intelligence in sequencing by hybridization
Multi-agent systems, in which several agents coordinate their knowledge
and activities, offer a natural way to view and characterize intelligent
systems. Intelligence and interaction are deeply and inevitably coupled,
and multi-agent systems reflect this insight. In this type of
environment, an application is usually distributed across multiple
agents capable of intelligent coordination. Recently, a distributed
multi-castes ant system, called DIMANTS, was designed by Bertelle, Dutot,
Guinand, and Olivier (2002) for the SBH problem. It is a heuristic
approach based on social insect organizations. The proposed model
consists of reactive agents that try to solve the problem by moving over
the vertices and edges of a new graph model called the SBH-graph. The
associated computational problem consists of rebuilding the original
sequence from the SBH-graph. The agents in DIMANTS interact through
chemical messages, which are pieces of paths deposited on the vertices.

3. Intelligent systems in bioinformatics
In the post-genome era, research in bioinformatics has been
overwhelmed by experimental data (Tan & Gilbert, 2003). The
complexity of biological data ranges from simple strings (nucleotide
and amino acid sequences) to complex graphs (biochemical networks), and
from 1D (sequence data) to 3D (protein and RNA structures).
Considering the amount and complexity of the data, it is becoming
impossible for an expert to compute and compare the entries within the
current databases. Thus, AI and machine learning techniques have been
used to analyze biological data sets in order to discover and mine the
patterns and similarities existing in various databases. Tan and Gilbert
(2003) performed an empirical comparison of rule-based learning systems
(decision trees, one rule, decision rules), statistical learning systems
(naive Bayes, instance-based learning, SVM and artificial neural
networks) and ensemble methods (stacking, bagging and boosting) on
available data sets for E. coli, Yeast, Promoters, and HIV, comparing
these supervised machine learning techniques in classifying biological
data. They confirmed that no single method performed consistently well
over all the data sets. Their work also showed that combined methods
perform better than the individual ones. Kasturi and Acharya (2004)
proposed an unsupervised machine-learning algorithm that identifies
clusters of genes using combined data (promoter sequences of genes/DNA
binding motifs, gene ontologies, and location data). The outcome of
their experiments showed that the combined learning approach identified
correlated genes effectively.
