
EFFICIENT GENOMIC SEQUENCING THROUGH HARDWARE ACCELERATION

ABSTRACT: Sequence alignment is an important component of bioinformatics, and a variety of protein and DNA sequence aligners are used for this purpose. Accelerators are becoming increasingly commonplace in delivering high performance computing. In this paper we review the literature on hardware-accelerated platforms such as GPUs, FPGAs and big data frameworks that are used for efficient, scalable and faster DNA and protein sequencing.

Introduction

DNA sequencing is the process of determining the accurate order of nucleotides along chromosomes and genomes. DNA sequencing is essential for a deep understanding of human genetics [2]. Next Generation Sequencing (NGS), also known as high-throughput sequencing, has changed the study of genomics: NGS allows DNA to be sequenced more quickly and inexpensively than the earlier Sanger method, and this rapid reduction in the cost of DNA sequencing is making it accessible to researchers at a level that was never possible before. An NGS sequencer can produce millions of small DNA fragments, called short reads, in parallel, and NGS platforms are able to generate DNA sequencing data ranging up to hundreds of gigabytes. Due to this large volume of data, computation would take a long time, so there is an urgent need to develop scalable, high performance computational solutions to address these challenges. To this end, the authors of the reviewed papers use CPUs, GPUs, FPGAs, big data techniques, and cluster-based and distributed computing.

CLASSIFICATION OF LITERATURE

In this section we classify the literature based on hardware platform. We analyze and compare some of the approaches used and proposed by the authors of the research papers to accelerate DNA and protein sequencing.

Using GPUs

A Graphical Processing Unit (GPU) is a highly parallel processor originally designed for graphics workloads. GPUs are used to tackle computational challenges and are commonly employed in many bioinformatics tools to accelerate computationally intensive algorithms and improve their performance.
In paper [21] the author presents a GPU-accelerated Smith-Waterman implementation for protein sequence alignment. The Smith-Waterman (S-W) algorithm is a highly accurate sequence alignment method for biological databases, but its computational complexity makes it too slow for practical use. Heuristic-based approximate techniques such as FASTA and BLAST offer quicker solutions, but at the cost of decreased accuracy. In addition, the increasing size and varying lengths of sequences call for an execution-efficient reorganization of these databases. To obtain alignments that are both precise and fast, it is therefore highly desirable to speed up the S-W algorithm on GPUs. The new implementation improves performance by reorganizing the database and by reducing the number of memory accesses to remove bandwidth bottlenecks. The implementation is referred to as Database Optimized Protein Alignment (DOPA) and achieves a performance of 21.4 giga cell updates per second (GCUPS), which is 1.13 times higher than the fastest GPU implementation to date.
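
To make the notion of cell updates concrete, the following minimal Python sketch implements the basic Smith-Waterman recurrence with a linear gap penalty; GCUPS simply counts how many billions of such H[i][j] updates an implementation performs per second. The scoring parameters are illustrative and not those used in [21].

```python
# Minimal Smith-Waterman local alignment (linear gap penalty).
# Each H[i][j] update is one "cell update"; GCUPS = billions of such
# updates per second. Scoring values here are illustrative only.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            H[i][j] = max(0, diag, up, left)   # local alignment: never below 0
            best = max(best, H[i][j])
    return best                                # optimal local alignment score

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
```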
resources efficiently when the size of the datasets
In paper [22] the authors evaluate two different implementation methods to accelerate the pair-HMM forward algorithm (PFA) on GPUs, using different datasets to compare their performance. The two approaches are inter-task parallelization and intra-task parallelization. In inter-task parallelization, the whole computation for a read/haplotype pair is mapped to a single thread, so several copies of the algorithm run in parallel and each thread executes the algorithm independently. In intra-task parallelization, the algorithm is mapped to a single thread block, so a whole block of threads cooperates to compute a single copy of the algorithm; this reduces the number of copies that can execute in parallel on the GPU compared to inter-task parallelization. The operations of the PFA are in the floating-point domain. The GPU implementations achieve higher throughput with a reorganized dataset than with the original dataset: the inter-task implementation with the reorganized dataset reaches 12.79 GCUPS and is 18.19x faster than with the original dataset, and reorganizing the dataset makes the intra-task implementation 2.19x faster. With the original dataset it is better to use the intra-task implementation, which is 7.5x faster than the inter-task implementation on that dataset; the intra-task implementation has the highest throughput overall, as high as 23.56 GCUPS.
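
The pair-HMM forward algorithm accelerated in [22] and [23] is itself a dynamic-programming recurrence over match, insertion and deletion states. The Python sketch below is a minimal single-threaded version meant only to show the floating-point cell updates that the GPU mappings distribute: inter-task parallelization runs one such computation per thread for each read/haplotype pair, while intra-task parallelization lets a whole thread block cooperate on the cells of a single matrix. The transition and emission probabilities are placeholder assumptions, not the parameters used in the papers.

```python
# Simplified pair-HMM forward algorithm (PFA) over match (M), insertion (X)
# and deletion (Y) states. Transition/emission values below are placeholder
# assumptions for illustration, not the parameters used in [22]/[23].

def pair_hmm_forward(read, hap, p_match=0.9, p_gap=0.05, eps=0.1):
    n, m = len(read), len(hap)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]   # insertions in the read
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]   # deletions in the read
    for j in range(m + 1):
        Y[0][j] = 1.0 / m                          # free start anywhere on the haplotype
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            emit = p_match if read[i - 1] == hap[j - 1] else (1 - p_match) / 3
            M[i][j] = emit * ((1 - 2 * p_gap) * M[i - 1][j - 1]
                              + (1 - eps) * (X[i - 1][j - 1] + Y[i - 1][j - 1]))
            X[i][j] = p_gap * M[i - 1][j] + eps * X[i - 1][j]
            Y[i][j] = p_gap * M[i][j - 1] + eps * Y[i][j - 1]
    # total likelihood of the read given the haplotype
    return sum(M[n][j] + X[n][j] for j in range(1, m + 1))

print(pair_hmm_forward("ACGT", "ACGTT"))
```
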
In paper [23] the author presents an acceleration of the PFA on GPUs to improve the performance of the GATK HaplotypeCaller (HC); this paper is an extended version of the work published in [22]. After executing all the implementations and comparing their performance on different real datasets, the pair-HMM forward algorithm achieves a speedup of up to 5.47x over existing GPU-based implementations. The naive intra-task implementation is the fastest of all the GPU-based implementations when the number of read-haplotype pairs in each chunk is small, while the inter-task implementation cannot use the GPU resources efficiently when the size of the dataset is reduced to 200 pairs.

In paper [24] the author presents a GPU acceleration of the GATK HaplotypeCaller together with a load-balanced multi-process optimization that divides the genome into regions of different sizes to ensure a more equal distribution of the computational load between processes (see the sketch below) and to address an implementation limitation that forces sequential execution of the program and prevents effective utilization of hardware acceleration. In single-threaded mode, the GPU-based GATK HC is 1.71x faster than the baseline implementation and 1.21x faster than the vectorized GATK HC implementation. In multi-threaded mode, the GATK HC workflow limits the performance improvement achievable by accelerating the pair-HMM kernel. In multi-process mode, the GPU-based GATK HC implementation is the fastest; in load-balanced multi-process mode it achieves up to 2.04x and 1.40x speedup over the baseline implementation and the vectorized GATK HC implementation, respectively, running in non-load-balanced multi-process mode.
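
The load-balancing idea can be illustrated with a simple greedy partitioner that assigns genome regions to processes so that the estimated work per process is roughly equal. The cost estimates and the greedy strategy below are assumptions for illustration, not the actual scheme of [24].

```python
# Illustrative greedy load balancer: assign genome regions (with estimated
# processing costs) to processes so that total cost per process is roughly
# equal. The costs and strategy are assumptions for illustration only.
import heapq

def balance_regions(region_costs, num_processes):
    # min-heap of (assigned_cost, process_id)
    heap = [(0.0, p) for p in range(num_processes)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processes)}
    # place the most expensive regions first onto the least-loaded process
    for region, cost in sorted(region_costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        assignment[p].append(region)
        heapq.heappush(heap, (load + cost, p))
    return assignment

regions = {"chr1": 9.5, "chr2": 8.7, "chr3": 7.1, "chr4": 6.8, "chrX": 5.6, "chrY": 0.9}
print(balance_regions(regions, num_processes=3))
```
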
In paper [25] the author presents GASAL, a high performance GPU-accelerated set of APIs for pairwise sequence alignment of DNA and RNA sequences. The GASAL APIs provide accelerated kernels for local, global and semi-global alignment, allowing the computation of the alignment score and, optionally, the start and end positions of the alignment. The library contains functions that enable fast alignment of sequences and can easily be integrated into computer programs developed for NGS data analysis; one-to-one as well as all-to-all and one-to-many pairwise alignments can be performed. The sequences are first packed into unsigned 32-bit integers, after which the alignment is performed, so the total execution time is the sum of the data packing, data copying and alignment kernel times. Without computing the start position, GASAL performs the alignment in much less time than SSW and NVBIO; with start-position computation, the speeds of GASAL and NVBIO are nearly the same. Overall GASAL is 2-4x faster than state-of-the-art libraries, making it a good choice for sequence alignment.
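
To illustrate the packing step described above, the sketch below packs a DNA sequence into unsigned 32-bit words using 4 bits per base (eight bases per word). The 4-bit layout is an assumption for illustration; the paper only states that sequences are packed into unsigned 32-bit integers, and GASAL's actual encoding may differ.

```python
# Pack a DNA sequence into unsigned 32-bit words, 4 bits per base
# (8 bases per word). The 4-bit-per-base layout is an illustrative
# assumption, not necessarily the encoding used by GASAL.

CODE = {"A": 1, "C": 2, "G": 3, "T": 4, "N": 0}

def pack_sequence(seq):
    words = []
    for start in range(0, len(seq), 8):
        chunk = seq[start:start + 8]
        word = 0
        for base in chunk:
            word = (word << 4) | CODE[base]        # append 4-bit code
        word <<= 4 * (8 - len(chunk))              # left-align a short tail
        words.append(word & 0xFFFFFFFF)
    return words

print([hex(w) for w in pack_sequence("ACGTACGTAC")])
```
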
USING FPGAs

Field Programmable Gate Arrays (FPGAs), with their flexible and reprogrammable substrate, are a natural fit for computationally intensive algorithms.

In paper [26] the authors propose a novel systolic array design to accelerate the pair-HMM forward algorithm on FPGAs, analyze a number of optimization techniques to improve performance, and present an implementation of the design on the Convey supercomputing platform. A number of architectural features have been implemented to improve the performance of the design, such as early exit points to increase the utilization of the array for small sequence sizes, as well as on-chip buffering to enable long sequences to be processed effectively. The FPGA implementation of the pair-HMM forward algorithm is up to 67x faster.

In paper [27] the authors propose the first accelerated implementation of BWA-MEM. BWA-MEM is a popular genome sequence alignment algorithm widely used in NGS genomics pipelines; it is the latest version and is generally recommended for high quality queries as it is faster and more accurate [28]. A characteristic workload of this algorithm is to align millions of DNA reads against a reference genome, and BWA-MEM processes the reads in batches. The BWA-MEM alignment procedure generally consists of three main kernels: (1) SMEM generation, (2) seed extension and (3) output generation [29], and BWA-MEM implements multi-threaded execution of all three kernels. The authors propose and evaluate a number of FPGA-based systolic array architectures, presenting optimizations generally applicable to variable-length Smith-Waterman execution. By optimizing one of the three main kernels of BWA-MEM, a 45% increase in application performance was observed, and by implementing the seed extension kernel as a systolic array, a 3x speedup over software-only execution was achieved. The dynamic programming type of algorithm used in the seed extension kernel is a much better fit for execution on an FPGA, as the sketch below illustrates.
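
The reason a dynamic-programming kernel maps well onto a systolic array is visible in its data dependencies: each cell of the Smith-Waterman matrix depends only on its left, upper and upper-left neighbours, so every cell on an anti-diagonal can be updated simultaneously by a row of processing elements. The Python sketch below expresses this wavefront order; it is a conceptual illustration with illustrative scoring values, not the architecture of [27].

```python
# Wavefront (anti-diagonal) formulation of Smith-Waterman scoring: all cells
# on one anti-diagonal are independent and could be computed in parallel,
# one per systolic processing element. Scoring values are illustrative.

def wavefront_sw(seed, ref, match=2, mismatch=-1, gap=-2):
    n, m = len(seed), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):                 # anti-diagonal index i + j = d
        cells = [(i, d - i) for i in range(1, n + 1) if 1 <= d - i <= m]
        for i, j in cells:                        # independent: parallel on hardware
            diag = H[i - 1][j - 1] + (match if seed[i - 1] == ref[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(wavefront_sw("ACGTG", "ACGTTG"))
```
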
In paper [30] the authors propose a highly optimized Smith-Waterman implementation on Intel FPGAs using OpenCL, which is both faster and more efficient than other current Smith-Waterman implementations and obtains a theoretical performance of 214 GCUPS. Compared to conventional hardware description languages, using a high-level language such as OpenCL has two main benefits. First, OpenCL is more expressive: the authors' Processing Element kernel code requires about 90 lines of OpenCL compared to about 450 lines of VHDL. Second, OpenCL development offers more convenient testing and debugging capabilities.
In paper [31] the authors

Using Big Data

Big data is a term used to refer to data sets that are too large or complex for traditional data processing application software to deal with.

In paper [] the authors use the Apache Spark big data framework. Simultaneous multithreading improves the performance of BWA on all systems, increasing performance by up to 87% for Spark, and Spark has up to 27% better performance at high system utilization. Spark is also able to sustain high performance when the system is over-utilized. The Spark versions divide the input dataset of short reads into a number of smaller files referred to as chunks, and the Spark system is more capable of handling a higher number of threads, as the sketch below illustrates.
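
The chunking approach described above can be sketched with PySpark: the read set is split into many small chunks, Spark schedules the chunks across executors, and each chunk is processed independently. The snippet assumes a PySpark installation and uses a hypothetical align_chunk helper as a stand-in for invoking an aligner such as BWA; it is not the code of the reviewed frameworks.

```python
# Minimal PySpark sketch: divide short reads into chunks and process each
# chunk in parallel. align_chunk is a hypothetical stand-in for calling an
# aligner such as BWA on one chunk of reads.
from pyspark import SparkContext

def align_chunk(reads):
    # placeholder "alignment": report how many reads the chunk contained
    reads = list(reads)
    return [f"aligned {len(reads)} reads"]

if __name__ == "__main__":
    sc = SparkContext("local[*]", "chunked-alignment-sketch")
    reads = ["ACGT" * 25 for _ in range(100_000)]      # toy short reads
    chunks = sc.parallelize(reads, numSlices=64)       # 64 chunks across executors
    results = chunks.mapPartitions(align_chunk).collect()
    print(len(results), "chunk results")
    sc.stop()
```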

In paper [] the authors present SparkGA, a new Spark-based framework that allows a DNA analysis pipeline to run efficiently and cost-effectively on a scalable computational cluster. In recent years a number of big data frameworks have emerged to manage and process large datasets efficiently and in an easy way; SparkGA uses Spark's in-memory computation capabilities to improve performance and addresses the load-balancing problem by implementing a memory-efficient load balancing step. SparkGA runs the pipeline in three different steps: DNA mapping and static load balancing; dynamic load balancing, SAM-to-BAM conversion and marking of duplicates; and variant discovery. To ensure scalability, the framework can run on low-cost nodes with up to 16 GB of memory, and it achieves an accuracy of 99.9981%. When deployed on a 20-node IBM POWER cluster, SparkGA can complete the GATK best-practices pipeline in 90 minutes, which is about 71% faster than other state-of-the-art solutions.

In paper [] the authors propose StreamBWA, a new framework that allows the BWA-MEM program to run on a cluster in a distributed fashion while the input data is being streamed into the cluster. This streaming distributed approach is approximately 2x faster than the non-streaming approach, and compared to SparkBWA, StreamBWA is almost 5x faster. The framework consists of two utilities: the StreamBWA program and the chunker program.

Conclusion

In this paper, we studied various papers related to sequence alignment algorithms. The S-W algorithm proved to be the most accurate one for carrying out sequence alignment; however, it needs an exceptionally long time to complete, which makes it the most suitable candidate for hardware acceleration.

REFERENCES

[21] Laiq Hasan, Marijn Kentie, Zaid Al-Ars, 33rd Annual International Conference of the IEEE EMBS, Boston, Massachusetts, USA, August 30 - September 3, 2011.
[22] Shanshan Ren, Koen Bertels, Zaid Al-Ars, 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[23] Shanshan Ren, Koen Bertels and Zaid Al-Ars, Evolutionary Bioinformatics, Volume 14: 1–12, 2018.

[25] Nauman Ahmed, Hamid Mushtaq, Koen Bertels and Zaid Al-Ars, 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[26] Shanshan Ren, Vlad-Mihai Sima, Zaid Al-Ars, 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[27]

[28] H. Li. Burrows-Wheeler Aligner. http://bio-bwa.sourceforge.net/. Accessed: 2014-11-04.

[29] Ernst Joachim Houtgast, Vlad-Mihai Sima, Koen Bertels and Zaid Al-Ars, An FPGA-Based Systolic Array to Accelerate the BWA-MEM Genomic Mapping Algorithm.

[30] Ernst Joachim Houtgast, Vlad-Mihai Sima, Zaid Al-Ars, 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering.
