
RNAseq

Jonathan Pevsner

Outline
RNAseq: design principles

How RNAseq works


Data analysis

RNAseq: experimental principles


Experimental design principles: randomization, replication

RNAseq specific effects


Sequencing depth

Paired-end sequencing
Biases of NGS

Sample size calculation


Validation
Fang Z, Cui X. Brief. Bioinf. (2011) 12:280
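As a rough illustration of randomization with replication, the sketch below shuffles samples within each genotype and deals them round-robin across sequencing lanes, so that genotype is not confounded with lane. The sample names, the number of lanes, and the function `assign_lanes` are hypothetical and are not taken from the slides.

```python
import random

def assign_lanes(samples_by_group, n_lanes, seed=0):
    """Shuffle within each group, then deal samples round-robin to lanes."""
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    lanes = [[] for _ in range(n_lanes)]
    slot = 0
    for group, samples in samples_by_group.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        for sample in shuffled:
            lanes[slot % n_lanes].append((group, sample))
            slot += 1
    return lanes

# Hypothetical design: four replicates per genotype, two lanes.
design = {
    "ko": ["ko_1", "ko_2", "ko_3", "ko_4"],
    "wt": ["wt_1", "wt_2", "wt_3", "wt_4"],
}
lanes = assign_lanes(design, n_lanes=2)
for i, lane in enumerate(lanes, start=1):
    print(f"lane {i}:", [sample for _, sample in lane])
```

With equal group sizes that are a multiple of the lane count, each lane receives the same number of samples from every group.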

Experimental design for RNAseq


[Figure: Disc1 k/o and wt samples, labeled with barcodes 1 to 9, are pooled, sequenced, and then demultiplexed.]
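The demultiplexing step can be sketched as follows. This is a minimal illustration, assuming each read begins with its sample barcode and that barcodes must match exactly. The barcode sequences and sample names are hypothetical, and real demultiplexing tools often read the index separately and tolerate mismatches.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table (not from the slides).
BARCODES = {
    "ACGT": "ko_1",
    "TGCA": "wt_1",
}

def demultiplex(reads, barcodes):
    """Group reads by an exact-match barcode prefix and trim the barcode."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: keep the read, but set it aside.
            bins["undetermined"].append(read)
    return dict(bins)

pooled = ["ACGTGGGTTT", "TGCAAACCCG", "ACGTCCCAAA", "NNNNGGGGGG"]
by_sample = demultiplex(pooled, BARCODES)
for sample, reads in by_sample.items():
    print(sample, reads)
```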

Experimental design for RNAseq


[Figure: three sample pairs: Disc1 k/o and wt; plus virus and wt; Pcm1 k/o and wt.]

Date of RNA isolation: best to coordinate
Sample size: can vary, but try for n>3
Barcoding strategy
Depth of coverage
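Barcoding strategy and depth of coverage trade off against each other: the more samples share a lane, the fewer reads each one receives. The arithmetic can be sketched as below, assuming reads are split evenly among pooled samples. The lane yield and target depth are hypothetical numbers chosen only for illustration.

```python
def reads_per_sample(lane_yield, n_samples):
    """Expected reads per sample if barcoded samples are pooled evenly."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    return lane_yield // n_samples

def max_samples_per_lane(lane_yield, target_reads):
    """Largest number of samples that still reaches the target depth."""
    if target_reads < 1:
        raise ValueError("target depth must be positive")
    return lane_yield // target_reads

# Hypothetical values, not recommendations.
lane_yield = 200_000_000
print(reads_per_sample(lane_yield, 8))               # 25000000
print(max_samples_per_lane(lane_yield, 30_000_000))  # 6
```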

Data generation
mRNA or RNA
Remove contaminant DNA
Fragment RNA
Reverse transcribe to cDNA
Ligate sequence adaptors
Select size range
Sequence cDNA ends

Data analysis
Raw reads
Remove artifacts
Correct errors
Assemble transcripts
Post-process transcripts
Align reads to transcripts to quantify expression

Martin JA, Wang Z. NRG (2011) 12:671
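The final analysis step, quantifying expression from aligned reads, needs some normalization for transcript length and sequencing depth. The slide does not name a measure. The sketch below uses RPKM (reads per kilobase of transcript per million mapped reads) as one common choice, with hypothetical counts and lengths.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

# Hypothetical counts for illustration only.
counts = {"tx_a": 500, "tx_b": 500}
lengths = {"tx_a": 1000, "tx_b": 2000}
total = 10_000_000

expression = {tx: rpkm(counts[tx], lengths[tx], total) for tx in counts}
for tx, value in expression.items():
    print(tx, value)
```

Both transcripts have the same read count here, but the longer one gets half the RPKM, which is the point of the length correction.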

RNAseq single reads: map to genomic DNA, detect alternative splicing events

Ozsolak F, Milos PM NRG (2011) 12:87

RNAseq paired-end reads: map to genomic DNA, get better map of transcript structure and of chimeric sequences

Ozsolak F, Milos PM NRG (2011) 12:87

Reference-based transcriptome assembly

(a) Splice-align reads to the genome
(b) Build graph of alternative splicing events
(c) Traverse graph to assemble variants
(d) Assemble isoforms

Martin JA, Wang Z. NRG (2011) 12:671
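Steps (b) and (c) can be illustrated with a toy splice graph: exons are nodes, observed splice junctions are directed edges, and each path from the first to the last exon is a candidate isoform. This is only a sketch with a hypothetical three-exon gene. It enumerates every path, whereas real assemblers typically keep only the paths supported by the read data.

```python
from collections import defaultdict

def build_splice_graph(junctions):
    """Map each exon to the exons it is spliced to (directed edges)."""
    graph = defaultdict(list)
    for upstream, downstream in junctions:
        graph[upstream].append(downstream)
    return graph

def enumerate_isoforms(graph, start, end):
    """Depth-first traversal listing every exon path from start to end."""
    paths = []
    stack = [(start, [start])]
    while stack:
        exon, path = stack.pop()
        if exon == end:
            paths.append(path)
            continue
        for nxt in graph.get(exon, []):
            stack.append((nxt, path + [nxt]))
    return sorted(paths)

# Hypothetical gene with a skipped exon: E1-E2-E3 and E1-E3.
junctions = [("E1", "E2"), ("E2", "E3"), ("E1", "E3")]
isoforms = enumerate_isoforms(build_splice_graph(junctions), "E1", "E3")
for isoform in isoforms:
    print("-".join(isoform))
```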

Quality metrics for assessing transcriptome assemblies


Accuracy: % of correctly assembled bases using references
Completeness: % of expressed reference transcripts covered by all the assembled transcripts
Contiguity: % of expressed reference transcripts covered by a single, longest-assembled transcript
Chimerism: % of chimeras due to misassemblies (spanning two or more different reference genes)
Variant resolution: % of transcript variants resolved

Martin JA, Wang Z. NRG (2011) 12:671
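The difference between completeness and contiguity can be made concrete with a small sketch. It assumes a reference transcript counts as "covered" when at least a chosen fraction of its bases is covered. The 0.8 threshold, the data layout, and the function name are assumptions for illustration, not definitions from the cited paper.

```python
def completeness_and_contiguity(references, min_fraction=0.8):
    """Return (completeness %, contiguity %) over reference transcripts.

    references maps a reference name to (length, hits), where hits maps an
    assembled transcript name to the set of reference positions it covers.
    """
    complete = contiguous = 0
    for length, hits in references.values():
        # Completeness pools coverage from all assembled transcripts.
        union = set().union(*hits.values())
        # Contiguity only credits the single best assembled transcript.
        best = max((len(pos) for pos in hits.values()), default=0)
        if len(union) / length >= min_fraction:
            complete += 1
        if best / length >= min_fraction:
            contiguous += 1
    n = len(references)
    return 100 * complete / n, 100 * contiguous / n

# Hypothetical references, each 10 bases long.
references = {
    "ref1": (10, {"asm1": set(range(0, 5)), "asm2": set(range(5, 10))}),
    "ref2": (10, {"asm3": set(range(0, 9))}),
}
metrics = completeness_and_contiguity(references)
print(metrics)
```

Here ref1 is fully covered, but only by two fragments, so it counts toward completeness and not toward contiguity.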

Emerging technologies for single-cell gene expression profiling

Ozsolak F, Milos PM NRG (2011) 12:87

Bowtie: short read aligner

TopHat: aligns RNAseq reads to the genome, finds splice sites

Cufflinks: assembles transcripts, finds differentially expressed transcripts

CummeRbund: explore data


Trapnell C, Nature Prot. (2012) 7:563
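How the tools above chain together can be sketched by building their command lines. The helper functions and all file, index, and sample names below are hypothetical. The sketch only constructs the commands and does not run them. It also assumes a merged annotation file (`merged.gtf`) has already been prepared.

```python
def tuxedo_commands(sample, index, reads_1, reads_2, threads=4):
    """Build (not run) TopHat and Cufflinks command lines for one sample."""
    tophat_out = f"{sample}_thout"
    # TopHat aligns paired-end reads against a Bowtie index.
    tophat = ["tophat", "-p", str(threads), "-o", tophat_out,
              index, reads_1, reads_2]
    # Cufflinks assembles transcripts from TopHat's alignments.
    cufflinks = ["cufflinks", "-p", str(threads), "-o", f"{sample}_clout",
                 f"{tophat_out}/accepted_hits.bam"]
    return [tophat, cufflinks]

def cuffdiff_command(merged_gtf, bams_by_condition, out_dir="diff_out"):
    """Build a cuffdiff command comparing conditions (replicates comma-joined)."""
    labels = ",".join(bams_by_condition)
    groups = [",".join(bams) for bams in bams_by_condition.values()]
    return ["cuffdiff", "-o", out_dir, "-L", labels, merged_gtf, *groups]

commands = tuxedo_commands("ko_1", "genome", "ko_1_1.fq", "ko_1_2.fq")
diff = cuffdiff_command("merged.gtf", {"ko": ["ko_1.bam"], "wt": ["wt_1.bam"]})
for cmd in commands + [diff]:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed.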

Galaxy for RNAseq analysis: web-based collection of tools for bioinformatics analysis

Galaxy for RNAseq analysis

Tools panel includes TopHat and Cufflinks for RNAseq analysis

Galaxy for RNAseq analysis

History panel shows data files for analysis (can be included in transparent, reproducible workflows)

Galaxy for RNAseq analysis

Display panel shows data files; here FASTQ files from RNAseq with raw sequence data and base quality scores
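A FASTQ record holds four lines: a header starting with `@`, the sequence, a `+` separator, and one quality character per base. The sketch below parses such records and decodes the quality string, assuming the Phred+33 encoding (older files may use a different offset). The example record is made up.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, qualities) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        # Assumes Phred+33: quality score = ASCII code minus 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores

# Made-up record for illustration.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records)
```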
