
talks

Somatic variant discovery

Overview of the complete pipeline


Ultimate goal is to identify somatic variation in sequencing data

RAW READS
SEQUENCE DATA

processing
+ analysis

GENOMIC
VARIATION
Bedrock of somatic variant discovery: T/N pair comparisons

TUMOR
NORMAL
Table of Contents

1. Types of somatic variants
2. Cancer-specific challenges
3. Pipeline for somatic SNV discovery

PART 1:
TYPES OF SOMATIC VARIANTS
Types of variants

Meyerson, Gabriel and Getz. Nat. Rev. Genet. (2010)


Typical number of somatic mutations
Spectral karyotypes (SKY)
Normal Cell Cancer Cell

In a genome of 3 x 10^9 bases:

1000s to 10,000s somatic single nucleotide variations (sSNVs)
100s to 1,000s somatic small insertions and deletions (sINDELs)
100s to 1,000s somatic structural variations (sSVs)
100s to 1,000s somatic copy number alterations (sCNAs)

http://www.path.cam.ac.uk/~pawefish/BreastCellLineDescriptions/HCC1954.html
Various patterns of rearrangements

CRC-0002 CRC-0003 CRC-0004

CRC-0005 CRC-0006 CRC-0007

CRC-0008 CRC-0009 CRC-0010


Some cancer types are associated with certain patterns

[Figure: # of rearrangements per cancer type: CLL, Colon, GBM, Melanoma, Multiple Myeloma, Ovarian, Prostate]


Data signature of rearrangements

How reads map according to expected chromosomal layout:

chr1q21 chr1q22 chr2q11 chr2q12

"weird" pairs

Probable real physical layout:

chr1q21 chr2q12 chr2q11 chr1q22
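The "weird" pairs above can be flagged with a simple rule of thumb. The following is a minimal sketch, not the method used by any particular rearrangement caller: it assumes standard paired-end reads where a normal pair maps to the same chromosome, in forward/reverse orientation, within an expected insert size. The `ReadPair` type, the `classify_pair` function, and the `max_insert` threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapped positions of the two mates of one read pair (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair, max_insert=1000):
    """Label a pair by how it maps relative to the expected reference layout.

    max_insert is an assumed cutoff; in practice it would be derived from
    the library's insert size distribution.
    """
    # Mates on different chromosomes suggest a translocation-like event.
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the mates by position so orientation can be checked left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    # Expected layout: left mate on "+", right mate on "-".
    if (left_strand, right_strand) != ("+", "-"):
        return "unexpected_orientation"
    # Mates too far apart suggest a deletion or other large rearrangement.
    if right_pos - left_pos > max_insert:
        return "insert_too_large"
    return "concordant"


# A pair whose mates land on chr1 and chr2, as in the layout above.
print(classify_pair(ReadPair("chr1", 100, "+", "chr2", 400, "-")))
```

A single discordant pair proves little: as the following slides show, mapping anomalies and homology produce them too, so real callers look for clusters of consistent pairs in the tumor that are absent from the normal.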


What the data looks like around a rearrangement event

Chr1 Chr6
TUMOR
NORMAL
Not to be confused with mere germline deletions

NORMAL

10.7 Kb
TUMOR
na12878 (trio)
Beware artifacts from mapping anomalies

Chr18 Chr6
TUMOR
NORMAL
Beware also artifacts from homology

16.80 Mb 25 Kb 16.81 Mb

to chr1 : 20,840,192

to chr1 : 144,004,088
TUMOR

pairmates are
from multiple
distant locations
NORMAL

pairmates are
oriented randomly
Types of variants

Meyerson, Gabriel and Getz. Nat. Rev. Genet. (2010)


PART 2:
CANCER-SPECIFIC CHALLENGES
Remember T/N pair comparisons

TUMOR
NORMAL
Two main types of false positives

TUMOR
NORMAL

1. NO EVENT, JUST NOISE
   At risk: Every base
   Source: Misread bases
           Sequencing artifacts
           Misaligned reads

2. GERMLINE EVENT (in T+N)
   At risk: ~1000 germline / Mb (known)
   Source: Low coverage in normal

FP_Total = FP_NoEvent + FP_Germline


Additional source of false negatives: Tumor in Normal

TUMOR
NORMAL

Due to contamination of the normal sample by tumor cells

Appears to be a germline event and leads to rejection of real variant
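To make the tumor-in-normal effect concrete, here is a toy T/N comparison. It is a sketch under stated assumptions, not the rule used by MuTect or any other caller: the function name `classify_site` and the thresholds `min_tumor_alt` and `max_normal_af` are hypothetical and chosen only for illustration.

```python
def classify_site(tumor_alt, normal_alt, normal_depth,
                  min_tumor_alt=3, max_normal_af=0.02):
    """Toy T/N comparison at one site (thresholds are assumed, not from a real tool)."""
    # Too few supporting reads in the tumor: nothing to call.
    if tumor_alt < min_tumor_alt:
        return "no_call"
    # Fraction of normal reads carrying the variant.
    normal_af = normal_alt / normal_depth if normal_depth else 0.0
    # Evidence in the normal makes the event look germline.
    if normal_af > max_normal_af:
        return "rejected_as_germline"
    return "somatic"


# Clean normal: the real somatic variant is kept.
print(classify_site(tumor_alt=12, normal_alt=0, normal_depth=40))
# Normal contaminated by tumor cells: the same variant is rejected.
print(classify_site(tumor_alt=12, normal_alt=3, normal_depth=40))
```

The second call shows the false negative described above: a few tumor-derived reads in the normal sample are enough to push the site over the germline threshold.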
Now, throw in the variant allelic fraction problem

33% N 67% T

T
N
T

(1) Purity = 67%


(2) Local copy number in tumor = 4
(3) Number of mutated copies per cancer cell = 1

-> Allelic fraction = 2/10 = 0.2

The (variant) allelic fraction is the fraction of alleles (DNA molecules) from a locus
that carry the variant -> Also the expected fraction of supporting reads
Carter et al. Nat. Biotechnol. (2012)
Which is even worse when the tumor involves subclones

Only this subclone has the variant!

T
N
T

(1) Purity = 67%


(2) Local copy number in tumor = 4
(3) Number of mutated copies per cancer cell = 1
(4) Cancer cell fraction (CCF) = 0.5

-> Allelic fraction = 1/10 = 0.1

Carter et al. Nat. Biotechnol. (2012)
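The two worked examples can be reproduced with a short calculation. This is a minimal sketch that assumes the normal cells are diploid at the locus (copy number 2, which matches the total of 10 alleles on the slides) and treats the 67% purity as exactly 2/3. The function name and parameter names are illustrative, not taken from any tool.

```python
def expected_allelic_fraction(purity, tumor_copy_number, mutated_copies,
                              ccf=1.0, normal_copy_number=2):
    """Expected fraction of alleles at a locus that carry the variant.

    purity: fraction of cells in the sample that are cancer cells
    tumor_copy_number: local copy number in the tumor
    mutated_copies: mutated copies per cancer cell that has the variant
    ccf: cancer cell fraction (1.0 means every cancer cell has the variant)
    normal_copy_number: assumed to be 2 (diploid normal)
    """
    # Variant-carrying alleles, per cell in the sample on average.
    variant_alleles = purity * ccf * mutated_copies
    # All alleles at the locus, from cancer and normal cells combined.
    total_alleles = (purity * tumor_copy_number
                     + (1.0 - purity) * normal_copy_number)
    return variant_alleles / total_alleles


clonal = expected_allelic_fraction(2 / 3, 4, 1)
subclonal = expected_allelic_fraction(2 / 3, 4, 1, ccf=0.5)
print(round(clonal, 3), round(subclonal, 3))  # 0.2 0.1
```

With one normal cell and two cancer cells, the sample holds 2 + 4 + 4 = 10 alleles at the locus; two of them carry the variant in the clonal case and one in the subclonal case.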


Discovering low-fraction variants requires deeper sequencing

[Figure, for MuTect. Left panel: sensitivity vs. false positive rate (Mb^-1) at f = 0.2. Right panel: sensitivity vs. tumor sample sequencing depth for allele fractions f = 0.4, 0.2, 0.1 and 0.05. Legend entries: Calculation (Q35), MuTect STD, MuTect HC, MuTect HC + PON; virtual tumors and downsampling.]

Sensitivity (recall) decreases with the variant allele fraction

Cibulskis et al. Nat. Biotechnol (2013)
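The relationship between depth, allele fraction, and sensitivity can be approximated with a simple binomial model. This is a rough sketch, not MuTect's statistical model: it assumes each read independently carries the variant with probability equal to the allele fraction, and that a variant is detectable once a fixed minimum number of supporting reads is seen. The function name and the `min_alt_reads` threshold are hypothetical.

```python
from math import comb


def detection_power(depth, allele_fraction, min_alt_reads=3):
    """Probability of seeing at least min_alt_reads variant reads at a site.

    Models the number of supporting reads as Binomial(depth, allele_fraction).
    min_alt_reads is an assumed detection threshold.
    """
    # Probability of observing fewer supporting reads than the threshold.
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction)**(depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Power at a few depths for the allele fractions shown in the figure.
for f in (0.4, 0.2, 0.1, 0.05):
    print(f, [round(detection_power(d, f), 3) for d in (10, 30, 60)])
```

Even this crude model shows the same qualitative trend: at a given depth, power drops as the allele fraction falls, and recovering it requires deeper sequencing.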


PART 3:
SOMATIC VARIANT DISCOVERY
WORKFLOW
Somatic Variant Discovery Workflow

Indels coming
soon! (M2)

+ some post-processing
to rescue TiN variants
and eliminate artifacts
Mapping and pre-processing

Same as germline:
BWA + Picard + GATK

Done separately for each
sample in a tumor/normal pair

Cancer-specific pre-processing

Co-realignment of T/N pairs


Indel realignment together
with nwayOut to keep them
separate


Estimation of cross-sample
contamination
Variant discovery

Call SNPs
with MuTect
Indels coming
soon! (M2)
Post-processing

Rescuing real variants that are rejected due to TiN contamination

Filtering to eliminate artifacts


Additional analyses

Annotation with Oncotator (see Day 5)

ABSOLUTE (stromal contamination)
dRanger/Breakpointer (rearrangements)

And more to be published soon


http://www.broadinstitute.org/cancer/cga/


Further reading
Documentation coming soon to the GATK website

In the meantime, see
http://www.broadinstitute.org/cancer/cga/Home
