What Is DNA Copy Number

What is DNA copy number?
Normally, each somatic cell contains 2 copies of every chromosome (the sex chromosomes in males excepted).
One of the earliest observed copy number changes is trisomy of chromosome 21 in Down syndrome.
In fact, it later became apparent that chromosome aberrations come in all forms and sizes.
High density DNA copy number data
Array-based Comparative Genomic Hybridization
Figures from Garnis et al. (2004)
DNA Copy Number Data from Different Platforms
Why analyze DNA copy number?

Cancer genomics

Douglas et al. (2004), colorectal cancer.

Copy number polymorphisms in HapMap samples
Statistical methods for single-sample, total copy number segmentation
1. Circular Binary Segmentation algorithm of Olshen et al. (2004)
2. HMM-based methods (Fridlyand et al. (2004), Lai et al. (2007))
3. Wavelet-based methods of Hsu et al. (2005)
4. Cluster Along Chromosomes (CLAC) method of Wang et al. (2005)
5. Many others: CBS, HMM, GLAD, CNV, CGHseg, Quantreg, Wavelet, Lowess, ChARM, GA, L1 Regularization, ACE...
HMM Model of Fridlyand et al. (2004)
This is a classic application of hidden Markov models:
The underlying states $1, \ldots, K$ represent the true copy number.
Given state $k$, the observed intensity levels are $N(\mu_k, \sigma^2)$.
The transition matrices and emission parameters are estimated by EM.
The AIC or BIC criterion is used to choose $K$.
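To make the HMM approach concrete, here is a minimal Python sketch of Viterbi decoding for a Gaussian HMM. It assumes the transition matrix, the state means $\mu_k$ and a shared $\sigma$ are already known; in the method above these would be estimated by EM and $K$ chosen by AIC/BIC, and neither step is shown. The function names and the example log-ratio levels are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(y, means, sd, trans, init):
    """Most probable hidden-state path for a Gaussian HMM with a shared sd."""
    k = len(means)
    log_trans = [[math.log(trans[i][j]) for j in range(k)] for i in range(k)]
    # delta[j]: best log-probability of any path that ends in state j at the current probe
    delta = [math.log(init[j]) + normal_logpdf(y[0], means[j], sd) for j in range(k)]
    back = []  # back[t-1][j]: best predecessor of state j at probe t
    for x in y[1:]:
        ptr, new_delta = [], []
        for j in range(k):
            best = max(range(k), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best)
            new_delta.append(delta[best] + log_trans[best][j] + normal_logpdf(x, means[j], sd))
        back.append(ptr)
        delta = new_delta
    # Trace back from the best final state.
    state = max(range(k), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical log-ratio levels for loss / normal / gain and a "sticky" transition matrix.
    means = [-0.5, 0.0, 0.5]
    trans = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]
    y = [0.02, -0.03, 0.01, 0.48, 0.52, 0.47, 0.0, -0.01]
    print(viterbi(y, means, 0.1, trans, [1 / 3] * 3))
```

With these made-up values the decoded path marks the fourth to sixth probes as state 2 (gain) and the rest as state 1 (normal).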
A Bayesian Model for Inference
When we estimate model parameters, confidence intervals are desirable!
1. Confidence bands on the estimated copy number.
2. How certain are we that [i, j] contains a CNV?
3. Confidence intervals on the aberration boundaries.
4. Confidence intervals on global measures of complexity, such as the total number of aberrations.
Observations
1. For array-CGH data, there is a known baseline at 0 (the log-ratio of a normal copy number).
2. Due to mosaicism, the data are drawn from mixtures of discrete copy number levels and are therefore effectively continuous.
3. In some tumors the number of distinct levels is very high.
Fitted Levels
Heterogeneity of cancer samples
Image from: http://science.kennesaw.edu/ mhermes/cisplat/cisplat19.htm
Stochastic Change Model
$S_t \in \{\text{baseline}, \text{changed}\}$
Baseline state: $\theta_t = 0$.
Changed state: $\theta_t \sim N(\mu, v)$.
If $S_t$ jumps, $\theta_t$ takes on a new value. Otherwise $\theta_t = \theta_{t-1}$.

$y_t = \theta_t + \sigma\epsilon_t, \quad \epsilon_t \sim N(0, 1)$

$P(S_t = \text{changed} \mid S_{t-1} = \text{baseline}) = p$
$P(S_t = \text{different changed state} \mid S_{t-1} = \text{changed}) = b$
$P(S_t = \text{baseline} \mid S_{t-1} = \text{changed}) = c$

This can be modeled with a 3-state Markov model (baseline and two changed states) with transition matrix:
$$P = \begin{pmatrix} 1-p & \tfrac{1}{2}p & \tfrac{1}{2}p \\ c & a & b \\ c & b & a \end{pmatrix},$$
where $a = 1 - b - c$, so that each row sums to 1.
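The generative model above can be simulated directly, which is a useful way to see what kind of data it produces. This is a minimal sketch under stated assumptions: the first state is drawn from the chain's stationary distribution, proportional to $(c, p/2, p/2)$; $v$ is treated as a variance; and the parameter values in the usage line are purely illustrative. `transition_matrix` and `simulate` are hypothetical names.

```python
import math
import random

BASELINE = 0  # state 0 is baseline; states 1 and 2 are the two "changed" labels


def transition_matrix(p, b, c):
    """3-state transition matrix; a = 1 - b - c makes each row sum to 1."""
    a = 1.0 - b - c
    return [[1.0 - p, p / 2, p / 2],
            [c, a, b],
            [c, b, a]]


def simulate(n, mu, v, sigma, p, b, c, seed=0):
    """Draw states S_t, levels theta_t and observations y_t from the stochastic change model."""
    rng = random.Random(seed)
    P = transition_matrix(p, b, c)
    states, thetas, ys = [], [], []
    theta = 0.0
    for t in range(n):
        # Assumption: S_1 is drawn from the stationary distribution, proportional to (c, p/2, p/2).
        weights = [c, p / 2, p / 2] if t == 0 else P[states[-1]]
        s = rng.choices(range(3), weights=weights)[0]
        if s == BASELINE:
            theta = 0.0
        elif t == 0 or s != states[-1]:
            # Entered a changed state from baseline or switched label: draw a fresh level.
            theta = rng.gauss(mu, math.sqrt(v))
        # Otherwise S_t stayed in the same changed state, so theta_t = theta_{t-1}.
        states.append(s)
        thetas.append(theta)
        ys.append(theta + sigma * rng.gauss(0.0, 1.0))
    return states, thetas, ys


if __name__ == "__main__":
    states, thetas, ys = simulate(200, mu=0.0, v=0.5, sigma=0.2, p=0.02, b=0.01, c=0.05)
    print(sum(s != BASELINE for s in states), "of 200 probes are in a changed state")
```

The two changed labels exist only so that a move between them can represent a jump to a new level while $S_t$ stays "changed"; staying on the same label keeps $\theta_t$ fixed.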
Estimating $\theta_t$, $S_t$
We can compute:
$E(\theta_t \mid y_{1:n})$, the "smoothed" estimate of the mean;
$P(S_t = \text{changed} \mid y_{1:n})$, the probability of a CNV at $t$;
$P(\text{CNV at } [i,j] \mid y_{1:n})$, the probability of an aberration at $[i, j]$.

The posterior distribution of $\theta_t$ given $y_{1:n}$ ($1 \le t \le n$) is a mixture of normal distributions and a point mass at 0:
$$\theta_t \mid y_{1:n} \sim \beta_t\,\delta_0 + \sum_{1 \le i \le t \le j \le n} \alpha_{ijt}\, N(\mu_{ij}, v_{ij}).$$
The parameters of this distribution can be computed by recursive formulas:
$$E(\theta_t \mid y_{1:n}) = \sum_{1 \le i \le t \le j \le n} \alpha_{ijt}\,\mu_{ij}, \qquad P(S_t = \text{changed} \mid y_{1:n}) = \sum_{1 \le i \le t \le j \le n} \alpha_{ijt},$$
where
$$\beta_t = \frac{\beta_t^*}{A_t}, \qquad \alpha_{ijt} = \frac{\alpha_{ijt}^*}{A_t}, \qquad A_t = \beta_t^* + \sum_{1 \le i \le t \le j \le n} \alpha_{ijt}^*,$$
$$\beta_t^* = \frac{p_t\,[(1-p)\,\tilde{p}_{t+1} + c\,\tilde{q}_{t+1}]}{c},$$
$$\alpha_{ijt}^* = \begin{cases} \dfrac{q_{i,t}\,(p\,\tilde{p}_{t+1} + b\,\tilde{q}_{t+1})}{p}, & i \le t = j,\\[6pt] \dfrac{a\,q_{i,t}\,\tilde{q}_{j,t+1}\,\psi_{i,t}\,\psi_{t+1,j}}{p\,\psi_{i,j}}, & i \le t < j.\end{cases}$$
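Rather than reproduce those recursions, the sketch below computes the same two posterior summaries, $E(\theta_t \mid y_{1:n})$ and $P(S_t = \text{changed} \mid y_{1:n})$, by brute force: it enumerates all $3^n$ state paths, integrates each changed segment's level out with normal-normal conjugacy, and averages. It is only practical for very short sequences (roughly $n \le 10$) and is meant as a check on intuition, not as the authors' algorithm. Assumptions: the chain starts in its stationary distribution, $v$ is a variance, and all function names are hypothetical.

```python
import itertools
import math


def _baseline_loglik(seg, sigma2):
    """Log-likelihood of probes in the baseline state (theta = 0)."""
    return sum(-0.5 * (math.log(2 * math.pi * sigma2) + x * x / sigma2) for x in seg)


def _segment_loglik(seg, mu, v, sigma2):
    """Log marginal likelihood of one changed segment, theta ~ N(mu, v) integrated out."""
    m = len(seg)
    r = [x - mu for x in seg]
    s1, s2 = sum(r), sum(d * d for d in r)
    quad = (s2 - v * s1 * s1 / (sigma2 + m * v)) / sigma2
    return -0.5 * (m * math.log(2 * math.pi * sigma2) + math.log(1 + m * v / sigma2) + quad)


def _segment_mean(seg, mu, v, sigma2):
    """Posterior mean of the segment level (normal-normal conjugacy)."""
    prec = 1 / v + len(seg) / sigma2
    return (mu / v + sum(seg) / sigma2) / prec


def brute_force_posterior(y, mu, v, sigma, p, b, c):
    """E(theta_t | y_{1:n}) and P(S_t = changed | y_{1:n}) by enumerating all 3^n state paths."""
    n, sigma2, a = len(y), sigma * sigma, 1 - b - c
    P = [[1 - p, p / 2, p / 2], [c, a, b], [c, b, a]]
    init = [c / (p + c), p / (2 * (p + c)), p / (2 * (p + c))]  # assumed: stationary start
    log_ws, levels, changed = [], [], []
    for path in itertools.product(range(3), repeat=n):
        lw = math.log(init[path[0]])
        lw += sum(math.log(P[s][s_next]) for s, s_next in zip(path, path[1:]))
        theta = [0.0] * n
        start = 0
        for end in range(1, n + 1):
            # A run ends where the state label changes (or at the last probe).
            if end == n or path[end] != path[start]:
                seg = y[start:end]
                if path[start] == 0:
                    lw += _baseline_loglik(seg, sigma2)
                else:
                    lw += _segment_loglik(seg, mu, v, sigma2)
                    level = _segment_mean(seg, mu, v, sigma2)
                    theta[start:end] = [level] * (end - start)
                start = end
        log_ws.append(lw)
        levels.append(theta)
        changed.append([s != 0 for s in path])
    top = max(log_ws)
    ws = [math.exp(lw - top) for lw in log_ws]  # rescale before exponentiating
    z = sum(ws)
    post_mean = [sum(w * th[t] for w, th in zip(ws, levels)) / z for t in range(n)]
    post_changed = [sum(w for w, ch in zip(ws, changed) if ch[t]) / z for t in range(n)]
    return post_mean, post_changed
```

For a short input such as `[0.0, 0.05, 1.0, 1.1, 0.95, 0.0]` with a small noise level, the posterior probability of a change should be close to 1 on the middle three probes and low at the ends, with the smoothed mean near 1 inside the gain.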
Hyperparameter Estimation
The model was defined as:
$y_t = \theta_t + \sigma\epsilon_t, \quad \epsilon_t \sim N(0, 1)$
$S_t$ modeled by a 3-state Markov model with transition matrix:
$$P = \begin{pmatrix} 1-p & \tfrac{1}{2}p & \tfrac{1}{2}p \\ c & a & b \\ c & b & a \end{pmatrix}.$$
The hyperparameters of this model are $\mu, \sigma, v, a, b, c, p$.
The likelihood of the data as a function of these hyperparameters can be expressed by recursive formulas. Maximum-likelihood values, computed by the EM algorithm, are used.
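One way to see that the likelihood admits a recursion is a forward filter over "baseline" and "a changed segment that started at position $i$"; the two changed labels can be merged because transitions only depend on staying, switching to a new level, or returning to baseline. This is a minimal Python sketch under stated assumptions: the chain starts in its stationary distribution, $v$ is a variance, and a crude grid search over $p$ stands in for the EM algorithm mentioned above. `log_likelihood` and `fit_p_by_grid` are hypothetical names, not the authors' code.

```python
import math


def _norm_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def _logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def log_likelihood(y, mu, v, sigma, p, b, c):
    """log P(y_{1:n}) under the stochastic change model, by a forward recursion."""
    sigma2, a = sigma * sigma, 1 - b - c
    fresh_prec = 1 / v + 1 / sigma2  # posterior precision of a level after one probe

    def fresh(x):
        # (mean, variance) of theta after a new segment's first probe x
        return (mu / v + x / sigma2) / fresh_prec, 1 / fresh_prec

    # Assumed start: stationary distribution, P(baseline) = c / (p + c).
    log_base = math.log(c / (p + c)) + _norm_logpdf(y[0], 0.0, sigma2)
    # One entry per possible start of the current changed segment: (log weight, mean, var).
    segs = [(math.log(p / (p + c)) + _norm_logpdf(y[0], mu, v + sigma2), *fresh(y[0]))]
    for x in y[1:]:
        new_segs = []
        for lw, m, s2 in segs:
            # Same segment continues (prob a); predictive for x is N(m, s2 + sigma2).
            prec = 1 / s2 + 1 / sigma2
            new_segs.append((lw + math.log(a) + _norm_logpdf(x, m, s2 + sigma2),
                             (m / s2 + x / sigma2) / prec, 1 / prec))
        # A new segment starts here: from baseline (prob p) or by a switch (prob b).
        into_new = [log_base + math.log(p)] + [lw + math.log(b) for lw, _, _ in segs]
        new_segs.append((_logsumexp(into_new) + _norm_logpdf(x, mu, v + sigma2), *fresh(x)))
        # Baseline at x: stay (prob 1 - p) or return from a changed state (prob c).
        into_base = [log_base + math.log(1 - p)] + [lw + math.log(c) for lw, _, _ in segs]
        log_base = _logsumexp(into_base) + _norm_logpdf(x, 0.0, sigma2)
        segs = new_segs
    return _logsumexp([log_base] + [lw for lw, _, _ in segs])


def fit_p_by_grid(y, mu, v, sigma, b, c, grid):
    """Crude stand-in for EM: pick the p on a grid that maximizes the likelihood."""
    return max(grid, key=lambda p: log_likelihood(y, mu, v, sigma, p, b, c))
```

The filter keeps one Gaussian summary per possible segment start, so its cost grows quadratically with the number of probes; a full EM fit would instead update all hyperparameters jointly from expected sufficient statistics.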
Confidence Bands for BT474
Inference on Measures of Genome Complexity
