Module 1 Introduction to Flow Cytometry Analysis in R

Ryan Brinkman
Senior Scientist, BC Cancer Agency
Associate Professor, Department of Medical Genetics, UBC
Vancouver, British Columbia, Canada

CBW: Flow Cytometry Data Analysis using R

June 17, 2013

Module Objectives

This module won't teach you how to fish. (That's Module 2.) It will teach you that there is such a thing as fishing. And that fish are tasty. Even when eaten raw... and wriggling.

Module 1: Introduction

bioinformatics.ca

Computational Analysis Kicks Ass

Part I: Hypothesis

Automated algorithms have reached a level of maturity that enables them to match and in many cases exceed the results produced by human experts.

Part II: Overview of available tools in BioConductor

Part III: Illustration of their use for diagnosis and discovery (8x)

Automated Flow Cytometry Data Analysis

Why should you care?

              1985       2012                Increase
Samples:      1          466
Colours:      3          13
Events:       50,000     400,000
Data:                                        16,000x
CPU:          3 MHz      600 x 12 @ 3 GHz
RAM:          2 MB       48 GB/node
Power:                                       7,000,000x

Fruit: seeds, colour (p<0.05)*

Murphy, Cytometry (1985)
Aghaeepour et al., Bioinformatics (2012)
*Barone, BMJ (2000)
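
The two fold changes quoted on this slide (16,000x and 7,000,000x) can be checked with a few lines of R. This is only a sketch under two assumptions that the slide does not spell out: that "Data" means samples x colours x events, and that "Power" means nodes x cores x clock speed, with "600x 12 @ 3 GHz" read as 600 nodes of 12 cores each.

```r
# Assumption: "Data" = samples x colours x events (values per experiment).
data_1985 <- 1 * 3 * 50000
data_2012 <- 466 * 13 * 400000
data_fold <- data_2012 / data_1985

# Assumption: "Power" = nodes x cores x clock speed in Hz.
power_1985 <- 1 * 1 * 3e6       # a single 3 MHz processor
power_2012 <- 600 * 12 * 3e9    # 600 nodes x 12 cores @ 3 GHz
power_fold <- power_2012 / power_1985

round(data_fold)   # close to the 16,000x quoted on the slide
power_fold         # close to the 7,000,000x quoted on the slide
```

Under these assumptions both ratios land within a few percent of the rounded figures on the slide.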

Manual Analysis of High-Dimensional Data

Could possibly be improved upon?

Time consuming, especially for discovery
Analysis guided by history with limited, intuitive exploration
Rarely (ever?) examine entire multidimensional dataset
Significant cross-individual variability (>10%)
No appropriate statistical basis to assess relative significance
Not fun (?)

"Unfortunately, the use of three or more independent fluorescent parameters complicates the analysis of the resulting data significantly."
Murphy, Cytometry (1985)

"Despite the technological advances in acquiring [30] parameters per single cell, methods for analyzing multidimensional single-cell data remain inadequate."
Qiu et al., Nature Biotechnology (2011)

What Automated Analysis Needs to Deal With

Large number of dimensions, events, samples
Multifactorial formats
Need quick, robust, fully automated processing
Need to maintain data & metadata relationships

No commercially available software solves these issues*

Bashashati et al., Adv Bioinformatics (2009) PMID 20049163
*Robinson et al., Expert Opinion Drug Discovery (2012) PMID 22708834
Le Meur, Curr Opin Biotechnol (2013) PMID 23062230

Solution: Free, Open Source Statistical Programming

R is a free/libre, open source, robust statistical programming environment for Windows, Mac & Linux that offers a wide range of statistical and visualization methods
BioConductor provides R software modules for biological and clinical data analysis
A scripted approach to high throughput data analysis
Non-interactive, self-documented, reproducible
Breaks problem into smaller pieces (packages)
Modules can plug-in & swap-out
Collaborative development
http://bioconductor.org

Integrates with other software tools via open data standards

Data processing & visualization (12/28)

28 R packages for Flow Analysis

flowCore*        Read/write & process flow data
plateCore*       Analyze multiwell plates
flowUtils*       Import gates, transformation and compensation
flowQ*           Quality control of ungated data
flowStats*       Advanced statistical methods and functions
ncdfFlow         Advanced methods for large dataset processing
QUALIFIER        Quality control and assessment of gated data
flowViz          Visualization (e.g., histograms, dot plots, density plots)
flowPlots*       Graphical displays with statistical tests
flowWorkspace*   Importing FlowJo workspaces
iFlow            GUI for exploratory analysis and visualization
flowTrans*       Estimates parameters for data transformation
OpenCyto         Simplifies data processing

*Peer-reviewed manuscript available

14 R packages for Automated Gating

flowClust*        Clustering using t-mixture model with Box-Cox transformation
flowMerge*        flowClust + entropy-based merging
flowMeans*        k-means clustering and merging using the Mahalanobis distance
SamSpectral*      Efficient spectral clustering using density-based down-sampling
flowQB            Q&B analysis
flowPeaks*        Unsupervised clustering using k-means & mixture model
flowFP*           Fingerprint generation
flowPhyto*        Analysis of marine biology data
FLAME*            Multivariate finite mixtures of skew & tailed distributions
flowKoh           Self-organizing maps
NMF-curvHDR*      Density-based clustering and non-negative matrix factorization
flowCore/Stats*   Sequential gating and normalization w/ Beta-Binomial model
PRAMS*            2D clustering and logistic regression
SPADE*            Density-based sampling, k-means clustering & minimum spanning trees

*Peer-reviewed manuscript available

2 Packages for Post-Gating Significance Assessment

flowType*       Automated phenotyping using 1D gates extrapolated to multiple dimensions
RchyOptimyx*    Cellular hierarchies correlated with outcome of interest

*Peer-reviewed manuscript available

BioConductor's Open, Extensible Infrastructure

Packages are Interoperable & Interchangeable


[Figure: analysis pipeline. Each stage is listed with the tools shown next to it on the slide.]
flowUtils, flowCore -> Compensated Data
flowQ, plateCore -> Quality Assessed
fdaNorm, gaussNorm -> Normalized Data
area vs. (width/height) gate, viability marker gate -> Questionable Samples & Events Removed
logicle, arcSinh, etc. -> Transformed Data
flowType, >20 algorithms -> Populations Identified [Gated Data]
mclust, kmeans -> Matched Populations
Ratio, Median -> Cell Proportions and MFIs
RchyOptimyx, heatmaps -> Discovery
flowDensity, FeaLect -> Diagnosis

High throughput / dimensional analysis: man v. machine

Strain et al., Advances in Bioinformatics, 2009

RStudio

Le Meur, Curr Opin Biotechnol (2013)

Getting Started: r-project.org

Getting Started: bioconductor.org

bioconductor.org/install

bioconductor.org/help

bioconductor.org/help/workflows/high-throughput-assays/

BioConductor Vignettes

Each Bioconductor package contains at least one vignette
Vignettes provide a task-oriented description of functionality
Vignettes contain interactive, executable examples
You can access the PDF version of a vignette from R:
browseVignettes(package = "flowMeans")

Opens a browser with links to the vignette PDF & the plain-text R file containing the code used in the vignette.
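
As a minimal sketch of the same idea without a browser, the vignettes of any installed package can also be listed from the console. The example uses "grid" only because it ships with R; the assumption is that you would swap in a Bioconductor package such as "flowMeans" once it is installed.

```r
# List the vignettes shipped with an installed package.
# "grid" is a stand-in that comes with R; replace it with e.g. "flowMeans"
# after installing that package from Bioconductor.
v <- vignette(package = "grid")
v$results[, c("Item", "Title")]

# Open a single vignette by name (the name must appear in the "Item" column):
# vignette("grid", package = "grid")

# Or open the browser index, as on the slide:
# browseVignettes(package = "flowMeans")
```

The commented-out calls open a viewer or browser, so they are left for interactive use.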

Example Package Page

Vignettes: Peer-reviewed Executable Documentation

Documentation Peer-reviewed by Scientists

Getting Started: Coerce Data & QA

flowUtils & flowCore -> Compensated Data
flowQ & plateCore -> Quality Assessed

Problem: Detect systematic and stochastic effects that are not likely to be biologically motivated
Systematic errors often indicate the need for adjustment in sample handling or processing
Aberrant samples should be identified & potentially removed from downstream analyses to avoid spurious results

Solution: Exploratory Data Analytic (EDA) tools (graphical methods) can review ungated FCM data in a time & cost effective manner

Le Meur et al., Cytometry A, 2007
Hahne et al., BMC Bioinformatics, 2009

QA: One of These Samples is Not Like the Others

Median FSC/SSC grouped by well columns
Nonparametric K-S on difference of medians
Pairwise comparisons between columns or between one column and the rest of the plate.
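
The "one column versus the rest of the plate" comparison can be sketched in base R. This is not flowQ's implementation: the plate layout, the simulated median FSC values and the Bonferroni adjustment are all assumptions made for illustration.

```r
set.seed(1)

# Simulated 96-well plate: one median FSC value per well (8 rows x 12 columns).
# All numbers are made up for illustration.
plate <- matrix(rnorm(96, mean = 500, sd = 10), nrow = 8, ncol = 12)
plate[, 7] <- rnorm(8, mean = 600, sd = 10)   # column 7 is the odd one out

# Compare each column against the rest of the plate with a two-sample
# Kolmogorov-Smirnov test.
p_values <- sapply(seq_len(ncol(plate)), function(j) {
  ks.test(plate[, j], as.vector(plate[, -j]))$p.value
})

# Adjust for the 12 comparisons and report the most deviant column.
p_adjusted <- p.adjust(p_values, method = "bonferroni")
suspect_column <- which.min(p_adjusted)
suspect_column
```

With real data the per-well medians would come from the FCS files rather than `rnorm()`, and a flagged column would be a prompt to inspect the wells, not an automatic exclusion.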

flowQ: Summary web page

Quality Checking of Gated Flow Cytometry Data

QA with QUALIFIER

flowQ: QA on ungated data
QUALIFIER: ID deviant samples by monitoring the consistency of the underlying statistical properties
Can use the FlowJo gating template
Outlier detection and visualization are efficient and interactive
ncdfFlow enables analysis of very large datasets

Finak et al., BMC Bioinformatics 2012

QA with QUALIFIER: Fluorescence Stability

Data Normalization

Quality Assessed -> fdaNorm, gaussNorm -> Normalized Data

Problem: Between-sample variation can pose a significant challenge for analysis
Hard to match (label) biologically relevant cell populations across samples due to technical variation in sample acquisition and instrumentation differences

Solution: Remove technical between-sample variation by aligning prominent features (landmarks) in the raw data on a per-channel basis

Hahne et al., Cytometry A, 2009
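
The landmark idea can be illustrated with a deliberately simplified sketch: take the highest density peak of one channel as the landmark and shift each sample so the peaks coincide. This is an assumption-laden toy, not the gaussNorm or fdaNorm algorithm, which handle several landmarks and more flexible alignment; the data are simulated.

```r
set.seed(2)

# Three simulated samples of one channel (e.g. CD3). The same population
# sits at a different intensity in each sample (technical shift).
samples <- list(
  a = rnorm(2000, mean = 400, sd = 40),
  b = rnorm(2000, mean = 450, sd = 40),
  c = rnorm(2000, mean = 520, sd = 40)
)

# Landmark = location of the highest peak of the kernel density estimate.
find_landmark <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

landmarks <- sapply(samples, find_landmark)
target <- mean(landmarks)

# Shift every sample so that its landmark lands on the common target.
aligned <- Map(function(x, peak) x - (peak - target), samples, landmarks)
aligned_landmarks <- sapply(aligned, find_landmark)

round(landmarks)           # differ between samples
round(aligned_landmarks)   # coincide after alignment
```

A pure shift is enough here because the simulated samples differ only by an offset; real between-sample variation usually needs the per-channel warping that the packages provide.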

Normalization Example: Laser Change

Laser switch on instrument moved a subset of populations -> labelling & static gate problem

Data Normalization

[Figure: CD3 intensity distributions in three panels: raw data, gaussNorm, fdaNorm]

Data Normalization

[Figure: Before and After normalization]

Data Transformation

Problem: Effective data processing depends on transformations (e.g., logicle)

Solution: Optimize parameter choice for different transformations to improve visualization, gating & clustering

Finak et al., BMC Bioinformatics, 2010
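
To make the role of a transformation parameter concrete, here is a small base R sketch of the arcsinh transform, one of the transformations named in the pipeline. The cofactor values are illustrative assumptions, not recommendations from the source, and the parameter optimization described on the slide is not reproduced.

```r
# Inverse hyperbolic sine transform with a cofactor. The default cofactor
# (150) is an illustrative choice only.
arcsinh_transform <- function(x, cofactor = 150) {
  asinh(x / cofactor)
}

raw <- c(-200, 0, 50, 1000, 100000)
transformed <- arcsinh_transform(raw)
round(transformed, 2)

# Unlike log(), negative and zero values (common after compensation) stay
# finite, and large values are compressed roughly like a logarithm.
all(is.finite(transformed))

# The cofactor controls how wide the near-linear region around zero is,
# which is why the choice of parameter matters for display and gating.
by_cofactor <- sapply(c(5, 150), function(cf) arcsinh_transform(raw, cofactor = cf))
round(by_cofactor, 2)
```

Comparing the two columns of `by_cofactor` shows the same raw values being spread very differently, which is the behaviour a parameter search would tune.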

Automated Gating

Advantages of automated approaches for data analysis:

Free labour (except for computer time)
Can be as accurate as (or more accurate than?!) human gating
Better chance of finding interesting populations in high-D data
Allows scientists to do valuable science

See Module 4 - Cell Population Identification

Population Labelling

Problem: Labelling of populations is sample dependent

Solution: Cluster the clusters

Labelling (Step 1): Cluster the cluster centers...

Labelling (Step 2): ... then assign labels to clusters
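
The two labelling steps can be sketched with base R's `kmeans()`. This is a generic illustration on simulated data, assuming two populations per sample and k-means at both levels; it is not the specific matching method used in the workshop packages.

```r
set.seed(3)

# Simulate 3 samples, each with the same two populations but with a
# sample-specific technical shift (all values are made up).
make_sample <- function(shift) {
  rbind(
    cbind(rnorm(300, 200 + shift, 30), rnorm(300, 200 + shift, 30)),
    cbind(rnorm(300, 800 + shift, 30), rnorm(300, 800 + shift, 30))
  )
}
samples <- lapply(c(-40, 0, 40), make_sample)

# Cluster each sample on its own; cluster numbers are arbitrary per sample.
fits <- lapply(samples, kmeans, centers = 2, nstart = 10)

# Step 1: pool the per-sample cluster centers and cluster them.
centers <- do.call(rbind, lapply(fits, function(f) f$centers))
meta <- kmeans(centers, centers = 2, nstart = 10)

# Step 2: the meta-cluster becomes the shared label of each local cluster.
labels <- data.frame(
  sample = rep(seq_along(fits), each = 2),
  local_cluster = rep(1:2, times = length(fits)),
  center_x = unname(round(centers[, 1])),
  shared_label = unname(meta$cluster)
)
labels
```

In the printed table, local cluster numbers vary from sample to sample, while `shared_label` is the same for the low-intensity population in every sample and likewise for the high-intensity one.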

Labelling Populations

Labelling: But algorithms don't know what a T cell is

...yet - see Module 6: Additional FCM Analysis Resources
Slide courtesy Holden Maecker

Data In: Discovery/Diagnosis Out

See Module 4: Cell Population Identification

Tools in Practice (and workshop dataset): HIV Onset

United States Military HIV Natural History Study

PBMCs of 466 HIV+ personnel and beneficiaries from Army, Navy, Marines, and Air Force.
13 surface markers and KI-67 (cell proliferation).
Clinical Data: Survival times including 135 events (a)

(a) An event is defined as progression to AIDS or initiation of HAART.

Manual Gating Results

Frequency of long-lived Memory Cells (CD127+) has a positive correlation.
Frequency of cells with high proliferation (KI-67+) has a negative correlation.

Can we find what they have found? Can we find more?

Automated Analysis with Positive Control Manual Result

No false positives with our analysis platform

Previous manual results:

1. Frequency of long-lived Memory Cells (CD127+) has a positive correlation.
2. Frequency of cells with high proliferation (KI-67+) has a negative correlation.

New automated results:

1. Frequency of short-lived cells with high proliferation (CD127-KI-67+) has a negative correlation.
2. Frequency of terminal effector T-cells has a negative correlation.
3. Frequency of transitional memory T-cells has a negative correlation.

[Figure: event-free proportion vs. years from cell sample, for manual analysis and computational analysis. Groups shown: Lowest (371/86%) vs. Highest (59/14%); Lowest (387/90%) vs. Highest (43/10%); Lowest (356/83%) vs. Highest (74/17%). p-values shown: p < 8.6e-13, p < 1.8e-06, p < 4.6e-10.]

Example 2: GC lymphoma vs. Reactive Lymphoid Hyperplasia


1. Lymphoma with a germinal center-type phenotype (N=52) vs. Reactive lymphoid hyperplasia (N=48)
2. 8-color B-cell tube
3. 5,660 phenotypes were extracted by flowType
4. ROC analysis to ID phenotypes with a strong predictive power
5. Phenotypes were analyzed by RchyOptimyx
6. CD5-CD19+CD10+CD38- not ID'd by manual analysis

Craig et al., Submitted

Specificity of 91.75%; Sensitivity of 65.4%

Manual Re-analysis: CD10+CD38-

Mantei and Wood, "Flow Cytometric Evaluation of CD38 Expression Assists in Distinguishing Follicular Hyperplasia from Follicular Lymphoma", Cytometry Part B, 2009

RchyOptimyx: Discovers the most significant cell populations differentiating 2 groups

Example 3: Lyoplate vs. Liquid Reagents

Villanova et al., PLoS ONE (In Press)

Manual validation of automated results

Liquid vs. Lyoplate Reagents

Lyoplate: better detection of cytokines & activation markers
Increased overall brightness

Example 4: Differential diagnosis 22 DLBCL vs. 50 FOLL

Automated identification of samples for clinical review

Routine standard of care mandates 1% review of cases
Use computational classification to ID best cases to review

                 Prediction
Diagnosis        FOLL    DLBCL
FOLL              46       4
DLBCL              8      14

                 Prediction
Diagnosis        FOLL    DLBCL
FOLL              48       2
DLBCL              6      16

Remove discrepancies in assignment between two methods: N=60

                 Prediction
Diagnosis        FOLL    DLBCL
FOLL              44       0
DLBCL              4      12
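
The counts on this slide can be summarised in a few lines of R. The sketch assumes the reading in which rows are the diagnosis and columns the prediction (the row totals then match the 50 FOLL and 22 DLBCL cases); the helper names are hypothetical.

```r
# Confusion matrices copied from the slide
# (assumed layout: rows = diagnosis, columns = prediction).
make_table <- function(values) {
  matrix(values, nrow = 2, byrow = TRUE,
         dimnames = list(diagnosis = c("FOLL", "DLBCL"),
                         prediction = c("FOLL", "DLBCL")))
}
first_table  <- make_table(c(46, 4, 8, 14))
second_table <- make_table(c(48, 2, 6, 16))
concordant   <- make_table(c(44, 0, 4, 12))   # N = 60 after removing discrepancies

# Overall accuracy: correctly classified cases / all cases.
accuracy <- function(tab) sum(diag(tab)) / sum(tab)

# Per-class recall: fraction of each diagnosis that was predicted correctly.
recall <- function(tab) diag(tab) / rowSums(tab)

accuracies <- sapply(list(first = first_table, second = second_table,
                          concordant = concordant), accuracy)
round(accuracies, 3)
recall(concordant)
```

Restricting to the 60 concordant cases raises overall accuracy, and the remaining errors are the DLBCL cases predicted as FOLL.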

Example 4: DLBCL incorrectly predicted as FOLL

Reports and flow data re-reviewed by pathologist:

1. Lymphoma with no appreciable normal B cell component
2. Composite lymphoma
3. 2X more normal (reactive/polyclonal) B cells than malignant B cells present in the flow sample; partial involvement by lymphoma

A variant of DLBCL but no indication of a FOLL or normal B cell component (partial involvement). However, the flow report and re-review of the plots indicate that no malignant cells were present. Likely flow vs. histology sample discrepancy.

Automated analysis with negative control manual result

Example 5: Parkinson's Disease

41 patients with accompanying diagnosis (PD vs Normal)
Test data set 137 patients
flowType/RchyOptimyx used to identify best population to separate groups
Classifier performed poorly: PPV=0.52, NPV=0.36
Manual analysis also had no significant populations when corrected for multiple testing

Example 6: Lyoplate Automated Analysis

What can be done to improve large studies?

Standardizing panels facilitates cross-center comparisons
  Sufficient maturity in cellular immunology for consensus of definitions for most commonly studied immune cells
  NIH, FOCIS, FITMaN & HIPC
  B, T, NK cells, monocytes, DCs & activation status

Standardized reagents reduce variability
  Reagent/staining variability can confound clinical studies
  Lyophilization can stabilize FCM reagents against degradation
  Reagents for analysis of cryopreserved PBMCs

Maecker et al., Nature Reviews Immunology (2012)

Lyoplate Automated Analysis


Most variation in cross-center studies due to gating

Variation reduced from 30% using 1 manual gater

Maecker et al., BMC Immunology (2005)

Lyoplate Populations of Interest

Lyoplates: Automated Analysis Based on Density Estimates

[Figure panels: B cells, Memory B cells, Transitional B cells]

Manual vs. automated gating (values as shown on the slides; each gate was shown as a pair of plots, manual and automated):

Gate                                        Manual        Automated
CD16- from monocytes population             98.8/.96      98/1
DC from CD14- population                    2.17          2.3
CD11c+ and CD123+ from DC                   18.5/14.7     20.21/12.98
NK cells from live population               17.0          16.49
CD56 and CD16 from NK cells                 34.5/3.62     35.92/4.2
Live cells                                  81.9          81.8
Plasmablasts from CD3-CD20- populations     2.38          2.22
Transitionals from lymphocytes              9.38          10.56

Lyoplates: Manual vs. Automated on Targeted Populations

Corrected for donor and center-level effects

[Figure: fraction of cells, manual vs. automated, for Memory IgD-, Naive, Plasmablasts, B cells, CD3- and Memory IgD+]

Example 7: Clinical Diagnosis

Diagnostic panel for specific cancer

Problem: 3 patient categories
Healthy
Specific type of cancer
Other disease

Analysis: Use flowDensity to follow gating hierarchy and bin samples based on population proportions from training set (based on K-nearest neighbour)
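
The binning step can be sketched as a plain k-nearest-neighbour vote on population proportions. Everything here is an assumption for illustration: the two features, the simulated training values, k = 5 and the helper names; the real features would come from the flowDensity gating hierarchy.

```r
set.seed(4)

# Made-up training set: two population proportions per sample plus a
# known category.
make_group <- function(n, centre, label) {
  data.frame(pop_a = rnorm(n, centre[1], 0.03),
             pop_b = rnorm(n, centre[2], 0.03),
             category = label)
}
training <- rbind(
  make_group(15, c(0.10, 0.60), "Healthy"),
  make_group(15, c(0.50, 0.20), "Cancer"),
  make_group(15, c(0.30, 0.40), "Other disease")
)

# Plain k-nearest-neighbour vote using Euclidean distance.
knn_predict <- function(new_sample, training, k = 5) {
  features <- as.matrix(training[, c("pop_a", "pop_b")])
  distances <- sqrt(rowSums(sweep(features, 2, new_sample)^2))
  nearest <- order(distances)[seq_len(k)]
  votes <- table(training$category[nearest])
  names(votes)[which.max(votes)]
}

# A new sample whose proportions lie inside the simulated "Cancer" group.
knn_predict(c(0.48, 0.22), training)
```

With proportions on a common 0 to 1 scale no feature scaling is applied; features on different scales would need standardising before computing distances.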

Gating Strategy: Disease

Gating Strategy: Other Disease

flowDensity: Gating on Threshold

flowDensity: Gating on Distribution
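
As a rough illustration of density-based threshold gating, the sketch below places a one-dimensional gate at the lowest point of the density between the two highest peaks. It is a generic valley-finding example on simulated data, not flowDensity's own implementation, and `find_valley` is a hypothetical helper.

```r
set.seed(5)

# Simulated one-channel data with a negative and a positive population.
x <- c(rnorm(3000, mean = 200, sd = 60), rnorm(2000, mean = 800, sd = 60))

# Place the gate at the lowest point of the kernel density estimate
# between its two highest peaks.
find_valley <- function(x) {
  d <- density(x)
  peaks <- which(diff(sign(diff(d$y))) == -2) + 1   # indices of local maxima
  top2 <- sort(peaks[order(d$y[peaks], decreasing = TRUE)][1:2])
  between <- top2[1]:top2[2]
  d$x[between[which.min(d$y[between])]]
}

threshold <- find_valley(x)
positive_fraction <- mean(x > threshold)
threshold
positive_fraction
```

The helper assumes the density has at least two peaks; a unimodal channel would need a different rule, such as a cut based on the shape of the distribution.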

Example 7: Clinical Diagnosis

Normal vs. Specific Cancer vs. Other Disease

                 Normal   Cancer   Other Disease   <- predicted as
Normal             15        0          0
Cancer              0        8          0
Other Disease       1        2         13

Example 8: International Mouse Phenotyping Consortium

Massive flow data generation

20,000 lines (2F x 1M) generated (1/gene) over next 5 years
2 x 10-12D FCS files for each of 60,000 mice
120,000 FCS files and 25 other phenotype measurements

flowDensity vs. 3 human experts for NK cells

Acknowledgements

R/BioConductor.org flow cytometry infrastructure
  Genentech: Robert Gentleman, all BioConductor contributors
FlowCAP
  Coordinating Committee: Nima Aghaeepour (BCCA), Greg Finak (FHCRC), Raphael Gottardo (FHCRC), Tim Mosmann (U Rochester), Richard H. Scheuermann (UTSW)
  Data providers and participants
  flowcap.flowsite.org
HIV
  NIH/USMIL: Mario Roederer, Pratip K. Chattopadhyay
  BCCA: Nima Aghaeepour, Adrin Jalali, Kieran O'Neill, Habil Zare
GC Lymphoma
  UPMC: Fiona Craig, Stephen Ten Eyke
  BCCA: Nima Aghaeepour
DLBCL vs. FOLL
  BCCA: Andrew Weng, Nima Aghaeepour, Faysal El Khettabi
Parkinson's Disease
  UNMC: Howard E Gendelman
  BCCA: Kieran O'Neill
FlowRepository
  BCCA: Josef Spidlen, Karin Breuer
CytoBank
  Chad Rosenberg, Nikesh Kotecha
Funding
  NIH (NIBIB, NIAID), HIP-C, TFRI & TFF, CCS, MSFHR, WHCF

COFFEE BREAK 30min
