Ryan Brinkman
Senior Scientist, BC Cancer Agency
Associate Professor, Department of Medical Genetics, UBC
Vancouver, British Columbia, Canada
Module Objectives
This module won't teach you how to fish. (That's Module 2.) It will teach you that there is such a thing as fishing. And that fish are tasty. Even when eaten raw... and wriggling.
Module 1: Introduction
bioinformatics.ca
Part I: Hypothesis
Automated algorithms have reached a level of maturity that enables them to match and in many cases exceed the results produced by human experts.
Part III: Illustration of their use for diagnosis and discovery (8x)
1985: Samples: 1; Colours: 3; Events: 50,000; CPU: 3 MHz; RAM: 2 MB
2012: Samples: 466; Colours: 13; Events: 400,000
Aghaeepour et al., Bioinformatics
Time consuming, especially for discovery
Analysis guided by history with limited, intuitive exploration
Rarely (ever?) examine entire multidimensional dataset
Significant cross-individual variability (>10%)
No appropriate statistical basis to assess relative significance
Not fun (?)
Unfortunately, the use of three or more independent fluorescent parameters complicates the analysis of the resulting data significantly. Murphy, Cytometry (1985)
Despite the technological advances in acquiring [30] parameters per single cell, methods for analyzing multidimensional single-cell data remain inadequate. Qiu et al. Nature Biotechnology (2011)
Large number of dimensions, events, samples
Multifactorial formats
Need quick, robust, fully automated processing
Need to maintain data & metadata relationships
R is a free/libre open source, robust statistical programming environment for Windows, Mac & Linux that offers a wide range of statistical and visualization methods
BioConductor provides R software modules for biological and clinical data analysis
A scripted approach to high throughput data analysis
Non-interactive, self-documented, reproducible
Breaks problem into smaller pieces (packages)
Modules can plug-in & swap-out
Collaborative development
http://bioconductor.org
flowCore*: Read/write & process flow data
plateCore*: Analyze multiwell plates
flowUtils*: Import gates, transformation and compensation
flowQ*: Quality control of ungated data
flowStats*: Advanced statistical methods and functions
ncdfFlow: Advanced methods for large dataset processing
QUALIFIER: Quality control and assessment of gated data
flowViz: Visualization (e.g., histograms, dot plots, density plots)
flowPlots*: Graphical displays with statistical tests
flowWorkspace*: Importing FlowJo workspaces
iFlow: GUI for exploratory analysis and visualization
flowTrans*: Estimates parameters for data transformation
OpenCyto: Simplifies data processing
*Peer-reviewed manuscript available
flowClust*: Clustering using t-mixture model with Box-Cox transformation
flowMerge*: flowClust + entropy-based merging
flowMeans*: k-means clustering and merging using the Mahalanobis distance
SamSPECTRAL*: Efficient spectral clustering using density-based down-sampling
flowQB: Q&B analysis
flowPeaks*: Unsupervised clustering using k-means & mixture model
flowFP*: Fingerprint generation
flowPhyto*: Analysis of marine biology data
FLAME*: Multivariate finite mixtures of skew & tailed distributions
flowKoh: Self-organizing maps
NMF-curvHDR*: Density-based clustering and non-negative matrix factorization
flowCore/Stats*: Sequential gating and normalization w/ Beta-Binomial model
PRAMS*: 2D Clustering and logistic regression
SPADE*: Density-based sampling, k-means clustering & minimum spanning trees
Pipeline diagram: Compensated Data; gaussNorm; Normalized Data; area vs. (width/height) gate; viability marker gate; Questionable Samples & Events Removed; logicle, arcSinh, etc.; Transformed Data; flowType; mclust, kmeans
RStudio
bioconductor.org/install
bioconductor.org/help
bioconductor.org/help/workflows/high-throughput-assays/
BioConductor Vignettes
Each Bioconductor package contains at least one vignette
Vignettes provide a task-oriented description of functionality
Vignettes contain interactive, executable examples
You can access the PDF version of a vignette from R:
browseVignettes(package = "flowMeans")
Opens a browser with links to the vignette PDF & the plain-text R file containing the code used in the vignette.
Quality Assessed
Problem:
Systematic errors often indicate the need for adjustment in sample handling or processing
Aberrant samples should be identified & potentially removed from downstream analyses to avoid spurious results
Solution:
methods) can review ungated FCM data in a time & cost effective manner
Le Meur et al., Cytometry A, 2007; Hahne et al., BMC Bioinformatics, 2009
Median FSC/SSC grouped by well columns
Nonparametric K-S on difference of medians
Pairwise comparisons between columns or between one column and the rest of the plate
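The column-versus-rest-of-plate comparison can be sketched in base R. This is only an illustration on simulated numbers, not the implementation used by the packages cited above: the 8 x 12 plate layout, the `well_medians` data frame, the Bonferroni adjustment and the shift applied to column 12 are all assumptions.

```r
# Simulated per-well median FSC for a 96-well plate (8 rows x 12 columns).
# All values are synthetic; column 12 is deliberately shifted to mimic a
# systematic acquisition problem.
set.seed(1)
well_medians <- data.frame(
  row    = rep(LETTERS[1:8], times = 12),
  column = rep(1:12, each = 8),
  fsc    = rnorm(96, mean = 400, sd = 10)
)
shifted <- well_medians$column == 12
well_medians$fsc[shifted] <- well_medians$fsc[shifted] + 60

# Compare each column's medians with the rest of the plate using the
# two-sample Kolmogorov-Smirnov test from the stats package.
p_values <- sapply(1:12, function(j) {
  in_col <- well_medians$column == j
  ks.test(well_medians$fsc[in_col], well_medians$fsc[!in_col])$p.value
})

# Adjust for the 12 comparisons and flag suspicious columns.
flagged <- which(p.adjust(p_values, method = "bonferroni") < 0.05)
print(flagged)
```

Flagged columns are candidates for a closer look at sample handling, not automatic exclusions.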
QA with QUALIFIER
flowQ: QA on ungated data
QUALIFIER: ID deviant samples by monitoring the consistency of the underlying statistical properties
Can use the FlowJo gating template
Outlier detection and visualization are efficient and interactive
ncdfFlow enables analysis of very large datasets
Data Normalization
Quality Assessed -> fdaNorm, gaussNorm -> Normalized Data
Problem:
Hard to match (label) biologically relevant cell populations across samples due to technical variation in sample acquisition, instrumentation differences
Solution:
Laser switch on instrument moved a subset of populations -> labelling & static gate problem
Data Normalization
Figure: CD3 density for raw data, gaussNorm and fdaNorm (x-axis ticks 200 to 800; density axis 0.000 to 0.010)
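The idea behind per-channel normalization can be sketched in base R as a simple peak alignment. This is a deliberately simplified stand-in and is not the gaussNorm or fdaNorm algorithm; the three simulated samples, the `main_peak` helper and the choice of sample `a` as reference are assumptions for illustration.

```r
# Simplified landmark alignment: shift each sample so that its main density
# peak matches the peak of a reference sample.
set.seed(3)
samples <- list(
  a = rnorm(2000, mean = 400, sd = 30),
  b = rnorm(2000, mean = 460, sd = 30),  # same population, shifted by the instrument
  c = rnorm(2000, mean = 370, sd = 30)
)

# Location of the highest point of a kernel density estimate.
main_peak <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

peaks_before <- sapply(samples, main_peak)
reference <- peaks_before[["a"]]

# Subtract each sample's own peak location and add the reference location.
normalized <- lapply(samples, function(x) x - main_peak(x) + reference)
peaks_after <- sapply(normalized, main_peak)

print(round(rbind(before = peaks_before, after = peaks_after)))
```

After alignment a single static gate placed on the reference sample covers the same population in every sample, which is the point of the normalization step.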
Data Normalization
Before
After
Data Transformation
Problem: Solution:
Finak et al., BMC Bioinformatics, 2010
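One of the transformations named in the pipeline, arcSinh, can be sketched in base R. The cofactor of 150 and the intensity values are hypothetical choices for illustration, not values taken from the slides or from any package default.

```r
# Inverse hyperbolic sine (arcsinh) transform with a cofactor.
# The default cofactor (150) is a hypothetical choice for illustration.
arcsinh_transform <- function(x, cofactor = 150) {
  asinh(x / cofactor)
}

# Synthetic compensated intensities, including negative values that a plain
# log transform could not handle.
raw <- c(-200, -10, 0, 10, 1000, 100000)
transformed <- arcsinh_transform(raw)
print(round(transformed, 3))

# For large values the transform approaches log(2 * x / cofactor).
large <- 100000
print(c(arcsinh = arcsinh_transform(large), log_approx = log(2 * large / 150)))
```

The transform is roughly linear near zero and logarithmic for bright events, so negative and dim values stay on the scale while bright populations are compressed.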
Automated Gating
Free labour (except for computer time)
Can be as accurate as (more accurate than?!) human gating
Better chance of finding interesting populations in high-D data
Allows scientists to do valuable science
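A minimal sketch of clustering-based gating, using kmeans from base R on simulated events. The channel names, population parameters and the choice of two clusters are assumptions for illustration; the Bioconductor packages listed earlier use more elaborate methods.

```r
# Two synthetic cell populations in two (already transformed) channels.
# Channel names and population parameters are made up for illustration.
set.seed(42)
n <- 500
events <- rbind(
  cbind(CD3 = rnorm(n, mean = 1, sd = 0.3), CD19 = rnorm(n, mean = 4, sd = 0.3)),
  cbind(CD3 = rnorm(n, mean = 4, sd = 0.3), CD19 = rnorm(n, mean = 1, sd = 0.3))
)
truth <- rep(c("B", "T"), each = n)

# Cluster the events with k-means (stats::kmeans); nstart restarts the
# algorithm from several random centres and keeps the best solution.
fit <- kmeans(events, centers = 2, nstart = 10)

# Cross-tabulate cluster membership against the simulated labels.
tab <- table(cluster = fit$cluster, truth = truth)
print(tab)
```

Cluster numbers are arbitrary, so the cross-tabulation is read by rows: each cluster should contain events from a single simulated population.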
Population Labelling
Problem: Solution:
Labelling Populations
...yet - see Module 6: Additional FCM Analysis Resources
Slide courtesy Holden Maecker
HIV: 466 samples, 13 colours
Frequency of short-lived cells with high proliferation (CD127 KI-67+) has a negative correlation.
Frequency of terminal effector T-cells has a negative correlation.
Frequency of transitional memory T-cells has a negative correlation.
Manual analysis:
Computational analysis:
Figure: event-free proportion curves; groups Lowest (371/86%) vs. Highest (59/14%) and Lowest (387/90%) vs. Highest (43/10%); p < 8.6e-13, p < 1.8e-06, p < 4.6e-10
Mantei and Wood, Flow Cytometric Evaluation of CD38 Expression Assists in Distinguishing Follicular Hyperplasia from Follicular Lymphoma, Cytometry Part B 2009
Villanova et al., PLoS ONE (in press)
Lyoplate: better detection of cytokines & activation markers
Increased overall brightness
Routine standard of care mandates 1% review of cases
Use computational classification to ID best cases to review
Prediction
DLBC 4 14
DLBCL 2 16
FOLL 48 6
Remove discrepancies in assignment between two methods: N=60
                   Diagnosis: FOLL   Diagnosis: DLBC
Prediction: DLBC          0                12
Prediction: FOLL         44                 4
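The agreement implied by the N=60 counts can be worked out in a few lines of R. The row/column layout used here (rows = prediction, columns = diagnosis, in the order FOLL, DLBC) is an assumed reading of the slide, not something the source states explicitly.

```r
# Counts copied from the N=60 table; the layout (rows = prediction,
# columns = diagnosis) is an assumed reading of the slide.
confusion <- matrix(c(0, 12,
                      44, 4),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(prediction = c("DLBC", "FOLL"),
                                    diagnosis  = c("FOLL", "DLBC")))

# Agreement = cases where prediction and diagnosis carry the same label.
agree <- confusion["DLBC", "DLBC"] + confusion["FOLL", "FOLL"]
total <- sum(confusion)
cat(sprintf("%d of %d cases agree (%.1f%%)\n", agree, total, 100 * agree / total))
```

Under that reading, the disagreeing cases are the ones worth routing to manual review.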
1 2 3
Lymphoma with no appreciable normal B cell component
Composite lymphoma
2X more normal (reactive/polyclonal) B cells than malignant B cells present in the flow sample; partial involvement by lymphoma
A variant of DLBCL but no indication of a FOLL or normal B cell component (partial involvement). However, the flow report and re-review of the plots indicate that no malignant cells were present. Likely flow vs. histology sample discrepancy.
41 patients with accompanying diagnosis (PD vs Normal)
Test data set: 137 patients
flowType/RchyOptimyx used to identify best population to separate groups
Classifier performed poorly: PPV=0.52, NPV=0.36
Manual analysis also found no significant populations when corrected for multiple testing
NIH, FOCIS, FITMaN & HIPC
B, T, NK cells, monocytes, DCs & activation status
Maecker et al. (2012)
Maecker et al., BMC Immunology (2005)
Figure: manual vs. automated gating; panels (b), (d), (f), (h), (j), (l), (n) and (p) show the automated results; labelled populations include B cells, Memory B cells and Transitional B cells
Figure: Fraction of Cells, Manual vs. Automated, for B cells, CD3-, Naive, Memory IgD+, Memory IgD- and Plasma Blasts
Problem:
3 patient categories
Analysis:
samples based on population proportions from training set (based on K-nearest neighbour)
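A K-nearest-neighbour step on population proportions can be sketched in base R. Everything below is synthetic: the two proportion features, the group centres, the category labels and `k = 3` are assumptions for illustration, and the hand-written `knn_predict` helper is not the code used for the analysis on the slide.

```r
# Minimal k-nearest-neighbour classifier on population proportions.
# Three categories; all proportions are synthetic.
set.seed(7)
make_group <- function(n, centre) {
  # One column per population proportion, n samples per group.
  sapply(centre, function(mu) rnorm(n, mean = mu, sd = 0.02))
}
train_x <- rbind(make_group(10, c(0.10, 0.60)),
                 make_group(10, c(0.40, 0.30)),
                 make_group(10, c(0.70, 0.10)))
train_y <- rep(c("Normal", "Cancer", "Other"), each = 10)

knn_predict <- function(train_x, train_y, new_x, k = 3) {
  apply(new_x, 1, function(p) {
    # Euclidean distance from the new sample to every training sample.
    d <- sqrt(rowSums(sweep(train_x, 2, p)^2))
    # Majority vote among the k closest training samples.
    votes <- table(train_y[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

test_x <- rbind(c(0.11, 0.58), c(0.42, 0.31), c(0.68, 0.12))
predicted <- knn_predict(train_x, train_y, test_x)
print(predicted)
```

Each test sample takes the majority label of its nearest training samples, so the quality of the result depends directly on how well the population proportions separate the categories.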
Normal 15 0 0
Cancer 0 8 0
Other Disease 1 2 13
< predicted
as
20,000 lines (2Fx1M) generated (1/gene) over next 5 years
2 x 10-12D FCS files for each of 60,000 mice
120,000 FCS files and 25 other phenotype measurements
Acknowledgements
R/BioConductor.org flow cytometry infrastructure
Genentech: Robert Gentleman, all BioConductor contributors
FlowCAP
Coordinating Committee: Nima Aghaeepour (BCCA), Greg Finak (FHCRC), Raphael Gottardo (FHCRC), Tim Mosmann (U Rochester), Richard H. Scheuermann (UTSW)
Data providers and participants
flowcap.flowsite.org
HIV
NIH/USMIL: Mario Roederer, Pratip K. Chattopadhyay
BCCA: Nima Aghaeepour, Adrin Jalali, Kieran O'Neill, Habil Zare
GC Lymphoma
UPMC: Fiona Craig, Stephen Ten Eyke
BCCA: Nima Aghaeepour
DLBCL vs. FOLL
BCCA: Andrew Weng, Nima Aghaeepour, Faysal El Khettabi
Parkinson's Disease
UNMC: Howard E Gendelman
BCCA: Kieran O'Neill
FlowRepository
BCCA: Josef Spidlen, Karin Breuer
CytoBank: Chad Rosenberg, Nikesh Kotecha
Funding: NIH (NIBIB, NIAID), HIP-C, TFRI & TFF, CCS, MSFHR, WHCF