
Trends in Analytical Chemistry, Vol. 25, No. 11, 2006

Problems with the “omics”


Jackson O. Lay Jr., Sabine Borgmann, Rohana Liyanage, Charles L. Wilkins

“Omics” studies involve the measurement of large numbers of parameters, typically genes (genomics), proteins (proteomics), lipids (lipidomics) or metabolites (metabolomics). Values associated with each of the measured parameters are searched to find examples that correlate with biological endpoints, often disease or cancer. Although the number of parameters being measured has increased dramatically with “omics” studies, the number of biological and methodological replicates has not. As with comparable classical biomedical studies, there exist limitations arising from whether or not the analytical methods are adequate for making the measurements needed and whether or not the measurements are implemented properly. In addition, because of the large number of measurements and the limited number of test subjects, unique problems arise in “omics” studies involving statistics and bias. Even though improvements in technology may well minimize measurement problems, the inherent difficulties associated with measuring so many parameters from a limited number of test subjects will remain. This review focuses on the four main problems with the “omics”: bias, statistics, methodology, and method misuse. Although we give suggestions to minimize the impacts of these problems, some problems may not be solved unless the number of measurements is more consistent with the number of possible biological replicates.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Bias; Genomics; Lipidomics; Metabolomics; Methodology; “Omics”; Proteomics; Statistics

Jackson O. Lay Jr., Rohana Liyanage
Arkansas Statewide Mass Spectrometry Facility, University of Arkansas, Fayetteville, AR 72701, USA

Sabine Borgmann, Charles L. Wilkins*
Department of Chemistry and Biochemistry, University of Arkansas, Fayetteville, AR 72701, USA

* Corresponding author. Tel.: +1 479 575 3160; Fax: +1 479 575 4049; E-mail: cwilkins@uark.edu

1. Introduction

The term “omic” is derived from the Latin suffix “ome” meaning mass or many. Thus “omics” studies are like other studies except that they involve a mass (large number) of measurements per endpoint rather than one or a few. In the current context, the measurements are of chemical markers indicative of biological events. The original concept of using chemicals as markers to monitor biological endpoints is difficult to attribute to any one individual. Considering that Hippocrates proposed the presence of “fruity odors” in human breath as a diagnostic marker for diabetes in the fourth century B.C., it could be argued that the concept of chemical biomarkers has been around for over two millennia.

The term “biomarker” is used here for molecules that are characteristic of a certain biological state of a cell or organism. The development of modern instrumental analytical methods in the twentieth century had a profound impact upon biomarker research. Routine analytical measurement of chemical markers from biological systems became possible, and the search for chemical biomarkers of diseases began in earnest. As new chemical-measurement techniques were developed, they each were applied, in turn, to the search for new disease-related biomarkers. These were not yet “omics” studies because the number of markers in each study could not be considered “omic” or “massive”.

The primary issues with early biomarker-discovery research were measurement feasibility or difficulty. Measurements were difficult and the biological matrix was complex. Difficulties included mixture complexity, analyte polarity, sensitivity and disease specificity. Could the measurement be done? Sometimes method validation and method suitability also became issues. Measurements were feasible but sometimes difficult to implement or reproduce.

In recent years, as analytical chemists have developed methods believed suitable for the analysis of a very large number of analytes from a single sample, it has become popular to attempt the measurement of many thousands of potential biomarkers rather than only a few. This approach gives rise to the modern “omics” experiments. Considering these approaches generally decreased the number of replicates possible while massively increasing the number of parameters tested, it is not clear why this approach was so widely accepted, except perhaps simply because it became possible. For these new approaches, the statistical issues may be problematic, even if the measurement techniques themselves are perfected.

Today, investigators consider searching

0165-9936/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.trac.2006.10.007

entire classes of possible markers within an organism, with the aim of finding biomarkers or their combinations, from which subtle changes or complex interactions can be used to make difficult associations with biological events. The goal of “omic” approaches is to acquire comprehensive, integrated understanding of biology by studying all biological processes to identify the different players (e.g., genes, RNA, proteins and metabolites) rather than each of those individually. “Omic” approaches currently attempt to address specific biological questions, often without need for prior understanding of a biological basis. With the development of current and future technologies, “omic” research might aim to explain more complex systemic questions and become a tool in diagnostics and drug development. To date, we are still far from putting these demanding goals into practice due to the complexity of the task.

Fig. 1 simplifies the interaction of the main “omics” actors (genome, proteome, transcriptome and metabolome) within the organism. Fig. 2 displays and simplifies the quantitative dimensions of the different “omic” elements within the context of scale and time resolution. When the targets are proteins, genes and metabolites, the corresponding approaches are called proteomics, genomics and metabolomics (or metabonomics). These and other “omics” retain the analytical requirements and problems associated with older or classical analytical approaches and add additional challenges due to mixture complexity and the number of biomarkers measured from a single sample. The resulting use of very large data sets coupled with a search for biomarkers showing subtle or complex relationships adds an additional layer of complexity and introduces new problems. Many of these problems can be solved, but this will require considerable interaction between methods developers, users, statisticians, and bioinformatics experts. The purpose of this review is to present representative “problems with the ‘omics’” with the aim of provoking a discussion amongst these various groups and others.

The problems currently encountered with “omics” studies are reflected in the provocative title of a recent article in The Scientist, “Gene Association Studies Typically Wrong” [1]. Although the title makes specific reference to genomics, the problems encountered in these studies have parallels in all the “omics”, as the author notes. After the fact, it has been possible to demonstrate that the first published study linking genes to diseases is often not supported by subsequent studies. For example, 1992 research associating a 2–4-fold increased risk of schizophrenia in persons carrying two copies of a specific mutation [2] was not supported by some 50 follow-up studies, which failed to confirm the initial findings [3]. If this happened rarely, it would be easy to attribute it to the users, rather than to a problem

[Figure 1 diagram: each “ome” with the parameters influencing it]

genome: person's genetic predisposition, environmental factors, interactions
  (linked to the transcriptome by regulation and feedback regulation)
transcriptome: person's genetic predisposition, environmental factors, interactions
  (linked to the proteome/metabolome by regulation, e.g. activation and inactivation of proteins, expression capacity)
proteome/metabolome:
  characteristics: allosteric interaction; covalent modification; enzyme levels; linked metabolic pathways; compartmentalization; metabolic specialization of organs
  influencing parameters: status of metabolism (catabolism, anabolism; rate of compound absorption, distribution, metabolism, and excretion); age, sex, ethnicity; stage of cell cycle; interactions; temperature; redox state; culture conditions; external stimuli (nutrition, life style, stress, pharmaceuticals, radiation, ..)
resulting “phenome” (phenotype): “normal” function vs. dysfunction (organism with good health vs. disease); cumulative effects of a multiplicity of genes, various signaling and metabolic pathways

Figure 1. Complexity of the different “omic” levels is highly impacted by genetic predisposition, environmental factors and regulatory interactions within and between the “omes”.


Figure 2. Complexity of the different “omic” levels with respect to quantitative dimensions and time-scales of processes within an organism (time-scale for proteome adapted from [41]).

with the general approach. Unfortunately this is not unusual and may even be the norm. Indeed, in one evaluation of the probability of reproducing genomics studies finding associations of genes with complex diseases, a review of 370 studies suggested that, more often than not, subsequent research does not confirm the initial finding [4,5].

For the purposes of this article, the “problems with the ‘omics’” are divided into four general categories: bias, statistical, methodological, and “fitness of use”. Most of the problems with one “omics” are common to all similar “omics” applications. Examples are based on representation in the literature and their utility for illustrating a specific problem. Due to space limitations, only a few examples can be presented here, but the problems are general across all “omics”.

2. Bias

One of the emerging difficulties with the “omics”, including the flawed genomics studies referred to above, is the problem of bias. Ransohoff has defined bias as “the systematic erroneous association of some characteristic with a group in a way that distorts a comparison with another group” [6]. Bias is not limited to genomics but also occurs in proteomics and other “omics”. This problem is sufficiently severe in proteomics that one researcher noted that bias was now a threat to the validity of cancer molecular biomarker research [6]. Some prominent results in cancer research have been disputed or not reproduced, and bias has been increasingly recognized as a significant cause of this problem. Part of the overall problem has to do with the approach to discovery using the “omics” and the high stakes involved in the finding of new “omics”-related biomarkers. “Omic” measurements are usually made without a specific hypothesis in mind. In this case, it may be somewhat easier to introduce bias and then to essentially “find what you are looking for” because these experiments are so loosely defined at the start. With larger data sets, it may be inherently easier to find a correlation irrespective of real cause and effect. False positives probably outnumber true positives for reasons outlined below. The need to discover non-invasive biomarker tests for diagnosis of diseases, where no current non-invasive tests are available, places considerable pressure on investigators to publish work quickly because there is some potential to save lives or to make profits.

Consider the example of ovarian cancer. For this disease, there is currently no non-invasive screening test and the development of a new biomarker for patient screening would be extremely important and useful. Recently a “pattern-recognition” serum proteomics approach was used to search for a marker for ovarian cancer, which the authors reported had nearly 100%


success at discriminating between women with and without the disease [7,8]. On this basis, a manufacturer began the process of developing a blood test kit for patient screening. However, a number of other investigators challenged these studies, based both on the methodology used and also on whether or not the proposed biomarkers were plausible from a biological point of view. Doubts about the validity of the first outcomes were sufficient for the FDA subsequently to intervene [9]. Some have proposed that the underlying problem in these specific studies was bias [6].

Bias is difficult to address in the design of experiments, in part because it is often unintentional. Although some sources of bias are obvious, many are not. Unfortunately, bias cannot be minimized by simply increasing the sample size. Similarly, there is no relationship between reproducibility and bias, nor is there a statistical method that can solve problems caused by bias in study design.

Bias can be minimized only by careful control and planning of the experiments. This can include sample blinding and randomization. Both are powerful methods for reducing bias. Because bias can also be introduced by instrumental drift, randomization of the analyses themselves (in order and in time) is also important.

While many of these controls are familiar to those involved in method-validation studies or approvals by regulatory agencies, they have not been universally taught or applied, especially in the “omics”. An important tool for monitoring “omics” studies for bias (and other problems) is the development of standards for reporting how analyses are conducted. Microarray-experiment guidelines, such as the Minimum Information About a Microarray Experiment (MIAME), address reporting of technical details [10]. Similar criteria for reporting proteomics data are being developed [11].

3. Statistics

It has been suggested that there is a widespread misconception amongst users of “omics” that large numbers of measurements could somehow make up for small numbers of samples [12]. However, the reality is that the introduction of multiple measurements in no way minimizes the need to use statistically appropriate numbers of samples. Replicates must be adequate both with respect to “measurement process” and “biology”. Some have argued that biological replicates are even more important than methodological or technical replicates [12]. Even in well-controlled systems, such as inbred mice, there is usually significant biological variability. The minimum number of replicates therefore has to be selected to satisfy statistical criteria driven by:
• the reliability of the method;
• the background variability within an organism and between organisms;
• the number of experimental states being investigated;
• the number of parameters being measured; and,
• even the likelihood that the hypothesis is a reasonable one.

This last factor, sometimes called prior probability, is often not considered, but it is extremely important. Because of the large number of parameters being measured, and the typically limited number of biological replicates, the probability (P) of false-positive associations is very high. A major reason for this is the propensity for declaring statistical significance based on the P value alone, particularly any P value below 0.05 [13]. However, the false-positive-report probability (FPRP) depends not only on the P value but also on the prior probability that the association between the “omic” variant and the disease is real and on the statistical power of the test. One approach to understanding the role of false-positive findings is to estimate the FPRP so as to use it as a measure of significance. When the FPRP value is high, positive associations should be given little weight.

Even though both false-negative and false-positive findings detract from the quality of studies, much of the reproducibility problem mentioned above can be attributed to false-positive findings. One estimate of the probability of false-positive findings in association between genetic variants and disease places the probability as high as 0.95 [14]. In this case, the false positives so outnumber the true positives that the data are probably not even useful for screening.

In the absence of bias, three factors can contribute to a false-positive yet statistically significant finding [15]. One is the absolute magnitude of the P value. Another is the statistical power of the method and the third is something called “the fraction of tested hypotheses that is true”. Each of these contributes to increasing the FPRP, and thus the likelihood that an apparently valid prediction is false. It is not uncommon in “omics” studies to set a value of α (type-1 error) at 0.05 and call any association between the “omics”-detected biomarker and disease with a P value below α statistically significant. Fig. 3 shows the association between prior probability and false-positive probability for P just below α = 0.05 for four different statistical powers. The false-positive probability can be minimized by reducing P substantially when the prior probability is high. Increasing the numbers of test and control samples can also reduce the false-positive rate, but again only when there are high prior probabilities. Increasing the sample size provides only marginal benefit when the prior probability is low. Clearly, if all of these factors are not taken into account, the probability of a false positive can be very high even with significant statistical power, as Fig. 3 shows. The


Figure 3. Effect of changes in prior probability and statistical power on the false-positive-report probability (FPRP) when α = 0.05 and P is at or just below α. (Reprinted from [13]).

current common practice, corresponding to use of a universal statistical criterion for statistical significance, based on rejecting the null hypothesis at a value of α of 0.05, will not work across the range of prior probabilities for true association, even using the maximum statistical power of 1. Ultimately, for the testing of highly unlikely hypotheses, even an infinitely large sample size cannot by itself substantially reduce the FPRP [13].

The proposed FPRP approach requires a realistic estimate of the prior probability, specification of necessary odds ratios for specific applications, definition (in advance) of criteria for an acceptable FPRP value, and then, after a study, determination of whether the findings are “noteworthy” based on pre-established and presumably defensible criteria. Often there are difficulties in establishing (or estimating) these values. A complete mathematical treatment of FPRP is beyond the scope of this work, but can be found in an article by Wacholder et al. [13].

Other possible methods for reducing the number of false positives range from Bonferroni corrections [16] and the use of false-discovery-rate methods [17,18] to a large reduction of α from 0.05 to 0.005 or 0.0005 [14].

Because there is no single generally accepted approach to the statistical design of “omics” experiments, there is clearly a problem with substantial numbers of irreproducible findings, false positives, and otherwise statistically unsound data. The best approach would seem to be inclusion of a biostatistician in the design of “omics” studies from their outset. It is essential to report sufficient experimental details, so that the statistical significance, including the likelihood of false positives, can be judged by others after experiments are completed.

4. Methodology

To investigate the different processes within an organism, a tissue, a cell, or a cell compartment, there exist arsenals of technologies. Table 1 provides a selection and a brief evaluation of several current techniques. For further reading, we recommend several reviews [19–25] that emphasize and partly explain why “omics” do not meet current needs and expectations. In another, “omics” is addressed in the context of understanding disease and drug development [26]. The status of current technologies used in genomics and proteomics, and possible future approaches, is comprehensively reviewed by Baak et al. [27].

Here, we focus on the challenges and the pitfalls facing researchers in this field. A typical workflow of an “omic” approach consists of several steps:
(1) definition of starting conditions as well as analytical/biological questions and study design;
(2) sample preparation;
(3) separation of analytes (if necessary);
(4) analysis and quantification of analytes;
(5) validation;
(6) documentation; and,
(7) bioinformatics (chemometrics).

Table 2 summarizes some general pitfalls, issues to consider and suggestions related to designing “omic” studies (and experiments, in general).
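As a numerical aside on the statistical argument of Section 3, the dependence of the FPRP on α, statistical power and prior probability can be sketched in a few lines of Python. The sketch assumes the commonly quoted form FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where π is the prior probability and 1 − β the power; the function name `fprp` and the grid of example values are illustrative choices, not taken from this review, and the full treatment remains that of Wacholder et al. [13].

```python
def fprp(alpha: float, power: float, prior: float) -> float:
    """False-positive-report probability for a finding declared significant.

    Assumed form (not derived in this review):
        FPRP = alpha*(1 - prior) / (alpha*(1 - prior) + power*prior)
    """
    false_pos = alpha * (1.0 - prior)  # true null hypotheses that pass the threshold
    true_pos = power * prior           # real associations that are detected
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Illustrative grid: even at maximum power, a low prior keeps the FPRP high
    # unless alpha is reduced substantially.
    for prior in (0.25, 0.01, 0.001):
        for alpha in (0.05, 0.005, 0.0005):
            value = fprp(alpha, power=1.0, prior=prior)
            print(f"prior={prior:<6} alpha={alpha:<7} FPRP={value:.3f}")
```

Under these assumptions, a prior probability of 0.001 with α = 0.05 gives an FPRP of about 0.98 even at a power of 1, which is consistent with the statement that a universal α of 0.05 cannot work across the range of prior probabilities.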


Table 1. Selected current methodologies employed in genomics, transcriptomics, proteomics, and metabolomics (partially adapted/modified from [24,27,42])

Methodologies employed in | Biomarker target | Merits | Limitations

Genomics
Karyotyping | Structural changes in chromosomes | Analysis of whole chromosome | Labor intensive
Fluorescence in situ hybridization (FISH) | Homologous sequences in chromosomes | Gene specific | Limited number of probes per labeling reaction
Comparative genome hybridization (CGH) | Deletions or amplifications | Analysis of whole genome; array CGH available | Resolution limited to 10 Mbp (for array CGH 0.5 Mbp)
Spectral karyotyping (SKY) | Structural changes in chromosomes | Analysis of whole genome | Labor-intensive fluorescence labeling
Microsatellite instability (MSI) | Stability of chromosomes, efficiency of DNA repair | Allele specific | Large numbers of microsatellites required
Single nucleotide polymorphism (SNP) | Genomic DNA, polymorphism | Archived patient genomic DNA can be used | High-density SNP mapping required; SNP maps not yet completed (for current results, refer to http://snp.cshl.org/) [43]

Transcriptomics
sRNA interference (RNAi) | Small interfering RNA interaction with targeted mRNA molecules (RNA silencing) | Gene specific; analysis of whole genome | A lot of false-negative and false-positive results
DNA microarray technology [44,45] (“gene-expression profiling”) | mRNAs | Established technique; many thousands of gene-specific mRNAs; ‘signature’ analysis; high-throughput | Fresh tissue obligatory for cDNA isolation, closed transcription profiling (pre-requisite: knowledge of the sequence of the genome studied); reproducibility [46]
Serial analysis of gene expression (SAGE) | RNAs | Open transcription profiling; absolute quantitation of gene expression; large SAGE database available | Expensive, labor intensive, qualitative
Massively parallel signature sequencing (MPSS) | Nucleic acid fragments | Open transcription profiling; convenient handling of complex mixture of nucleic acid fragments, throughput | Expensive, labor intensive
Differential display | Differentially expressed genes | Open transcription profiling | Expensive, labor intensive; false positives and redundancies observed
Tissue microarray | DNA, RNA, in situ hybridization | Morphologic context; high throughput; compatible with archived samples | Formalin-fixed samples

Proteomics [21,24]
Two-dimensional gels | Complex protein mixtures | Established method; dynamic range of 10^2 to 10^4 for protein expression; sensitivity (10^-6); post-translational modifications (PTMs) evident; data incorporated in current literature and databases | Systematic limitation to access very large, very small, low-abundant [31], and membrane-bound proteins; only 2000 spots per gel; requires experienced personnel; time-consuming; limited automation potential [31]
Yeast two-hybrid | Protein-ligand interaction | Established basic technique; automated assay design available; fast; protein function determined in vivo | Only binding properties evaluated (no biochemical function); false-positive or false-negative results observed (additional confirmation necessary); no structural information; labor intensive
Protein microarrays [47,48] | Complex protein mixtures | Fast, simple and reproducible assay designs; high throughput | New technology; limited dynamic range (≤ 2D gels); no standardized formats for assay validation (e.g., cross-reactivity) and data-processing; limited resources of antibodies and recombinant proteins
Fluorescence microscopy (FRET, FLIP, FRAP, FLIM, BRET, PRIM) | Protein-ligand interaction | Spatial and time resolution; in vivo applications | Labor intensive
Mass spectrometry (Peptide Mass Fingerprinting) [49] | Enzymatic protein fragments from a single protein | Established method; high specificity and sensitivity (10^-15) | Limited dynamic range, highest specificity requires tandem MS (MS2)
Chromatography coupled with MS (e.g., LC-MS [51]) | Enzymatic protein fragments, multiple proteins | High sensitivity, moderate to high specificity | Complex separations, complex data, false identifications possible
Stable isotope methods coupled with MS [52] | Pooled enzymatic fragments from differentially labeled targets | Accurate quantification of differential expression | Dynamic range; tagging required
Surface-enhanced laser desorption/ionization (SELDI) | Affinity-selected proteins from complex mixtures | Fast; small samples required; easy sample preparation; already used for clinical applications | False positives; poor reproducibility
Tissue microarray | Protein (e.g., immunohistochemistry) | Morphologic context; high throughput; compatible with archived samples | Formalin-fixed samples
Nuclear magnetic resonance (NMR) | Proteins | Detailed 3D structural information | Poor sensitivity; low throughput
Microfluidics [53] and lab-on-a-chip [54] | Proteins | Small, portable; simple operation; high throughput; cost-effective | Experimental techniques

Metabolomics [55–58]
Cell-based assays of secreted metabolites | Secreted metabolites | Many established methods for analysis of some metabolites | Standardized assay and data-processing format is missing; limited validity of in vitro functional assays for in vivo predictions
Metabolites from body fluids (“microdialysis” [59–61]) | Metabolites in the extracellular space in living tissue | Near real-time metabolic profiling; clinical microdialysis is already available | Invasive technique; dependent on osmosis
Liquid chromatography (LC)-NMR [61] | Metabolites | High throughput, complete sample recovery, little sample preparation | Expensive; much lower sensitivity and throughput than MS techniques
Chromatographic (LC, GC, CE [63,64]) approaches with MS | Metabolites | Higher sensitivity and throughput than NMR | (none given)
FTMS [65,66] | Metabolites | High mass accuracy and resolution; broad range of analytes | Expensive equipment; limited throughput; possible analyte interactions without chromatographic introduction
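The blinding and randomization recommended in Section 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not a validated laboratory procedure: the helper name `blinded_run_order`, the code format `S001…` and the example sample names are all hypothetical. Samples receive random blind codes, and the order of analysis is shuffled separately so that group membership is not correlated with instrumental drift.

```python
import random


def blinded_run_order(sample_ids, seed=None):
    """Return (run_order, key) for a blinded, randomized analysis sequence.

    Hypothetical helper for illustration only.
    run_order: blind codes in the order the samples should be analyzed.
    key: mapping from blind code to original sample ID, to be held by
         someone other than the analyst until the analysis is complete.
    """
    rng = random.Random(seed)
    codes = [f"S{i:03d}" for i in range(1, len(sample_ids) + 1)]

    shuffled_samples = list(sample_ids)
    rng.shuffle(shuffled_samples)      # random assignment of codes to samples
    key = dict(zip(codes, shuffled_samples))

    run_order = list(codes)
    rng.shuffle(run_order)             # random order (and hence time) of analysis
    return run_order, key


if __name__ == "__main__":
    samples = [f"case_{i}" for i in range(1, 6)] + [f"control_{i}" for i in range(1, 6)]
    order, key = blinded_run_order(samples, seed=2006)
    print("Run order seen by the analyst:", order)
```

Fixing the seed makes the allocation reproducible for documentation; whether a fixed or a fresh seed is appropriate depends on the reporting requirements of the study.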

The first step in the process includes proper design of the study. From seven large studies aimed at predicting prognosis of cancer patients by employing DNA-microarray techniques, Michiels and co-workers [28] found that the results obtained in these studies depended on how the patients were selected in the training set. Because of problems with this aspect of study design, the outcomes of these studies were biased.

We covered difficulties associated with statistical problems in study design and the effects of bias in previous sections. After study design is complete, the experimental processes can begin. Consider a proteomics study as an example. Generating a precise, reproducible representation of the proteome intrinsically demands very accurate control of many parameters. These include variables such as cell-culture conditions, sample preparation, sample handling, and even protein separation. Unfortunately, many parameters may also affect either protein expression or protein measurements. Effects on the measured proteome are thus not limited to those caused by the biomarkers in question. Many of these proteomics-influencing parameters are difficult to recognize or control.

Assuming that the experimental parameters governing proteome expression are well controlled, there is still a need to measure it. Technically, it is challenging to maintain the quantitative integrity of the measurement of a complex sample during sample preparation,


Table 2. Examples of pitfalls in “omic” approaches

Workflow step | Common pitfalls | Issues to consider | Suggestions

Definition of starting conditions and analytical question | Number of samples is too low for number of features analyzed (‘overtraining of the initial learning set’) | Start with reasonable goals for a study; select approaches that are able to meet those objectives (e.g., select a reasonable number of samples to be investigated) | Develop sound experimental design [25]; analyze homogeneous groups of samples; perform large-scale analysis on a large number of samples; use a hypothesis-driven rather than a discovery-oriented approach
Sample preparation | Changing the ratio(s) of analyte(s) during sample preparation | RNA and proteins can easily degrade over time if the samples are not handled and stored properly | Perform adequate sampling (e.g., homogeneous groups of samples); optimize and standardize sample handling and storage; select sample types that are stable, reliable, ethically inoffensive, and repeatedly accessible
Separation of analytes (if necessary) | Often incapable of resolving the whole “ome” | Choose techniques with appropriate resolving power and dynamic range for analytes of interest | Use the ‘best’ techniques currently available for specific question of study
Data analysis, quantification and interpretation | Automated data analysis might give false-positive or false-negative results; data are not quantitative | Avoid being “over-optimistic” about positive results | Check raw data of positive “hits” for plausibility; validate computer algorithms for automated data handling
Validation, documentation | Done insufficiently | Key step in proving the utility of the study | Validate and document study in accordance with good laboratory practice (GLP) guidelines [21,67,68]; verify results by an independent method and/or independent sample set (repeated random sampling); examine inter-laboratory variation to ensure that novel techniques have adequate accuracy and precision; establish recommendations for novel techniques when they are ready to be used by a broader scientific community
Bioinformatics, chemometrics [12] | Done insufficiently | “Omic” data need to be produced in a format that will meet the requirements of large-scale modeling tools (e.g., definition, quantity, function, localization, dynamics, and interaction of the component of interest) [41] | Employ chemometric methods to facilitate the outcome of “omic” studies; extract relevant information by optimizing experiments, processing data, and calibrating, organizing, and performing quality controls of the “omic” studies (see also [12])

separation, and quantification. Indeed, for reasons of magnitude, the measurement is difficult and has little precedent in other traditional instrumental analytical fields. First, most instrumental analytical measurements do not involve large numbers of analytes. Typical instrumental methodologies are designed to measure relatively small numbers of analytes quantitatively, often under 50. Second, usual analytical approaches require either identity or quantity be determined, but not both. Analysis of pollutants, toxicants, or typical analytical test subjects normally requires that the analyte has characteristics defined by specificity criteria that assure, with some defined level of certainty, that signal arises from a pre-determined target. In other words, the method employs selection criteria to allow quantification of predetermined targets. However, with proteomics, the target identities and their quantities are determined simultaneously and they may not correspond to previously expected compounds for which standards and controls have been developed. Moreover, with proteomics, all of this is done at a level of complexity not attempted before. Given these considerations, it is now difficult to defend the initial optimism regarding the ability to quantify and to identify proteins from the proteome simultaneously with any degree of accuracy. This may require

http://www.elsevier.com/locate/trac 1053
Trends Trends in Analytical Chemistry, Vol. 25, No. 11, 2006

repeated analysis of multiple samples derived from the some of these differences cause the cancer. Other differ-
control and the experimental sources. ences are incidental.
In addition to the problems associated with the size Such baseline, inter-individual, and intra-individual
and the scope involved in ‘‘omics’’ measurements, variability not only confounds studies of group re-
there are intrinsic challenges associated with each new sponse, but ultimately may reveal clues about individ-
sample. These challenges are related to the great ual responses if the measurements can eventually be
complexity of the sample and the complex or incom- made. Revealing more insights into intra-species and
pletely known biochemistry of the disease state (see inter-species variations is not only interesting from the
also Figs. 1 and 2). Focusing on the proteome, a scientific point of view, but might also promote the
sound selection or ‘‘snapshots’’ (several proteomes) of movement towards ‘‘personalized’’ biomarker identifi-
a given sample needs to be collected for the results to cation and drug treatment (briefly, tailor-made medi-
be biologically and/or clinically relevant. Proteins are cine). A recent review about pharmacogenomics related
quite diverse in their structures, turnover rates, com- to acute lymphoblastic leukemia revealed insights into
partmentalization, dynamic range of abundance (in its usefulness for cancer therapy [35]. Clayton and co-
humans cells, up to 107–108, and, in plasma, up to workers introduced a rat model with an interesting
1012) [29,30], and function. More than 200 covalent pharmaco-metabolomic approach for phenotyping and
post-translational modifications of proteins have been ‘‘personalized’’ drug treatment [36]. They propose to
found and identified. There is little doubt that others monitor the metabolites of pre-dose biofluids of animals
remain undetected. Taking the dynamic range of sev- in order to take metabolic and other individual pre-
eral proteomics methods into account (102–104), it is dispositions into account, and to use the results to
clearly not even possible to sample proteins of widely predict inter-subject variations in effects of drug
varying copy numbers using a single proteomics metabolism.
measurement or approach. The fact is that current
technologies are simply inadequate for covering a
whole proteome, even if statistics and sampling were 5. Fitness of use
not an issue. Although progress is being made on low-
abundance-protein detection (e.g., in new 2D protein For disease states showing large changes in significant
maps, as pointed out by Ahmed and Rice in a recent numbers of abundant biomarkers, successful ‘‘omics’’
review [31]), we are nevertheless still far from even studies should still be possible, even with current tech-
the possibility of producing a complete proteome (or nological limitations. Unfortunately, problems with ‘‘fit-
metabolome maps) for any organism for reasons of ness of use’’ are often encountered in the ‘‘omics’’ and
both complexity and dynamic range. It might well be have limited the successes (e.g., in 2000, Rhodes and co-
that low-abundant proteins are those of greatest workers reported that immunohistochemical studies of
importance in regulatory-based diseases, but they are, tumors in different laboratories were not useful due to
of course, also the most difficult to analyze. poorly defined analytical techniques [37]). Reports are
Another aspect worth considering is the variation of accumulating that the reproducibility and significance of
biomarkers in different compartments (e.g., individual reported proteomic biomarkers are questionable [6,38–
cells, sub-populations of cells, tissues, organs, and 40]. Standards are now being established requiring
organisms). There is a lack of understanding of the reporting of all aspects of proteomic and genomic studies
baseline or ‘‘normal’’ value for biological variation and (beyond what would be required in other fields) because
for variation within different compartments, so baseline of prior fitness-of-use and validation problems. The fail-
criteria are missing in many modern experimental ure of many studies probably correlates with failure to
designs. The importance of baseline variability can be consider the analytical need to define quality standards,
illustrated using a study by Colman-Lerner et al. using including method validation and standardization. Vali-
pathway-induced yeasts [32]. Their work concentrated dation of the data obtained by ‘‘omic’’ studies is a highly
on cell-to-cell variability. About half the variation they critical feature and requires much more effort than
observed was related to differences measured in associ- proof-of-principle experiments. Reproducibility needs to
ation with the cell-cycle stage of the individual cells [32]. be estimated by using adequately independent samples of
Such large differences confound the detection of other appropriate size. Hypotheses derived by ‘‘omic’’
changes, especially small ones. Small individual ‘‘omics’’ approaches need to be verified by additional, functional
differences have importance in disease states and need to studies in order to differentiate false positives from bio-
be measured, yet they exist superimposed on a variation markers that are clinically relevant. However, if the ratio
not associated with the target disease. Church stated in of false to true positives is very large, it is unlikely that
an editorial about the Personal Genome Project [33] that even this approach will work. The ‘‘omics’’ should not be
genomic variation between ‘‘normal’’ and cancer cells used as a discovery tool producing false and true posi-
might consist of about 10 6 per base pair [34] but only tives for subsequent screening unless estimates of the

1054 http://www.elsevier.com/locate/trac
Trends in Analytical Chemistry, Vol. 25, No. 11, 2006 Trends

FPRP suggest the true positives can be detected amongst If the number of biological and methodological repli-
the false ones. cates appears appropriate, the study is more fully
It is important to demonstrate intra-laboratory and designed with respect to sample preparation, analyte
inter-laboratory reproducibility and standardization of separation, quantitation, and validation approaches.
new ‘‘omic’’ tools as well as to maintain high quality To avoid bias, all criteria need to be developed before
standards using blind control and test samples. Valida- experiments begin. Method validation on a test set or
tion includes the selection of ‘‘hit’’ criteria. Which degree standards is critical. If the analytical methodology can-
of differences in a comparative analysis reflects a bio- not be validated, then the study needs to be redesigned.
logically relevant meaning? How likely is it that a dif- Once the initial analytical and statistical preconditions
ferentially expressed protein is correctly identified? Could have been met, large-scale measurements begin.
it be that it is a modified version of the identified protein? After completion, the data needs to be submitted to
statistical and chemometric treatment. This evaluation
may uncover flaws or limitations that require rethinking
6. Conclusion any presumptions about the original biological question
or methodology.
Putting theory into practice using ‘‘omic’’ approaches is If the statistical and chemometric treatments suggest a
difficult. It is not feasible to measure an entire proteome, meaningful correlation between the ‘‘omics’’ and the
genome or metabolome. However, that is not the root of biological endpoint, then a biological validation step
most problems with the ‘‘omics’’. Indeed, the very notion should be sought to minimize the chances of eliminating
that measuring every possible parameter is desirable is false positives.
probably one of the biggest misconceptions about the During each step, the entire process should be doc-
‘‘omics’’. ‘‘Omic’’-wide measurements may violate sta- umented. Studies that do not reach a biologically
tistical norms and have little precedent with respect to validated final conclusion may merit publication if
feasibility in analytical chemistry literature. sufficient details can be documented to aid in the
Fig. 4 suggests one approach. An initial sampling design of new studies. Studies that appear validated
strategy is proposed and reviewed for statistical sound- also need complete documentation, especially for eval-
ness, based on what is known or can be assumed about uation if instances arise where different studies need to
the problem. One obvious option for improving the be closely compared.
‘‘omics’’ is to decrease the number of parameters that are Integrating validation of analytical methods and
being measured in each sample and to include more statistical criteria into all phases of the ‘‘omics’’ will
replicates. This is certainly compatible with the current not be easy or inexpensive, but ultimately it is likely to
limitations of measurement technology. produce more reliable data. It appears that the scien-
tific community is willing to put significant resources
into generating large data sets by ‘‘omics’’ approaches,
despite the aforementioned difficulties. It remains to be
ideal workflow of “omic” approaches
seen whether it will be possible to learn from prior
analytical/biological questions ‘‘problems with the ‘‘omics’’’’ and ‘‘do it right’’ so as to
proposed sampling & starting conditions
leave a legacy of reliable and useable data for future
researchers or whether production of difficult-to-
biostatistics
- interpret data with significant uncertainty and false
+
positives will continue.
development of study design

Acknowledgement
documentation

sample preparation

separation of analytes (if necessary)

Part of this work was financially supported by the


analysis and quantification of analytes
Arkansas Biosciences Institute grant ‘‘Protein Charac-
method validation
- terization by Mass Spectrometry’’. The authors thank
+
Douglas D. Rhoads and Ina Radtke for helpful discus-
biostatistics & chemometrics sions.
+ -

conclusion biological validation


+ - References

[1] J. Lucentini, The Scientist 18 (2004) 20.


Figure 4. Schematic of an idealized work flow for an ‘‘omics’’
[2] M.A. Crocq, R. Mant, P. Asherson, J. Williams, Y. Hode,
experiment.
A. Mayerova, D. Collier, L. Lannfelt, P. Sokoloff, J.C. Schwartz,

http://www.elsevier.com/locate/trac 1055
Trends Trends in Analytical Chemistry, Vol. 25, No. 11, 2006

M. Gill, J.P. Macher, P. McGuffin, M.J. Owen, J. Med. Genet. 29 (1992) 858.
[3] E.G. Jonsson, R. Kaiser, J. Brockmoller, V.L. Nimgaonkar, M.A. Crocq, Psychiatr. Genet. 14 (2004) 9.
[4] K.E. Lohmueller, C.L. Pearce, M. Pike, E.S. Lander, J.N. Hirschhorn, Nat. Genet. 33 (2003) 177.
[5] J.P.A. Ioannidis, E.E. Ntzani, T.A. Trikalinos, D.G. Contopoulos-Ioannidis, Nat. Genet. 29 (2001) 306.
[6] D.F. Ransohoff, Nat. Rev. Cancer 5 (2005) 142.
[7] E. Petricoin III, A. Ardekani, B. Hitt, P. Levine, V. Fusaro, S. Steinberg, G. Mills, C. Simone, D. Fishman, E. Kohn, Lancet 359 (2002) 572.
[8] W. Zhu, X. Wang, Y. Ma, M. Rao, J. Glimm, J.S. Kovach, Proc. Natl. Acad. Sci. USA 100 (2003) 14666.
[9] L.A. Wagner, J. Natl. Canc. Inst. 96 (2004) 500.
[10] A. Brazma, P. Hingamp, J. Quackenbush, G. Sherlock, P. Spellman, C. Stoeckert, J. Aach, W. Ansorge, C.A. Ball, H.C. Causton, T. Gaasterland, P. Glenisson, F.C.P. Holstege, I.F. Kim, V. Markowitz, J.C. Matese, H. Parkinson, A. Robinson, U. Sarkans, S. Schulze-Kremer, J. Stewart, R. Taylor, J. Vilo, M. Vingron, Nat. Genet. 29 (2001) 365.
[11] S. Carr, R. Aebersold, M. Baldwin, A. Burlingame, K. Clauser, A. Nesvizhskii, Mol. Cell. Proteom. 3 (2004) 351.
[12] D.M. Rocke, Semin. Cell Dev. Biol. 15 (2004) 703.
[13] S. Wacholder, S. Chanock, G.-C. Maontserrat, L. El Ghormli, N. Rothman, J. Natl. Canc. Inst. 96 (2004) 434.
[14] H.M. Colhoun, P.M. McKeigue, G. Davey Smith, Lancet 361 (2003) 865.
[15] J.A. Sterne, G. Davey Smith, Br. Med. J. 322 (2001) 226.
[16] N. Risch, K. Merikangas, Science (Washington, DC) 273 (1996) 1516.
[17] Y. Benjamini, Y. Hochberg, J. R. Stat. Soc., Ser. B 57 (1995) 289.
[18] C. Sabatti, S. Service, N. Freimer, Genetics 164 (2003) 829.
[19] C. Gerner, Comb. Chem. High Throughput Screen. 7 (2004) 1.
[20] J.L. Harry, M.R. Wilkins, B.R. Herbert, N.H. Packer, A.A. Gooley, K.L. Williams, Electrophoresis 21 (2000) 1071.
[21] F. Vitzthum, F. Behrens, N.L. Anderson, J.H. Shaw, J. Proteome Res. 4 (2005) 1086.
[22] G.L.G. Miklos, R. Maleszka, Proteomics 1 (2001) 30.
[23] M.F. Lopez, Electrophoresis 21 (2000) 1082.
[24] P.R. Graves, T.A.J. Haystead, Microbiol. Mol. Biol. Rev. 66 (2002) 39.
[25] W.S. Hancock, S.L. Wu, P. Shieh, Proteomics 2 (2002) 352.
[26] J.A. Bilello, Curr. Mol. Med. 5 (2005) 39.
[27] J.P.A. Baak, E.A.M. Janssen, K. Soreide, R. Heikkilae, Ann. Oncol. 16 (2005) 30.
[28] S. Michiels, S. Koscielny, C. Hill, Lancet 365 (2005) 488.
[29] N.L. Anderson, N.G. Anderson, Electrophoresis 19 (1998) 1853.
[30] G.L. Corthals, V.C. Wasinger, D.F. Hochstrasser, J.C. Sanchez, Electrophoresis 21 (2000) 1104.
[31] N. Ahmed, G.E. Rice, J. Chromatogr., B 815 (2005) 39.
[32] A. Colman-Lerner, A. Gordon, E. Serra, T. Chin, O. Resnekov, D. Endy, C.G. Pesce, R. Brent, Nature (London) 437 (2005) 699.
[33] G.M. Church, Mol. Syst. Biol. (2005) doi:10.1038/msb4100040.
[34] Z.H. Wang, D. Shen, D.W. Parsons, A. Bardelli, J. Sager, S. Szabo, J. Ptak, N. Silliman, B.A. Peters, M.S. van der Heijden, G. Parmigiani, H. Yan, T.L. Wang, G. Riggins, S.M. Powell, J.K.V. Willson, S. Markowitz, K.W. Kinzler, B. Vogelstein, V.E. Velculescu, Science (Washington, DC) 304 (2004) 1164.
[35] M.H. Cheok, W.E. Evans, Nat. Rev. Cancer 6 (2006) 117.
[36] T.A. Clayton, J.C. Lindon, O. Cloarec, H. Antti, C. Charuel, G. Hanton, J.P. Provost, J.L. Le Net, D. Baker, R.J. Walley, J.R. Everett, J.K. Nicholson, Nature (London) 440 (2006) 1073.
[37] A. Rhodes, B. Jasani, A.J. Balaton, K.D. Miller, J. Clin. Pathol. 53 (2000) 292.
[38] K.R. Coombes, J.R.S. Morris, J.H. Hu, S.R. Edmonson, K.A. Baggerly, Nat. Biotechnol. 23 (2005) 291.
[39] E.P. Diamandis, Expert Rev. Mol. Diagn. 4 (2004) 575.
[40] K.A. Baggerly, J.S. Morris, S.R. Edmonson, K.R. Coombes, J. Natl. Cancer Inst. 97 (2005) 307.
[41] S. Souchelnytskyi, Proteomics 5 (2005) 4123.
[42] A.Q. Dove, Nat. Biotechnol. 17 (1999) 233.
[43] R. Sachidanandam, D. Weissman, S.C. Schmidt, J.M. Kakol, L.D. Stein, G. Marth, S. Sherry, J.C. Mullikin, B.J. Mortimore, D.L. Willey, S.E. Hunt, C.G. Cole, P.C. Coggill, C.M. Rice, Z.M. Ning, J. Rogers, D.R. Bentley, P.Y. Kwok, E.R. Mardis, R.T. Yeh, B. Schultz, L. Cook, R. Davenport, M. Dante, L. Fulton, L. Hillier, R.H. Waterston, J.D. McPherson, B. Gilman, S. Schaffner, W.J. Van Etten, D. Reich, J. Higgins, M.J. Daly, B. Blumenstiel, J. Baldwin, N.S. Stange-Thomann, M.C. Zody, L. Linton, E.S. Lander, D. Altshuler, Nature (London) 409 (2001) 928.
[44] M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Science (Washington, DC) 270 (1995) 467.
[45] M. Schena, R.A. Heller, T.P. Theriault, K. Konrad, E. Lachenmeier, R.W. Davis, Trends Biotechnol. 16 (1998) 301.
[46] E. Marshal, Science (Washington, DC) 306 (2004) 630.
[47] Y. Hu, M. Uttamchandani, S.Q. Yao, Comb. Chem. High Throughput Screen. 9 (2006) 203.
[48] D. Stoll, M.F. Templin, J. Bachmann, T.O. Joos, Curr. Opin. Drug Discov. Dev. 8 (2005) 239.
[49] J. Reinders, U. Lewandrowski, J. Moebius, Y. Wagner, A. Sickmann, Proteomics 4 (2004) 3686.
[51] Y. Shi, R. Xiang, C. Horvath, J.A. Wilkins, J. Chromatogr., A 1053 (2004) 27.
[52] L.V. Schneider, M.R. Hall, Drug Discov. Today 10 (2005) 353.
[53] W.C. Sung, H. Makamba, S.H. Chen, Electrophoresis 26 (2005) 1783.
[54] R.B.M. Schasfoort, Expert Rev. Proteomics 1 (2004) 123.
[55] O. Fiehn, Compar. Funct. Genom. 2 (2001) 155.
[56] W.B. Dunn, D.I. Ellis, Trends Anal. Chem. 24 (2005) 285.
[57] J. van der Greef, A.K. Smilde, J. Chemometr. 19 (2005) 376.
[58] J.L. Griffin, Philos. Trans. R. Soc., Ser. B 361 (2006) 147.
[59] N. Plock, C. Kloft, Eur. J. Pharm. Sci. 25 (2005) 1.
[60] M.J. Cano-Cebrian, T. Zornoza, A. Polache, L. Granero, Curr. Drug Metab. 6 (2005) 83.
[61] F. Magkos, L.S. Sidossis, Curr. Opin. Clin. Nutr. Metab. Care 8 (2005) 501.
[63] W. Kolch, C. Neususs, M. Peizing, H. Mischak, Mass Spectrom. Rev. 24 (2005) 959.
[64] D.C. Simpson, R.D. Smith, Electrophoresis 26 (2005) 1291.
[65] J.J. Jones, S. Mariccor, A.B. Batoy, C.L. Wilkins, Comput. Biol. Chem. 29 (2005) 294.
[66] J.J. Jones, S. Borgmann, C.L. Wilkins, R.M. O'Brien, Anal. Chem. 78 (2006) 3062.
[67] J.P.A. Baak, J. Pathol. 198 (2002) 277.
[68] P.A. Hall, J.J. Going, Histopathology 35 (1999) 489.
