Andrew E. Bruno
Center for Computational Research, University at Buffalo, Buffalo, New York, United States of America.
Patrick Charbonneau
Department of Chemistry, Duke University, Durham, North Carolina, USA and
Department of Physics, Duke University, Durham, North Carolina, USA
Janet Newman
Collaborative Crystallisation Centre CSIRO, Parkville, Victoria, Australia.
Edward H. Snell
Hauptman-Woodward Medical Research Institute and SUNY Buffalo,
Department of Materials, Design, and Innovation,
Buffalo, New York 14203, United States of America.
Christopher J. Watkins
IM&T Scientific Computing, CSIRO, Clayton South, Victoria, Australia
Shawn Williams
Platform Technology and Sciences, GlaxoSmithKline Inc.,
Collegeville, Pennsylvania, United States of America.
Julie Wilson
Department of Mathematics, University of York, York, United Kingdom.
arXiv:1803.10342v2 [q-bio.BM] 26 May 2018
(Dated: May 29, 2018)
The Machine Recognition of Crystallization Outcomes (MARCO) initiative has assembled roughly
half a million annotated images of macromolecular crystallization experiments from various sources
and setups. Here, state-of-the-art machine learning algorithms are trained and tested on different
parts of this data set. We find that more than 94% of the test images can be correctly labeled,
irrespective of their experimental origin. Because crystal recognition is key to high-density screening
and the systematic analysis of crystallization experiments, this approach opens the door to both
industrial and fundamental research applications.
Author summary: Protein crystal growth experiments are routinely imaged, but the mass of
accumulated data is difficult to manage and analyze. Using state-of-the-art machine learning
algorithms on a large and diverse set of reference images, we manage to recapitulate the labels of
a remarkably large fraction of the set. This automation should enable a number of industrial and
fundamental applications.
commercial crystallization screens have been developed; each screen generally contains 96 formulations designed to promote crystal growth. Whether these screens are equally effective or not [5, 6] remains debated, but their overall yield is in any case paltry. Typically, fewer than 5% of crystallization attempts produce useful results (with a success rate as low as 0.2% in some contexts [7]).

The practical solution to this hurdle has been to increase the convenience and number of crystallization trials. To offset the expense of reagents and scientist time, labs routinely employ industrial robotic liquid handlers, nanoliter-size drops, and record trial outcomes using automated imaging systems [5, 8–11]. Hoping to compensate for the rarity of crystallization, commercially available systems readily probe a large area of chemical space with minimal sample volume, at a throughput of ∼1000 individual experiments per hour.

While liquid handling is readily automated, crystal recognition is not. Imaging systems may have made viewing results more comfortable than bending over a microscope, but crystallographers still manually inspect images and/or drops, looking for crystals or, more commonly, conditions that are likely to produce good crystals when optimized. This human cost makes crystal recognition a key experimental bottleneck within the larger challenge of crystallizing biomolecules [7]. A typical experiment for a given sample includes four 96-well screens at two temperatures, i.e., 768 conditions (and can have up to twice that [12]). Assuming that it takes 2 seconds to manually scan a droplet (and noting that the scans have to be repeated, as crystallization is time dependent), simply looking at a single set of 96 trials over the lifetime of an experiment can take the better part of an hour [13]. For the sake of illustration, the U.S. Structural Science group at GlaxoSmithKline performs ∼1200 96-well experiments per year. If the targeted observation schedule were rigorously followed, the group would spend a quarter of the year staring at drops, of which the vast majority contains no crystal. Recording outcomes and analyzing the results of the 96 trials would further increase the time burden. Current operations are already straining existing resources, and the approach simply does not scale for proposed higher-density screening [10].

Crystal growth is also sufficiently uncommon that the tolerance for false negatives is almost nil. Yet most crystallographers are misguided in thinking that they themselves would never miss identifying a crystal given an image containing a crystal, or indeed miss a crystal in a droplet viewed directly under a microscope [14]. In fact, not only do crystallographers miss crystals through lack of attention or boredom, they often disagree on the class an image should be assigned to. An overall agreement rate of ∼70% was found when the classes assigned to 1200 images by 16 crystallographers were compared [14]. (When considering only crystalline outcomes, agreement rose to ∼93%.) Consistency in visual scoring was also considered by Snell et al. when compiling a ∼150,000-image dataset [15]. They found that viewers give different scores to the same image on different occasions during the study, with the average agreement rate for scores on a control set at the beginning and middle of the study being 77%, rising to 84% for the agreement in scores between the middle and end of the study. Crystallographers also tend to be optimistically biased when scoring their own experiments [16]. A better use of expert time and attention would be to focus on scientific inquiry.

An algorithm that could analyze images of drops, distinguish crystals from trivial outcomes, and reduce the effort spent cataloging failure would present clear value both to the discipline and to industry. Ideally, such an algorithm would act like an experienced crystallographer in:

• recognizing macromolecular crystals appropriate for diffraction experiments;

• recognizing outcomes that, while requiring optimization, would lead to crystals for diffraction experiments;

• recognizing non-macromolecular crystals;

• ignoring technical failures;

• identifying non-crystalline outcomes that require follow-up;

• being agnostic as to the imaging platform used;

• being indefatigable and unbiased;

• operating in a time frame that does not impede the process;

• learning from experience.

Such an algorithm would further reduce the variance in the assessments, irrespective of its accuracy. A high-variance, manual process is not conducive to automating the quality control of the system end-to-end, including the imaging equipment. Enhanced reproducibility enables traceability of the outcomes, and paves the way for putting in place measurable, continuous improvement processes across the entire imaging chain.

Automated classification of crystallization images that meets the above criteria has been attempted before. The research laboratories that first automated crystallization inspection quickly realized that image analysis would be a huge problem, and concomitantly developed algorithms to interpret the images [17–20]. None of these programs was ever widely adopted. This may have been due in part to their dependence on a particular imaging system, and to the relatively limited use of imaging systems at the time. Many of the early image analysis programs further required very time-consuming collation of features and significant preprocessing, e.g., drop segmentation to locate the experimental droplet within the image. To the best of our knowledge, there was also no widespread effort to make a widely available image analysis package in the same way that the diffraction-
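These back-of-the-envelope estimates are easy to check. The short calculation below reproduces the "quarter of the year" figure; the ∼2000-hour working year is an assumption of this sketch, not a number given in the text:

```python
# Observation burden implied by the footnote [13] estimate:
# 2 s/observation x 8 observations x 96 wells per plate.
seconds_per_plate = 2 * 8 * 96                   # 1536 s, ~26 min per plate

# U.S. Structural Science group at GlaxoSmithKline: ~1200 plates/year.
plates_per_year = 1200
hours_per_year = plates_per_year * seconds_per_plate / 3600   # 512 h/year

# Fraction of a (hypothetical) 2000-hour working year spent viewing drops.
fraction = hours_per_year / 2000                 # ~0.26, about a quarter
```

At roughly 26% of a working year, the arithmetic is consistent with the quarter-of-a-year claim above.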
oriented programs have been organized, e.g., the CCP4 package [21].

Can a better algorithm be constructed and trained? In order to help answer this question, the Machine Recognition of Crystallization Outcomes (MARCO) initiative was set up [22]. MARCO assembled a set of roughly half a million classified images of crystallization trials through an international collaboration with five separate institutions. Here, we present a machine-learning based approach to categorizing these images. Remarkably, the algorithm we employ manages to obtain an accuracy exceeding 94%, which is even above what was once thought possible for human categorization. This suggests that deployment of this technology in a variety of laboratory settings is now conceivable. The rest of this paper is organized as follows. Section II describes the dataset and the scoring scheme, Sec. III describes the machine-learning model and training procedure, Secs. IV and V describe and discuss the results, respectively, and Sec. VI briefly concludes.

II. MATERIAL AND METHODS

Image Data

The MARCO data set used in this study contains 493,214 scored images from five institutions (see Table I [22]). The images were collected from imagers made by two different manufacturers (Rigaku Automation and Formulatrix), which have different optical systems, as well as by the in-house imaging equipment built at the Hauptman-Woodward Medical Research Institute (HWMRI) High-Throughput Crystallization Center (HTCC). Different versions of the setups were also used – some Rigaku images are collected with a true color camera, some are collected as greyscale images. The zoom extent varies, with some imagers set up to collect a field-of-view (FOV) of only the experimental droplet, and some set for the FOV to encompass a larger area of the experimental setup. The Rigaku and Formulatrix automation imaged vapor-diffusion based experiments, while the HTCC system imaged microbatch-under-oil experiments. A random selection of 50,284 test images was held out for validation. Images in the test set were not represented in the training set. The precise data split is available from the MARCO website [22].

images not obviously falling into the three major classes, and came to assume a functional significance as the classification process was further investigated. Examination of the least classifiable five percent of images indeed revealed many instances of process failure, such as dispensing errors or illumination problems. These uninterpretable images were then labelled as “other” during the rescoring, which added an element of quality control to the overall process [25].

Relabeling

After a first baseline system was trained (see Sec. III), the 5% of the images that were most in disagreement with the classifier (independently of whether the image was in the training or the test set) were relabeled by one expert, in order to obtain a systematic eye on the most problematic images.

Because no rules were established and no exemplars were circulated prior to the initial scoring, individual viewpoints varied on classifying certain outcomes. For example, the bottom 5% contained many instances of phase separation, where the protein forms oil droplets or an oily film that coats the bottom of the crystallization well. Images were found to be inconsistently scored as “clear”, “precipitate”, or “other” depending on the amount and visibility of the oil film. This example highlights the difficulty of scoring experimental outcomes beyond crystal identification. A more serious source of ambiguity arises from process failure. Many of the problematic images did not capture experimental results at all: they were out of focus, dark, overexposed, dropless, etc. Whatever labeling convention was initially followed, for the relabeling the “other” category was deemed to also diagnose problems with the imaging process.

A total of 42.6% of annotations for the images that were revisited disagreed with the original label, suggesting somewhat high (1 to 2%) label noise in this difficult fraction of the dataset. For a fraction of this data, multiple raters were asked to label the images independently and had an inter-rater disagreement rate of approximately 22%. The inherent difficulty of assigning a label to a small fraction of the images is therefore consistent with the results of Ref. [14]. Table II shows the final image counts after relabeling.
TABLE I. Breakdown of data sources and imaging technology per institution contributing to MARCO.
Institution              Technical Setup                               # of Images
Bristol-Myers Squibb     Formulatrix Rock Imager (FRI)                       8,719
CSIRO                    Sitting drop, FRI, Rigaku Minstrel [23, 24]        15,933
HWMRI                    Under oil, Home system [15]                        79,632
GlaxoSmithKline          Sitting drop, FRI                                  83,126
Merck                    Sitting drop, FRI                                 305,804
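As a sanity check, the per-institution counts in Table I add up exactly to the 493,214 images quoted above, and the held-out test set is roughly 10% of the data:

```python
# Image counts per contributing institution (Table I).
counts = {
    "Bristol-Myers Squibb": 8719,
    "CSIRO": 15933,
    "HWMRI": 79632,
    "GlaxoSmithKline": 83126,
    "Merck": 305804,
}

total = sum(counts.values())      # 493214, matching the total in the text
test_fraction = 50284 / total     # ~0.102: ~10% held out for validation
```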
Model Architecture
TABLE IV. Confusion Matrix. Fraction of the test data that is assigned to each class based on the posterior probability assigned by the classifier. For instance, 0.8% of images labeled as Precipitate in the test set were classified as Crystals.

True         Predictions
Label        Crystals  Precipitate  Clear  Other
Crystals        91.0%         5.8%   1.7%   1.5%
Precipitate      0.8%        96.1%   2.3%   0.7%
Clear            0.2%         1.8%  97.9%   0.2%
Other            4.8%        19.7%   5.9%  69.6%

TABLE VI. Validation at C3. Precision, recall and accuracy from an independent set of images collected after the MARCO tool was developed. The 38K images of sitting-drop trials were collected between January 1 and March 30, 2018 on two Formulatrix Rock Imager (FRI) instruments.

DL tool       Precision  Recall  Accuracy
DeepCrystal      0.4928  0.4520    0.7391
MARCO            0.7777  0.8663    0.9018
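The structure of the Table IV confusion matrix can be checked programmatically: each row should sum to ~100% up to rounding, and the diagonal gives the per-class recall. The sketch below only restates the published numbers; "balanced accuracy" (the mean of the diagonal) is a derived quantity added here for illustration:

```python
# Rows of Table IV: fraction (%) of each true class assigned to each
# predicted class, in the order Crystals, Precipitate, Clear, Other.
confusion = {
    "Crystals":    [91.0,  5.8,  1.7,  1.5],
    "Precipitate": [ 0.8, 96.1,  2.3,  0.7],
    "Clear":       [ 0.2,  1.8, 97.9,  0.2],
    "Other":       [ 4.8, 19.7,  5.9, 69.6],
}

labels = list(confusion)
# Per-class recall is the diagonal entry of each row.
recall = {c: confusion[c][i] for i, c in enumerate(labels)}
# Class-balanced accuracy (mean of the diagonal): 88.65%.
balanced_accuracy = sum(recall.values()) / len(recall)
```

The weakest diagonal entry is "Other" (69.6%), consistent with the large cross-dataset variability of that class noted in Table V.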
TABLE V. Standard deviation of the predictions across data sources. Note in particular the large variability in the consistency of the label ’Other’ across datasets, which leads to comparatively poor selectivity of that less well-defined class.

True         Predictions
Label        Crystals  Precipitate  Clear  Other
Crystals           5%           4%     1%     1%
Precipitate        2%           4%     1%     2%
Clear              1%           3%     5%     1%
Other              7%          15%     6%    21%

set to another is fairly small, about 5% (see Table V), compared to the overall performance of the classifier.

The classifier outputs a posterior probability for each class. By varying the acceptance threshold for a proposed classification, one can trade precision of the classification against recall. The receiver operating characteristic (ROC) curves can be seen in Fig. 3.

Validation

Pixel Attribution

We visually inspect which parts of the image the classifier learns to attend to by aggregating noisy gradients of the image with respect to its label on a per-pixel basis. The SmoothGrad [47] approach is used to visualize the focus of the classifier. The images in Fig. 4 are constructed by overlaying a heat map of the classifier’s attention over a grayscale version of the input image.

Note that saliency methods are imperfect and do not in general weigh faithfully all the evidence present in an image according to its contribution to the decision, especially when the evidence is highly correlated. Although these visualizations paint a simplified and very partial picture of the classifier’s decision mechanisms, they help confirm that it is likely not picking up and overfitting to cues that are irrelevant to the task.

Inference and Availability

The model is open-sourced and available online at [48]. It can be run locally using TensorFlow or TensorFlow Lite, or as a Google Cloud Machine Learning [49] endpoint. At time of writing, inference on a standard Cloud instance takes approximately 260 ms end-to-end per standalone query. However, due to the very efficient parallelism properties of convolutional networks, latency per image can be dramatically cut down for batch requests.
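The noisy-gradient aggregation at the core of SmoothGrad can be sketched in a few lines. The one-dimensional toy "model" below is an assumption for illustration only; in the actual pipeline the gradients come from the trained CNN and are computed per pixel:

```python
import random

def smoothgrad(grad_fn, x, sigma=0.1, n_samples=1000):
    """Average the gradient over Gaussian-perturbed copies of the input.

    This mirrors the SmoothGrad idea used for the attention maps:
    gradients of noisy copies of an image are averaged to reduce
    gradient noise in the resulting saliency estimate.
    """
    random.seed(0)  # make the sketch reproducible
    total = 0.0
    for _ in range(n_samples):
        total += grad_fn(x + random.gauss(0.0, sigma))
    return total / n_samples

# Toy differentiable "model" f(x) = x**2, whose gradient is 2x.
saliency = smoothgrad(lambda x: 2.0 * x, 1.0)
# The smoothed estimate stays close to the exact gradient f'(1) = 2.
```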
Fig 3. Receiver Operating Characteristic Curves. (a) Percentage of correctly accepted detections of crystals as a function of the percentage of incorrect detections (AUC: 98.8). 98.7% of the crystal images can be recalled at the cost of less than 19% false positives. Alternatively, 94% of the crystals can be retrieved with less than 1.6% false positives. (b) Percentage of correctly accepted detections of precipitate as a function of the percentage of incorrect detections (AUC: 98.9). 99.6% of the precipitate images can be recalled at the cost of less than 25% false positives. Alternatively, 94% of the precipitates can be retrieved with less than 3.4% false positives.
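The tradeoff behind these curves comes from sweeping an acceptance threshold over the classifier's posterior probabilities. A minimal sketch with made-up scores (the real curves use the test-set posteriors):

```python
def roc_point(scores, labels, threshold):
    """Recall (true-positive rate) and false-positive rate at a threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

# Hypothetical posterior probabilities for the "crystal" class.
scores = [0.9, 0.8, 0.4, 0.7, 0.35, 0.2]
labels = [1, 1, 1, 0, 0, 0]          # 1 = crystal, 0 = not

recall_hi, fpr_hi = roc_point(scores, labels, 0.5)   # stricter threshold
recall_lo, fpr_lo = roc_point(scores, labels, 0.3)   # looser threshold
# Lowering the threshold recovers more crystals (higher recall) at the
# price of more false positives, tracing out one ROC curve per class.
```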
of interest, to be used as variables in the classification algorithms.

The results from various studies can be difficult to compare head-to-head because different groups present confusion matrices with the number of classes ranging from 2 to 11, only sometimes aggregating results for crystals/crystalline materials. There is also a tradeoff between the number of false negatives and the number of false positives. Yet most report classification rates for crystals around 80–85%, even in more recent work [8, 54, 58] in which missed crystals are reported at much lower rates. This advance comes at the expense of more false positives. For example, Pan et al. report just under 3% false negatives, but almost 38% false positives [55].

As the trained algorithms are specific to a set of images, they are also restricted to a particular type of crystallisation experiment. Prior to the curation of the current dataset, the largest set of images (by far) came from the Hauptman-Woodward Medical Research Institute HTCC [15]. This dataset, which contains 147,456 images from 96 different proteins but is limited to experiments with the microbatch-under-oil technique, has been used in a number of studies [57, 59]. Most notably, Yann et al. used a deep convolutional neural network that automatically extracted features, and reported correct classification rates as high as 97% for crystals and 96% for non-crystals. Although impressive, these results were obtained from a curated subset of 85,188 clean images, i.e., images with class labels on which several human experts agreed [57]. In order to validate our approach, we retrained our model to perform the same 10-way classification on that subset of the data alone, without any tuning of the model’s hyperparameters, and achieved 94.7% accuracy, compared to the reported 90.8%.

In this context, the current results are especially remarkable. A crystallographer can classify images of experiments independently of the systems used to create those images. They can view an experiment with a microscope or look at a computer image and reach similar conclusions. They can look at a vapor diffusion experiment or a microbatch-under-oil setup and, again, assess either with confidence. Here, we show that this can be accomplished equally well, if not better, using deep CNNs. A benchtop researcher can classify many images, especially if they relate to a project that has been years in the making. For high-throughput approaches, however, that task becomes challenging. The strength of computational approaches is that each image is treated like the previous one, with no fatigue. Classification of 10,000 images is as consistent as classification of one. This advance opens the door to complete classification of all results in a high-throughput setting and to data mining of repositories of past image data.

Another remarkable aspect of our results is that they leverage a very generic computer vision architecture originally designed for a different classification problem – categorization of natural images – with very distinct characteristics. For instance, one can presume that the global geometric relationships between object parts would play a greater role in identifying a car or a dog in an image, compared to the very local, texture-like features involved in recognizing crystal-like structures. Yet no particular specialization of the model was required to adapt it to the widely differing visual appearances of the samples originating from different imagers. This convergence of approaches toward a unified perception architecture across a wide range of computer vision problems has been a common theme in recent years, further suggesting that the technology is now ready for wide adoption for any human-mediated visual recognition task.

VI. CONCLUSION

In this work, we have collated biomolecular crystallization images for nearly half a million experiments across a large range of conditions, and trained a CNN on the labels of these images. Remarkably, the resulting machine-learning scheme was able to recapitulate the labels of more than 94% of a test set. Such accuracy has rarely been obtained, and has no equal for an uncurated dataset. The analysis also identified a small subset of problematic images, which upon reconsideration revealed a high level of label discrepancy. This variability inherent to human labeling highlights one of the main benefits of automatic scoring. Such accuracy also makes high-density screening conceivable.

Enhancing the imaging capabilities by including UV or SONICC results, for instance, could certainly enrich the model. But several research avenues could also be pursued without additional laboratory equipment. In particular, it should be possible to leverage side information that is currently not being used.

• The four-way classification scheme used is a distillation of 38 categories which are present in the source data. While these categories are presumed to be somewhat inconsistent across datasets, they could potentially provide an additional supervision signal.

• Because one goal of this classifier is to be able to generalize across datasets, it would be worthwhile to investigate the contribution of techniques that have been designed specifically to reduce the effect of domain shift across data sources on the classification outcomes [60, 61].

• Each crystallization experiment records a series of images taken over time. Using this timecourse information could enhance the success rate of the classifier [62].

Note in closing that the current study focused on crystallization as an outcome, which is but a small fraction of the protein solubility diagram. Patterns of precipitation, phase separation, and clear drops also provide information as to whether and where crystallization might occur. The success in identifying crystals, precipitate and clear drops can thus also be used to accurately chart the crystallization regimes and to identify pathways for optimization [59, 63, 64]. The application of this approach to large libraries of historical data may therefore reveal patterns that guide future crystallization strategies, including novel chemical screens and mutagenesis programs.

ACKNOWLEDGMENTS

We acknowledge discussions at various stages of this project with I. Altan, S. Bowman, R. Dorich, D. Fusco, E. Gualtieri, R. Judge, A. Narayanaswamy, J. Noah-Vanhoucke, P. Orth, M. Pokross, X. Qiu, P. F. Riley, V. Shanmugasundaram, B. Sherborne and F. von Delft. PC acknowledges support from National Science Foundation Grant no. NSF DMR-1749374.
[1] S Harrison, B Lahue, Z Peng, A Donofrio, C Chang, and M Glick, “Extending predict first to the design-make-test cycle in small-molecule drug discovery,” Future Med. Chem. 9, 533–536 (2017).
[2] Alexander McPherson, Crystallization of Biological Macromolecules (CSHL Press, Cold Spring Harbor, 1999) p. 586.
[3] Naomi E. Chayen, “Turning protein crystallisation from an art into a science,” Curr. Opin. Struct. Biol. 14, 577–583 (2004).
[4] Diana Fusco and Patrick Charbonneau, “Soft matter perspective on protein crystal assembly,” Colloids Surf. B: Biointerfaces 137, 22–31 (2016).
[5] J.T. Ng, C. Dekker, P. Reardon, and F. von Delft, “Lessons from ten years of crystallization experiments at the SGC,” Acta Cryst. D 72, 224–235 (2016).
[6] V.J. Fazio, T.S. Peat, and J. Newman, “Lessons for the future,” Methods Mol. Biol. 1261, 141–156 (2015).
[7] Janet Newman, Evan E. Bolton, Jochen Muller-Dieckmann, Vincent J. Fazio, D. Travis Gallagher, David Lovell, Joseph R. Luft, Thomas S. Peat, David Ratcliffe, Roger A. Sayle, Edward H. Snell, Kerry Taylor, Pascal Vallotton, Sameer Velanker, and Frank von Delft, “On the need for an international effort to capture, share and use crystallization screening data,” Acta Cryst. F 68, 253–258 (2012).
[8] Yulia Kotseruba, Christian A Cumbaa, and Igor Jurisica, “High-throughput protein crystallization on the world community grid and the GPU,” J. Phys. Conf. Ser. 341, 012027 (2012).
[9] Janet Newman, “One plate, two plates, a thousand plates. How crystallisation changes with large numbers of samples,” Methods 55, 73–80 (2011).
[10] Shuheng Zhang, Charline J.J. Gerard, Aziza Ikni, Gilles Ferry, Laurent M. Vuillard, Jean A. Boutin, Nathalie Ferte, Romain Grossier, Nadine Candoni, and Stéphane Veesler, “Microfluidic platform for optimization of crystallization conditions,” J. Cryst. Growth 472, 18–28 (2017), Industrial Crystallization and Precipitation in France (CRISTAL-8), May 2016, Rouen (France).
[11] Y. Thielmann, J. Koepke, and H. Michel, “The ESFRI Instruct Core Centre Frankfurt: Automated high-throughput crystallization suited for membrane proteins and more,” J. Struct. Funct. Genomics 13, 63–69 (2012).
[12] Edward H. Snell, Angela M. Lauricella, Stephen A. Potter, Joseph R. Luft, Stacey M. Gulde, Robert J. Collins, Geoff Franks, Michael G. Malkowski, Christian Cumbaa, Igor Jurisica, and George T. DeTitta, “Establishing a training set through the visual analysis of crystallization trials. Part II: Crystal examples,” Acta Cryst. D 64, 1131–1137 (2008).
[13] This estimate is based on personal communication with five experienced crystallographers at GlaxoSmithKline: 2 seconds/observation × 8 observations × 96 wells. Note that current technology can automatically store and image plates at about 3 min/plate.
[14] Julie Wilson, “Automated classification of images from crystallisation experiments,” in Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining, edited by Petra Perner (Springer Berlin Heidelberg) pp. 459–473.
[15] Edward H. Snell, Joseph R. Luft, Stephen A. Potter, Angela M. Lauricella, Stacey M. Gulde, Michael G. Malkowski, Mary Koszelak-Rosenblum, Meriem I. Said, Jennifer L. Smith, Christina K. Veatch, Robert J. Collins, Geoff Franks, Max Thayer, Christian Cumbaa, Igor Jurisica, and George T. DeTitta, “Establishing a training set through the visual analysis of crystallization trials. Part I: ∼150 000 images,” Acta Cryst. D 64, 1123–1130 (2008).
[16] David Hargreaves, personal communication.
[17] Glen Spraggon, Scott A. Lesley, Andreas Kreusch, and John P. Priestle, “Computational analysis of crystallization trials,” Acta Cryst. D 58, 1915–1923 (2002).
[18] Christian Cumbaa and Igor Jurisica, “Automatic classification and pattern discovery in high-throughput protein crystallization trials,” J. Struct. Funct. Genomics 6, 195–202 (2005).
[19] Kuniaki Kawabata, Kanako Saitoh, Mutsunori Takahashi, Hajime Asama, Taketoshi Mishima, Mitsuaki Sugahara, and Masashi Miyano, “Evaluation of protein crystallization state by sequential image classification,” Sensor Rev. 28, 242–247 (2008).
[20] Samarasena Buchala and Julie C. Wilson, “Improved classification of crystallization images using data fusion and multiple classifiers,” Acta Cryst. D 64, 823–833 (2008).
[21] Martyn D. Winn, Charles C. Ballard, Kevin D. Cowtan, Eleanor J. Dodson, Paul Emsley, Phil R. Evans, Ronan M. Keegan, Eugene B. Krissinel, Andrew G. W. Leslie, Airlie McCoy, Stuart J. McNicholas, Garib N. Murshudov, Navraj S. Pannu, Elizabeth A. Potterton, Harold R. Powell, Randy J. Read, Alexei Vagin, and Keith S. Wilson, “Overview of the CCP4 suite and current developments,” Acta Cryst. D 67, 235–242 (2011).
[22] “MAchine Recognition of Crystallization Outcomes (MARCO),” (2017), https://marco.ccr.buffalo.edu [Online; accessed 17-March-2018].
[23] Pascal Vallotton, Changming Sun, David Lovell, Vincent J. Fazio, and Janet Newman, “Droplit, an improved image analysis method for droplet identification in high-throughput crystallization trials,” J. Appl. Crystallogr. 43, 1548–1552 (2010).
[24] N. Rosa, M. Ristic, B. Marshall, and J. Newman, “Keeping crystallographers app-y,” Acta Cryst. F, submitted.
[25] Katarina Mele, Rongxin Li, Vincent J. Fazio, and Janet Newman, “Quantifying the quality of the experiments used to grow protein crystals: the IQC suite,” J. Appl. Cryst. 47, 1097–1106 (2014).
[26] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation 1, 541–551 (1989).
[27] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep learning,” Nature 521, 436 (2015).
[28] Waseem Rawat and Zenghui Wang, “Deep convolutional neural networks for image classification: A comprehensive review,” Neural Comput. 29, 2352–2449 (2017).
[29] A Berg, J Deng, and L Fei-Fei, “Large scale visual recognition challenge (ILSVRC),” (2010) http://www.image-net.org/challenges/LSVRC.
[30] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM van der Laak, Bram van Ginneken, and Clara I Sánchez, “A survey on deep learning in medical image analysis,” Med. Image Anal. 42, 60–88 (2017).
[31] Christof Angermueller, Tanel Pärnamaa, Leopold Parts, and Oliver Stegle, “Deep learning for computational biology,” Mol. Syst. Biol. 12, 878 (2016).
[32] Jonathan Krause, Varun Gulshan, Ehsan Rahimy, Peter Karth, Kasumi Widner, Greg S Corrado, Lily Peng, and Dale R Webster, “Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy,” arXiv:1710.01711 [cs.CV] (preprint) (2017).
[33] Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q Nelson, Greg S Corrado, et al., “Detecting cancer metastases on gigapixel pathology images,” arXiv:1703.02442 [cs.CV] (preprint) (2017).
[34] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams, “Learning representations by back-propagating errors,” Nature 323, 533 (1986).
[35] Léon Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010 (Springer, 2010) pp. 177–186.
[36] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) pp. 2818–2826.
[37] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” arXiv:1602.07261 [cs.CV] (preprint) (2017).
[38] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le, “Learning transferable architectures for scalable image recognition,” arXiv:1707.07012 [cs.CV] (preprint) (2017).
[39] Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D Sculley, “Google Vizier: A service for black-box optimization,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2017) pp. 1487–1495.
[40] Nathan Silberman and Sergio Guadarrama, “TensorFlow-Slim image classification model library,” (2017), https://github.com/tensorflow/models/tree/master/research/slim [Online; accessed 02-February-2018].
[41] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Dropout: A sim-
[42] …houcke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” (2015).
[43] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al., “Large scale distributed deep networks,” in Advances in Neural Information Processing Systems (2012) pp. 1223–1231.
[44] Tijmen Tieleman and Geoffrey Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning 4, 26–31 (2012).
[45] Chris Watkins, “C4, C3 Classifier Pipeline. v1. CSIRO. Software Collection.” (2018), https://doi.org/10.4225/08/5a97375e6c0aa [Online; accessed 09-May-2018].
[46] “DeepCrystal,” (2017), http://www.deepcrystal.com [Online; accessed 09-May-2018].
[47] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg, “SmoothGrad: Removing noise by adding noise,” arXiv:1706.03825 [cs.LG] (preprint) (2017).
[48] Vincent Vanhoucke, “Marco repository in TensorFlow Models,” (2018), http://github.com/tensorflow/models/tree/master/research/marco [Online; accessed 01-May-2018].
[49] “Google Cloud Machine Learning Engine,” (2018), [Online; accessed 02-February-2018].
[50] Christian A. Cumbaa, Angela Lauricella, Nancy Fehrman, Christina Veatch, Robert Collins, Joe Luft, George DeTitta, and Igor Jurisica, “Automatic classification of sub-microlitre protein-crystallization trials in 1536-well plates,” Acta Cryst. D 59, 1619–1627 (2003).
[51] Kanako Saitoh, Kuniaki Kawabata, Hajime Asama, Taketoshi Mishima, Mitsuaki Sugahara, and Masashi Miyano, “Evaluation of protein crystallization states based on texture information derived from greyscale images,” Acta Cryst. D 61, 873–880 (2005).
[52] Marshall Bern, David Goldberg, Raymond C. Stevens, and Peter Kuhn, “Automatic classification of protein crystallization images using a curve-tracking algorithm,” J. Appl. Cryst. 37, 279–287 (2004).
[53] R. Liu, Y. Freund, and G. Spraggon, “Image-based crystal detection: a machine-learning approach,” Acta Cryst. D 64, 1187–1195 (2008).
[54] Christian A. Cumbaa and Igor Jurisica, “Protein crystallization analysis on the world community grid,” J. Struct. Funct. Genomics 11, 61–69 (2010).
[55] Shen Pan, Gidon Shavit, Marta Penas-Centeno, Dong-Hui Xu, Linda Shapiro, Richard Ladner, Eve Riskin, Wim Hol, and Deirdre Meldrum, “Automated classification of protein crystallization images using support vector machines with scale-invariant texture and gabor features,”
ple way to prevent neural networks from overfitting,” J. Acta Cryst. D 62, 271–279 (2006).
Mach. Learn. Res. 15, 1929–1958 (2014). [56] M.J. Po and A.F. Laine, “Leveraging genetic algorithm
[42] Martı́n Abadi, Ashish Agarwal, Paul Barham, Eugene and neural network in automated protein crystal recog-
Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, nition,” in Proceedings of the 30th Annual International
Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghe- Conference of the IEEE Engineering in Medicine and Biol-
mawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, ogy Society, EMBS’08 - ”Personalized Healthcare through
Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Technology” (2008) pp. 1926–1929.
Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, [57] Margot Lisa-Jing Yann and Yichuan Tang, “Learning
Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, deep convolutional neural networks for x-ray protein crys-
Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya tallization image analysis,” in Thirtieth AAAI Conference
Sutskever, Kunal Talwar, Paul Tucker, Vincent Van- on Artificial Intelligence (2016).
11
[58] Jeffrey Hung, John Collins, Mehari Weldetsion, Oliver Newland, Eric Chiang, Steve Guerrero, and Kazunori Okada, “Protein crystallization image classification with elastic net,” in SPIE Medical Imaging, Vol. 9034 (SPIE) p. 14.
[59] Diana Fusco, Timothy J. Barnum, Andrew E. Bruno, Joseph R. Luft, Edward H. Snell, Sayan Mukherjee, and Patrick Charbonneau, “Statistical analysis of crystallization database links protein physico-chemical features with crystallization mechanisms,” PLoS ONE 9, e101123 (2014).
[60] Yaroslav Ganin and Victor Lempitsky, “Unsupervised domain adaptation by backpropagation,” in International Conference on Machine Learning (2015) pp. 1180–1189.
[61] Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan, “Domain separation networks,” in Advances in Neural Information Processing Systems (2016) pp. 343–351.
[62] Katarina Mele, B. M. Thamali Lekamge, Vincent J. Fazio, and Janet Newman, “Using time courses to enrich the information obtained from images of crystallization trials,” Cryst. Growth Des. 14, 261–269 (2014).
[63] Edward H. Snell, Ray M. Nagel, Ann Wojtaszcyk, Hugh O’Neill, Jennifer L. Wolfley, and Joseph R. Luft, “The application and use of chemical space mapping to interpret crystallization screening results,” Acta Cryst. D 64, 1240–1249 (2008).
[64] Irem Altan, Patrick Charbonneau, and Edward H. Snell, “Computational crystallization,” Arch. Biochem. Biophys. 602, 12–20 (2016).