
10 Reasons to Crowdsource Science

Adrien Treuille
Carnegie Mellon University

Protein Folding

http://martin-protean.com/protein-structure.html

Protein Folding
MSFQGHGIY YIYTRLALS AYVANTRL
Amino Acid Sequence Protein Shape

Key to understanding life. Huge computational resources.

Protein Folding

RNA Nanoengineering
GCUAGGCUA AUACGAUAC CAACATGA
Nucleotide Sequence Target RNA Shape

Next-generation Catalysts Drug-responsive Control Elements

RNA Nanoengineering

Game Interface

Voting

Results

Synthesis

RNA Nanoengineering

Crowdsourcing the Scientific Method

Crowdsourcing Science

Launched 2008

Launched 2011

Protein Folding
57,000 Players

RNA Nanoengineering
25,000 Players

Computational Chemistry

Experimental Chemistry

Crowdsourcing Science

Scientists

Problem

Game

Players

Crowdsourcing Science

Scientists

Problem

Players

Game

10 Reasons to Crowdsource Science

10 Reasons to Crowdsource Science

#1 Games Make Us Understand

#1 Games Make Us Understand

Foldit

EteRNA

BioClipse

http://chem-bla-ics.blogspot.com/2006/04/protein-support-in-bioclipse-using.html

#1 Games Make Us Understand

Pull/Bands

Lock

Wiggle

Shake

Rebuild

Tweak

#1 Games Make Us Understand

Repulsive

Attractive

Solvation

Hydrogen Bonds

Issue Analysis

#1 Games Make Us Understand

Foldit

EteRNA

#1 Games Make Us Understand

Interactive Biology

10 Reasons to Crowdsource Science


#1 Games Make Us Understand

#2 Humans Solve Hard Problems

#2 Humans Solve Hard Problems


Native Conformation Best Computer Solution Best Player Solution

#2 Humans Solve Hard Problems


Native Conformation

PhD

Best Computer Solution Best Player Solution

#2 Humans Solve Hard Problems


Native Conformation Starting Position Best Player Solution

Predicting protein structures with a multiplayer online game. Nature Vol 466, 5 August 2010.

#2 Humans Solve Hard Problems

Target Shape

#2 Humans Solve Hard Problems


Player Designs
Ding's Round 2 Bulged Star by Ding (EteRNA Score: 96%)
Starry's Bulged Star III by starryjess (EteRNA Score: 94%)
Mat - Bulged star v1.1 by mat747 (EteRNA Score: 94%)

Computer Designs
ViennaRNA Design 03 by ViennaRNA Bot (EteRNA Score: 76%)
ViennaRNA Design 05 by ViennaRNA Bot (EteRNA Score: 75%)
ViennaRNA Design 02 by ViennaRNA Bot (EteRNA Score: 73%)


10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems

#3 It's a Total Rush!

#3 It's a Total Rush!


Player: Wow, you sure know a lot about Foldit!
Engineer: Thank you. Actually, I was one of the programmers.
Player: Really?
Engineer: Yes.
Player: You are a god.

10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems

#3 It's a Total Rush!

#4 Human Learning

[Figure: Player Designs (EteRNA scores 94-96%) vs. Computer Designs by ViennaRNA Bot (EteRNA scores 73-76%) for the Bulged Star target shape; panels labeled Player Solutions and Computer Solutions]

#4 Human Learning

Player Solutions Computer Solutions

#4 Human Learning

10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It's a Total Rush!

#4 Human Learning

#5 Humans Create Knowledge

#5 Humans Create Knowledge



I have been painstakingly going over most, if not all, of the new Lab Design Submissions by brand new players. I was chagrined to find... ...a dozen Christmas Trees were submitted.

Let us not waste even one of our precious few design slots.

#5 Humans Create Knowledge



A meta-analysis of one-cross-bulge results
by Alan.Robot
RNAfold POSITIONAL ENTROPY https://docs.google.com/document/pub?id=12h1bwjS1tHRUaG8O01b...

The optimal secondary structure in dot-bracket notation with a minimum free energy of -39.80 kcal/mol is given below.
This line should match the energy shown in EteRNA; otherwise you have chosen the wrong energy options! From here on out, I will say MFE instead of minimum free energy.
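Dot-bracket notation writes each nucleotide as "(" or ")" when it is paired and "." when it is unpaired, with matching brackets marking the two partners. A minimal Python sketch (not from the tutorial or the Vienna package; the function name and the example structure are hypothetical) that recovers the base pairs with a stack:

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the 0-based base pairs (i, j), i < j, encoded by a dot-bracket string."""
    open_positions: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)  # remember where this pair opens
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((open_positions.pop(), j))  # close the most recent '('
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# Hypothetical nested structure, 15 nucleotides long.
print(parse_dot_bracket("((..((...))..))"))  # [(0, 14), (1, 13), (4, 10), (5, 9)]
```

Because every ")" closes the most recently opened "(", this notation can only describe nested (pseudoknot-free) structures, which is the class of structures RNAfold predicts.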



I: positional entropy and what it means
by alan.robot, updated on 2/9 (changed incorrect link to d9's design analysis), last updated 3/14 (fixed dead links to server results, thanks to Chaendryn for re-running them)

NOTE: this document is now shared as a web page here that does not require a login, please update your links!

OK, so you are wondering how you can improve your submitted designs using the output from the Vienna RNA suite of programs, which the devs have confirmed is the computational backend for EteRNA. In this tutorial, I'll show you an example of how positional entropy can be used to help predict winning and losing designs, even before you submit!

*disclaimer* I'm not affiliated with EteRNA, and although I am a computational biophysicist, I'm not a specialist in RNA bioinformatics, so any inaccuracies are my fault alone and not due to EteRNA or its staff.

The frequency of the MFE structure in the ensemble is 80.77 %


Note the very high percentage! This is good. It means that of all the possible structures the server considered likely to occur (including suboptimal folds, NOT just the MFE fold), about 81% of the ensemble is predicted to be in exactly the MFE fold. Note that when these are synthesized in a lab, you get a test tube full of molecules (an ensemble), not just one single molecule, and you need ALL OF THEM (or as close as possible) to fold correctly. Generally, when one says ensemble, one means on the order of Avogadro's number of molecules (that's 6.022 x 10^23), which is A LOT.
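The frequency reported here is a Boltzmann probability: each structure s with free energy E(s) is weighted by exp(-E(s)/RT), and the weights are normalized by their sum, the partition function Z. RNAfold obtains Z with McCaskill's dynamic-programming algorithm rather than by listing structures; the minimal sketch below instead enumerates a tiny, hypothetical three-structure ensemble whose energies are invented for illustration, and assumes folding at 37 degrees C.

```python
import math

# Gas constant in kcal/(mol*K) and 37 C in kelvin; RT is about 0.616 kcal/mol.
R_KCAL = 0.0019872
T_KELVIN = 310.15
RT = R_KCAL * T_KELVIN
AVOGADRO = 6.022e23


def boltzmann_frequencies(energies: dict[str, float]) -> dict[str, float]:
    """Map each structure's free energy (kcal/mol) to its equilibrium frequency."""
    lowest = min(energies.values())
    # Subtracting the lowest energy first keeps the exponentials from overflowing;
    # the shift cancels when we divide by the partition function Z.
    weights = {s: math.exp(-(e - lowest) / RT) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


# Hypothetical ensemble (energies invented for illustration).
toy_ensemble = {
    "((((....))))": -5.0,  # target fold, lowest energy, so it is the MFE
    ".(((....))).": -4.2,  # loses the outermost pair
    "(((......)))": -3.5,  # loses the innermost pair
}
freqs = boltzmann_frequencies(toy_ensemble)
mfe = min(toy_ensemble, key=toy_ensemble.get)
print(f"MFE frequency: {freqs[mfe]:.2%}")
# Scaled to a mole of molecules: roughly how many adopt the MFE fold.
print(f"Molecules in the MFE fold per mole: {freqs[mfe] * AVOGADRO:.3e}")
```

A gap of only 0.8 kcal/mol already moves about a fifth of this toy ensemble out of the MFE, which is why a single low energy is not enough on its own.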

The ensemble diversity is 0.44. This is the average distance, in number of base pairs, between structures in the ensemble, so lower is better. Here we see that the remaining 20% differ from the MFE by less than one base pair, on average. That's good!

Note there are TWO structures displayed below, the MFE and the centroid. The centroid is exactly what it sounds like: it's the middle-of-the-pack structure in the ensemble (again, distance is measured in base pairs). Since the MFE is 80% of the ensemble, the centroid is identical to the MFE, but if that percentage were lower it might not be!

The structures are colored by default by base-pair probability, which is the probability that each base is paired (or unpaired) as in the structure you see. They should all be close to 1 for a good structure. But it's not the end of the world if one or two base pairs don't form correctly; that's still a win if it doesn't happen very often. If it's highly likely that a few base pairs will be off, but it only happens in a few ways that preserve the rest of the MFE structure, it could still win. If it's highly likely we have wrong base pairs forming and there are many ways this can happen without preserving the MFE structure, then we are toast!

How do we measure the number of ways the fold is expected to go wrong, weighted by how likely each one is? ENTROPY, which, in the words of my physics professor, is just a fancy word for the logarithm of the number of ways. You can also think of it as disorder, but how do you count an amount of disorder? Click the box that says positional entropy to see the positional entropy map.
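A minimal sketch of the two quantities, assuming a tiny hand-made ensemble with known probabilities (the real server never enumerates structures; it derives base-pair probabilities p_ij from the partition function). Ensemble diversity is computed here as the expected base-pair distance between two structures drawn independently from the ensemble, and the centroid by the standard construction of keeping every pair with p_ij above 0.5. That ViennaRNA's own routines compute exactly these numbers is an assumption; all names and probabilities below are hypothetical.

```python
from itertools import product


def pairs(structure: str) -> frozenset[tuple[int, int]]:
    """Base pairs (i, j) encoded by a balanced dot-bracket string."""
    stack: list[int] = []
    found: set[tuple[int, int]] = set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            found.add((stack.pop(), j))
    return frozenset(found)


def diversity_by_enumeration(ensemble: dict[str, float]) -> float:
    """Expected base-pair distance between two structures drawn independently."""
    return sum(
        p * q * len(pairs(s) ^ pairs(t))  # ^ keeps pairs present in exactly one structure
        for (s, p), (t, q) in product(ensemble.items(), repeat=2)
    )


def pair_probabilities(ensemble: dict[str, float]) -> dict[tuple[int, int], float]:
    """p_ij: total probability of the structures that contain pair (i, j)."""
    probs: dict[tuple[int, int], float] = {}
    for s, p in ensemble.items():
        for bp in pairs(s):
            probs[bp] = probs.get(bp, 0.0) + p
    return probs


def diversity_from_pair_probs(probs: dict[tuple[int, int], float]) -> float:
    """Same expectation from p_ij alone: 2 * sum of p_ij * (1 - p_ij)."""
    return sum(2 * p * (1 - p) for p in probs.values())


def centroid(probs: dict[tuple[int, int], float], length: int) -> str:
    """Keep every pair with probability above 0.5."""
    chars = ["."] * length
    for (i, j), p in probs.items():
        if p > 0.5:
            chars[i], chars[j] = "(", ")"
    return "".join(chars)


# Hypothetical ensemble: structure -> probability (sums to 1).
toy = {"((((....))))": 0.80, ".(((....))).": 0.15, "(((......)))": 0.05}
bpp = pair_probabilities(toy)
print(diversity_by_enumeration(toy), diversity_from_pair_probs(bpp))  # both about 0.35
print(centroid(bpp, 12))  # ((((....)))), the same as the dominant structure
```

The two diversity functions agree because a pair adds to the distance exactly when one draw contains it and the other does not, which happens with probability 2 p_ij (1 - p_ij). The centroid rule also explains the tutorial's remark: once the MFE holds more than half of the ensemble, all of its pairs exceed 0.5 and no competing pair can, so the centroid and the MFE coincide.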

First things first:

The Vienna RNA servers are here: http://rna.tbi.univie.ac.at/
The Vienna source code is here: http://www.tbi.univie.ac.at/~ivo/RNA/
THIS is a link to a discussion on how to download the sequence files for submitted designs for lab 103, one bulge cross.
I will be referring to output from the web server version for this tutorial, but if you want to analyze more than one sequence at a time, you're probably best off compiling and running the programs on your own machine. It's not as hard as it sounds: you do not need to know how to program, but you do need a Unix/Linux environment to compile in. If you are running Windows, I can highly recommend Ubuntu running on VirtualBox (http://www.virtualbox.org/); both are free software and very user-friendly to set up and use, and they beat the pants off Cygwin.

Putting it all together: so now we know the computer expects 80% of the test tube to fold perfectly, and 20% will have a defect, most likely at the green spots on the picture above. BUT, we also know the average difference between structures is less than 1 bp from the target, so not all of the green spots will be wrong at the same time; the defects probably occur one at a time, in individual molecules. So the MFE structure will be preserved. This is a win!

CONTRAST WITH A POOR-SCORING ENTRY


The following entry scored 65 in round 2:
GGAAAGUAGGAGAUGUUAGUUUGAAAGGAUUGGCCGGUGGUUUGAAAGGGCGAUUGUCUUUAGUGAAAGUUAAAGAGUUUUUUGCAA
I'll cut to the chase: here's the output, and here is a summary. The optimal secondary structure in dot-bracket notation with a minimum free energy of -19.20 kcal/mol is given below. The frequency of the MFE structure in the ensemble is 27.45%. The ensemble diversity is 3.47.
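The contrast between the two designs can be condensed into a rough pre-submission screen: a high MFE frequency together with a low ensemble diversity points toward a win, while a low frequency and a high diversity point toward misfolding. A minimal sketch, assuming the two summary numbers have already been read off the RNAfold output; the class, the function, and both cut-off values are hypothetical, and the server's prediction only captures trends, not exact lab scores.

```python
from dataclasses import dataclass


@dataclass
class FoldSummary:
    name: str
    mfe_frequency: float       # fraction of the ensemble in the MFE structure (0 to 1)
    ensemble_diversity: float  # mean base-pair distance between ensemble members


# Hypothetical cut-offs for illustration only; the tutorial does not give numbers.
MIN_MFE_FREQUENCY = 0.5
MAX_DIVERSITY = 1.0


def looks_promising(s: FoldSummary) -> bool:
    """Promising if most of the ensemble is in the MFE and the rest stays close to it."""
    return s.mfe_frequency >= MIN_MFE_FREQUENCY and s.ensemble_diversity <= MAX_DIVERSITY


designs = [
    FoldSummary("well-folding design (first example)", 0.8077, 0.44),
    FoldSummary("round 2 entry that scored 65", 0.2745, 3.47),
]
for d in designs:
    verdict = "promising" if looks_promising(d) else "risky"
    print(f"{d.name}: {verdict}")
```

A two-number screen like this is cheap enough to run over every candidate before spending one of the few synthesis slots.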

Here is a link to the results of running the round 4 winning design by dimension 9 through the RNAfold web server. Note I have no idea how long that link will work, so I'll cut and paste the relevant bits here if you want to try to reproduce it. Use the default settings except where mentioned below; you have to expand "show advanced options" to see them.
sequence: GGAAGGUUCUCUGGCGUUCGUGAAAACAUGAAUGGGAGGCAUCAAGAGAUGGCUCCGCUUGUUCAAGAGAAUAGGCCCAGAGAGCAAA
advanced settings:
unpaired bases can participate in at most one dangling end (MFE folding only)
(yes, for the super observant, this is the rule that lets only one side of a loop get a bonus from adding a red G)

Turner 1999 energy parameters

So now you get a pretty output page with lots of details. What should you pay attention to?

You can see below that this structure is not predicted to maintain the central hub, and the bottom arm probably doesn't form correctly. Most of the ensemble is NOT represented by the MFE, and
members of the ensemble differ by 3-4 bp from each other. Note the axis on the entropy map goes to 0.8 this time. 3-4 bp is quite a lot if they are right next to each other, because that means an entire arm will form wrong. Another way to tell is to look at the mountain plot below, where sloped lines are base-paired positions and flat lines are unpaired positions. The fact that the green (the average of the ensemble) and blue (the centroid) lines DON'T overlap indicates we have a problem. And since the cool colors are all clustered together in groups of 3-4 bp, we could reasonably expect misfolds to lose an entire arm or worse!
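A mountain plot turns a structure into a height profile: moving left to right, the height goes up by one at each opening base, down by one at each closing base, and stays flat across unpaired bases. The minimal sketch below assumes the ensemble-averaged curve is the expected height, that is, the sum of p_ij over all pairs (i, j) enclosing position k; the function names and the two-structure toy ensemble are hypothetical.

```python
def mountain_from_structure(structure: str) -> list[int]:
    """Height at each position: how many base pairs enclose it."""
    heights: list[int] = []
    height = 0
    for symbol in structure:
        if symbol == "(":
            height += 1  # sloped up: an opening base
        elif symbol == ")":
            height -= 1  # sloped down: a closing base
        heights.append(height)  # '.' leaves the height unchanged: a flat segment
    return heights


def mountain_from_pair_probs(probs: dict[tuple[int, int], float], length: int) -> list[float]:
    """Ensemble-averaged height: sum of p_ij over all pairs with i <= k < j."""
    return [sum(p for (i, j), p in probs.items() if i <= k < j) for k in range(length)]


# Hypothetical ensemble: "((..))" with probability 0.6 and "(....)" with 0.4,
# so pair (0, 5) always forms and pair (1, 4) forms 60% of the time.
probs = {(0, 5): 1.0, (1, 4): 0.6}
centroid_curve = mountain_from_structure("((..))")  # the centroid keeps pairs with p > 0.5
average_curve = mountain_from_pair_probs(probs, 6)
gap = max(abs(a - b) for a, b in zip(centroid_curve, average_curve))
print(centroid_curve)
print(average_curve)
print(f"largest gap between the curves: {gap:.2f}")
```

If every pair had probability 1, the two curves would be identical; any visible gap means part of the ensemble is folding into structures the centroid does not represent.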

Notice the cool colors represent values at the weakest spots of this structure, where entropy > 0. Can you see the corresponding peaks on the entropy vs. position plot below? These are the most likely positions for deviations from the MFE structure. Note the scale of the graph: 0 entropy means NO deviations, and > 0 means some deviations. How is entropy calculated? This is a Shannon entropy from information theory, which is calculated as

$$S = -\sum_i p_i \log p_i$$

where $p_i$ is the probability of a particular outcome $i$ and log is the natural log (base e). Note that all of the probabilities added together have to sum to exactly 1. So if there is only 1 possibility, with probability 1, -1*log(1) = 0. Say there are 2 possibilities, one with probability 0.99 and the other with 0.01: that's -1*(0.99*log(0.99) + 0.01*log(0.01)) = 0.056, pretty darn close to 0. Say there are 100 equally likely outcomes: -1*(0.01*log(0.01))*100 = 4.6. That's very big compared to 0 or 0.056. So many equally likely outcomes means an entropy much greater than 0, and in the limit where there is only a single possible way for the base to be positioned, the entropy goes to 0.
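A minimal sketch of the formula, reproducing the three worked examples. For the per-base version it assumes (mirroring, but not confirmed to match, the RNAfold server's definition) that the outcomes for base i are "paired with j" for each possible partner j, with probability p_ij, plus "unpaired", with whatever probability is left over; the pair probabilities in the last example are hypothetical.

```python
import math


def shannon_entropy(probabilities: list[float]) -> float:
    """S = -sum(p * ln p); outcomes with p == 0 contribute nothing."""
    total = sum(probabilities)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total}, not 1")
    return sum(-p * math.log(p) for p in probabilities if p > 0)


def positional_entropy(pair_probs: dict[tuple[int, int], float], length: int) -> list[float]:
    """Entropy per base; outcomes are 'paired with j' for each partner j, or 'unpaired'."""
    partners: list[list[float]] = [[] for _ in range(length)]
    for (i, j), p in pair_probs.items():
        partners[i].append(p)
        partners[j].append(p)
    result = []
    for probs in partners:
        unpaired = max(0.0, 1.0 - sum(probs))  # clamp tiny negative rounding errors
        result.append(shannon_entropy(probs + [unpaired]))
    return result


# The three worked examples from the text: 0, about 0.056, about 4.6.
print(shannon_entropy([1.0]))
print(round(shannon_entropy([0.99, 0.01]), 3))
print(round(shannon_entropy([0.01] * 100), 1))

# Hypothetical pair probabilities for a 6-nt hairpin: bases 1 and 4 are uncertain.
print([round(s, 3) for s in positional_entropy({(0, 5): 1.0, (1, 4): 0.6}, 6)])
```

Only the two uncertain bases get a nonzero value, which is exactly the kind of peak the entropy vs. position plot highlights.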

That's all for today. In segment II, I will explain why Christmas trees are bad using the barriers and subopt RNAfold kinetics simulation programs.

So how good is the prediction compared to the lab result? Here's a snapshot of the synthesis results in target mode. You can see it's not an exact prediction, but it gets a lot of the trends right. Useful!


#5 Humans Create Knowledge



10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It's a Total Rush!
#4 Human Learning

#5 Humans Create Knowledge

#6 Creating New Scientists

#6 Creating New Scientists


Convert Problem to Game

Study User Solutions

Create Algorithms

#6 Creating New Scientists

Inverse Crowdsourcing
(but that's not all)

#6 Creating New Scientists


Berex NZ: @mat and alan, quick question, who would you recommend would use a DIY lab more than myself out of this community?
alan.robot: you mean in terms of players?
Berex NZ: yea
alan.robot: no one else has volunteered to do actual lab work, but you mean who might want to test out ideas outside of the normal channels?
Berex NZ: yep
alan.robot: because I think everyone would want a synthesis slot if it were possible :-)
alan.robot: didn't realize the consumables were so much, though, did you see the cost breakdown?
Berex NZ: I'm just wondering if anyone else has asked to do the actual wet work
Berex NZ: yep
alan.robot: fortunately, the bulk is DNA synthesis, and historically that's been tracking with Moore's law.
Berex NZ: just wondering, what was the total you got per round?
alan.robot: wasn't it 500 something for 8 designs?
Berex NZ: yea 520
Berex NZ: exc labour
Berex NZ: Oligos aren't that expensive though...

#6 Creating New Scientists

Backyard Biosynth

10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It's a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge

#6 Creating New Scientists

#7 Breaking the Rules

#7 Breaking the Rules



RNA Alphabet by clollin

#7 Breaking the Rules


by Joshua Weizmann

10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It's a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists

#7 Breaking the Rules

#8 There Are a Lot of Humans

#8 There Are a Lot of Humans

man/ 26 years
(in 6 months)

10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It's a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules

#8 There Are a Lot of Humans

#9 And They Work for Free

#9 And They Work for Free


potentially transformative risky - the NSF


#9 And They Work for Free


Don't just give them points... Give them lots of points... - Luis von Ahn

#9 And They Work for Free

10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It's a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans
#9 And They Work for Free

#10 It Means a Lot to Players

#10 It Means a Lot to Players


10 Reasons to Crowdsource Science


#1 Games Make Us Understand
#2 Humans Solve Hard Problems
#3 It's a Total Rush!
#4 Human Learning
#5 Humans Create Knowledge
#6 Creating New Scientists
#7 Breaking the Rules
#8 There Are a Lot of Humans
#9 And They Work for Free
#10 It Means a Lot to Players

Crowdsourcing Science

Scientists

Problem

Game

Players

Crowdsourcing Science

Scientists

Problem

Players

Game

Crowdsourcing Science
Is backyard biosynth possible? How can we trust it?
Who owns these designs?
Can we get the players to write a paper without us?
Major progress in the next 5 years... Maybe we can save the world.

Crowdsourcing Science

Seth Cooper

Zoran Popović

Jee Lee

David Baker

10 Reasons to Crowdsource Science


Adrien Treuille
Carnegie Mellon University
