SciComp Examples IPython A growing project Wrapup

IPython
A tool for the lifecycle of computational ideas

Fernando Pérez
http://fperez.org, @fperez_org
Fernando.Perez@berkeley.edu

Henry H. Wheeler Jr. Brain Imaging Center, UC Berkeley

April 4, 2013

Outline

Scientific Computing

Two examples

IPython: Interactive Python

A growing project

Wrapup

FP (UC Berkeley)

IPython

April 4, 2013

2 / 51

Scientific Computing

Computing is not the third branch of science...

It is now the backbone of theory and experiment!


Computing in science must improve drastically before we can really
call it scientific.

A crisis of credibility and real issues


The Duke clinical trials scandal - Potti/Nevins
A compounding of (common and otherwise) data analysis errors.
Lawsuits, resignations, careers destroyed.
More importantly: Patients were harmed.
Major policy reviews and changes: NCI, IOM, ...
More: see K. Baggerly's "starter set" page.

The Duke situation is more common than we'd like to believe!


Begley & Ellis, Nature, 3/28/12: Drug development: Raise standards
for preclinical cancer research.
47 out of 53 landmark papers could not be replicated.

Nature, Feb 2012, Ince et al: The case for open computer programs
"The scientific community places more faith in computation than is
justified"
"anything less than the release of actual source code is an indefensible
approach for any scientific results that depend on computation"

What does it take to get reproducible research results?

Reproducible research practices!


Reproducibility at publication time?
It's already too late.
Learn from a community (open source) where
reproducibility is an everyday practice
(by necessity)

FOSS better than scientific research?


FOSS: Free and Open Source Software

Public distributed version control: provenance tracking

Pull requests: ongoing peer review

Pull requests: back and forth discussion

Automated tests: validation


The IPython build Dashboard: immediate feedback

Versioned science
Git: the tool you didn't know you needed

Reproducibility?
Tracking and recreating every step of your work
In the software world: it's called Version Control!
Git: an enabling technology. Use version control for everything
Paper/grant writing (never get paper_v5_john.tex by email again!)
git clone https://server.com/my-grant/repo.git
cd repo
make nsf-fastlane
Everyday research: track your results
Collaboration: synchronize multi-author work.
Teaching!

A Git tutorial for scientists: http://bit.ly/YMBP83
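
One way to act on "track your results" is to store, next to every result, the commit of the code that produced it. The sketch below is only an illustration of that idea: the function names and the `mean_error` value are hypothetical, and it assumes the analysis is run from inside a Git working copy (it falls back to "unknown" otherwise).

```python
import json
import subprocess


def current_commit(repo_dir="."):
    """Return the hash of HEAD in repo_dir, or "unknown" if it cannot be read."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a Git repository, or git is not installed.
        return "unknown"
    return out.stdout.strip()


def stamp_result(result, commit):
    """Bundle a result with the commit of the code that produced it."""
    return {"commit": commit, "result": result}


# Hypothetical result value, used only to show the shape of the record.
record = stamp_result({"mean_error": 0.25}, current_commit())
print(json.dumps(record, sort_keys=True))
```

With a record like this saved alongside each output file, `git checkout <commit>` is enough to get back to the exact code behind a number or a figure.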

The IBM Mark I at Harvard

In the beginning, IBM said...


Let there be FORTRAN

Beyond (Floating Point) Number Crunching


Symbolic manipulation
Interval arithmetic
Rationals
Arbitrary precision integers
Extended precision floating point

Hardware floating point (FORTRAN)

Text processing
Databases
Data formats: HDF5, XML, ...
Graphical user interfaces
Web interfaces
Hardware control
Multi-language integration
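
A few of the numeric items in this list can be tried with nothing but the Python standard library. The snippet below is a small sketch, not part of the original talk: it contrasts hardware floating point with rationals, arbitrary precision integers and extended precision decimal arithmetic. Symbolic manipulation and interval arithmetic are left out because they need third-party packages.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

# Hardware floating point: 0.1 + 0.2 is not exactly 0.3.
float_sum_is_exact = (0.1 + 0.2 == 0.3)

# Rationals: exact arithmetic on fractions.
rational_sum = Fraction(1, 10) + Fraction(2, 10)

# Arbitrary precision integers: Python ints grow as needed.
big = 2 ** 100

# Extended precision floating point: pick the number of significant digits.
getcontext().prec = 50
third = Decimal(1) / Decimal(3)

print(float_sum_is_exact)  # False
print(rational_sum)        # 3/10
print(big)                 # 1267650600228229401496703205376
print(third)               # 0. followed by fifty 3s
```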

The purpose of computing is insight, not numbers.


Richard Hamming, 1962

The computer as microscope


Exploratory: the problem's definition evolves as we understand it.
No requirements to build an application against.
Mathematica, Maple, Matlab, IDL, etc.
All have an interactive environment.

Applications

Languages

IPython: part of a Rich Ecosystem

IPython

NetworkX

The Lifecycle of a Scientific Idea (schematically)

1. Individual exploratory work
2. Collaborative development
3. Production work (HPC, cloud, parallel)
4. Publication (with reproducible results!)
5. Education
6. Goto 1.

The Problem with most tools


Barriers and discontinuities in the workflow between all the steps

Two examples

Data analysis for epilepsy surgery


Isolating the origin of drug-resistant epileptic seizures which require surgery.

John Hunter, Department of Pediatric Neurology, University of Chicago.

Electrode location in 3D, combined with MRI data

Correlation analysis of seizure data

Matplotlib: 2d plotting

Matplotlib: 3d plotting

JPL: Mars mission trajectory design and nav data


Ted Drain and Lynn Craig, Jet Propulsion Laboratory (NASA/Caltech)

From: Name Elided <nameelided@jpl.nasa.gov>
Date: Oct 2, 2007 7:15 PM
Subject: Fwd: matplotlib bug numbers
To: John Hunter <jdh2358@gmail.com>

One of my lead developers mentioned that they had sent a bug to you about the annotations feature of
MatPlotLib. Would you be able to let me know what the timeline is to resolve that bug? The reason is that
the feature is needed for the Phoenix project and their arrival at Mars will be in March sometime, but they
are doing their testing in the coming few months. This annotation feature is used on reports that present
the analysis of the trajectory to the navigation team and it shows up on our schedule. It would really
help me to know approximately when it could be resolved.

B-plane plots are used to show the trajectory of a spacecraft with respect to the target body (specifically
perpendicular to the incoming asymptote of the spacecraft trajectory) and we plot them with the y-axis
inverted. The plot is used heavily in flight operations so it is important to our customers.

In addition, we have what is called a "thundering herd" plot where many different trajectory solutions
(determined from different measurement sources) are plotted together. The annotations are important there so
we can see which plot corresponds to each source of data. I hope it helps to know how your code will be
used in spacecraft navigation.

Thanks for all your efforts.

JPL: Mars mission data visualization


Expected communication power levels between an orbiting spacecraft and
a lander as it goes through the atmosphere:

August 23, 2011

The astronomy event of a generation


Josh Bloom, UC Berkeley Astronomy
@profjsb

Supernova PTF11kly:
Event of a Generation found on Tuesday

Monday

Tuesday

Wednesday

Nearest Type Ia supernova in more than 25 years


Soon visible with binoculars
http://bit.ly/ptf11kly

IPython: Interactive Python

Why IPython?
(something other than
"I'd rather not finish my dissertation")

The Lifecycle of a Scientific Idea (schematically)

1. Individual exploratory work
2. Collaborative development
3. Production work (HPC, cloud, parallel)
4. Publication (with reproducible results!)
5. Education
6. Goto 1.

The Problem with most tools


Barriers and discontinuities in the workflow between all the steps

IPython's goal:
Fluid transitions between all these steps

Demo

Pillar #1: An architecture for interactive computing

Pillar #2: the Notebook Format

JSON, but version-control friendly

Easy for machine processing, fixable by hand if need be.
Lots of hooks for metadata.
Not Python-specific (Ruby and JS notebooks exist; R and Julia are planned).
Produce Markdown, reST, LaTeX, HTML, etc.

An open format for sharing, publishing and archiving executable computational work
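
To make "easy for machine processing" and "version-control friendly" concrete, here is a small sketch using a deliberately simplified, hypothetical notebook-like document. The field names below are assumptions chosen for illustration and should not be read as the actual notebook schema. Writing the JSON with sorted keys and one item per line keeps the text stable, so line-based diffs stay small.

```python
import json

# A simplified, hypothetical notebook-like document (not the real schema).
notebook = {
    "metadata": {"name": "demo"},
    "cells": [
        {"cell_type": "markdown", "source": ["# A title\n"]},
        {"cell_type": "code", "source": ["print(1 + 1)\n"], "outputs": ["2\n"]},
    ],
}


def to_text(nb):
    """Serialize with sorted keys and indentation so diffs are line-oriented."""
    return json.dumps(nb, indent=1, sort_keys=True)


def code_sources(nb):
    """Collect the source of every code cell: plain dictionary processing."""
    return ["".join(cell["source"]) for cell in nb["cells"]
            if cell["cell_type"] == "code"]


text = to_text(notebook)
restored = json.loads(text)   # the round trip loses nothing
print(code_sources(restored))
```

Because the document is plain JSON, any language with a JSON library can read, transform or convert it, which is what makes tools that produce Markdown, reST, LaTeX or HTML from the same file straightforward to write.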

A growing project

Documented protocols and formats: a growing ecosystem around IPython

An Emacs Notebook Client!


Takafumi Arakaki: http://tkf.github.com/emacs-ipython-notebook.

A vim client to control an IPython kernel/console


Paul Ivanov (Berkeley), https://github.com/ivanov/vim-ipython

Microsoft Visual Studio 2010 integrated console


Dino Viehland and Shahrokh Mortazavi (Microsoft); http://pytools.codeplex.com

Star Cluster: IPython parallel+Notebook on Amazon EC2


Justin Riley (MIT): http://web.mit.edu/star/cluster

Other projects using IPython


Scientific

EPD: Enthought Python Distribution.
Anaconda: Continuum Python Distribution.
Sage: mathematics.
PyRAF: Space Telescope Science Institute
CASA: Nat. Radio Astronomy Observatory
Ganga: CERN
PyMAD: neutron spectrom., Laue Langevin
Sardana: European Synchrotron Radiation
ASCEND: eng. modeling (Carnegie Mellon).
JModelica: dynamical systems.
DASH: Denver Aerosol Sources and Health.
Trilinos: Sandia National Lab.
NiPype: computational pipelines, MIT.
PyIMSL Studio, by Visual Numerics.
DoD: baseline configuration.
...

Web/Other

Visual Studio 2010: MS.
Django.
Turbo Gears.
Pylons web framework
Zope and Plone CMS.
Axon Shell, BBC Kamaelia.
Schevo database.
Pitz: distributed task/bug tracking.
iVR (interactive Virtual Reality).
Movable Python (portable Python environment).
...

Brian Granger

Min Ragan-Kelley

Thomas Kluyver

Matthias Bussonnier

Paul Ivanov

Brad Froehle

Jörgen Stenarson

Robert Kern

Evan Patterson

Jonathan March

(Incomplete) Cast of Characters


Brian Granger - Physics, Cal State San Luis Obispo
Min Ragan-Kelley - Nuclear Engineering, UC Berkeley
Matthias Bussonnier - Physics, Institut Curie, Paris
Jonathan March - Enthought
Thomas Kluyver - Biology, U. Sheffield
Jörgen Stenarson - Elect. Engineering, Sweden.
Paul Ivanov - Neuroscience, UC Berkeley.
Robert Kern - Enthought
Evan Patterson - Physics, Caltech/Enthought
Brad Froehle - Mathematics, UC Berkeley
Stefan van der Walt - UC Berkeley
John Hunter - TradeLink Securities, Chicago.
Prabhu Ramachandran - Aerospace Engineering, IIT Bombay.
Satra Ghosh - MIT Neuroscience
Gaël Varoquaux - Neurospin (Orsay, France)
Ville Vainio - CS, Tampere University of Technology, Finland
Barry Wark - Neuroscience, U. Washington.
Ondrej Certik - Physics, U Nevada Reno
Darren Dale - Cornell
Justin Riley - MIT
Mark Voorhies - UC San Francisco
Nicholas Rougier - INRIA Nancy Grand Est
Thomas Spura - Fedora project

Many more! (~220 commit authors)

Support
Thank you!

Enthought, Austin, TX: Lots!
Microsoft: WinHPC support, Visual Studio integration, Azure (thanks to Shahrokh Mortazavi).
DoD/DRC Inc: funding through Sept. 2012 (thanks to Jose Unpingco and Chris Kees).
NIH: via NiPy grant.
NSF: via Sage compmath grant.
Google: Summer of Code 2005, 2010.
Tech-X Corp., Boulder, CO: Parallel/notebook (previous versions).
Recent stable funding (2 years, 7 people, J. Taylor):

Open Source:
skills, tools and practices we need!
A culture where things get done.
Wildly collaborative.
Reproducible by necessity.
Version control, testing, documentation, public peer review, etc.

Reward Structure in academia: we punish all of the above

Departmental boundaries: interdisciplinary work is a great buzzword,
not such a great career path.
Computational heritage is built on code, not on citations.
Continuous evolution vs publication milestones.
Authorship in collaborative works vs the first-author paper.
Scholarship and intellectual effort embedded in the code.

Too few are lifting too many

Normalized commit rates since Jan-2010
Projects: cython, ipython, matplotlib, mayavi, numpy, scipy, sympy
Axes: commit rate (0.0 to 1.0) against individual committer

NumFOCUS: Open Code, Better Science

Support the development of core projects in education and research.


Community-created and driven.
A neutral ground for industry, academia and government.
501(c)(3) - donations are tax-deductible in the USA

http://numfocus.org

Wrapup

The future of IPython: a 2-year roadmap

Spring/summer 2013: IPython 1.0


Notebook document management (nbconvert)
JavaScript internals cleanup

Fall 2013
Interactive JavaScript API
With callbacks to remote kernels.

2014
Multiuser server
Simple to deploy
Trusted (shell OK) Unix users in a lab, group, class, etc.

https://github.com/ipython/ipython/wiki/Roadmap:-IPython

In closing: our vision of scientific computing


Build on the right abstractions
The kernel: unify interactive and parallel computing
you only have one brain!

A single protocol: many kernels, many clients.


Communications and logging
the protocol is the notebook file format.

Insight and communication (Hamming)


Literate computing vs literate programming.
Build a community and an ecosystem
"How to Scale a Code in the Human Dimension", M. Turk,
http://arxiv.org/abs/1301.7064.
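
The "single protocol: many kernels, many clients" idea can be illustrated with a toy model. Everything in the sketch below is hypothetical: `ToyKernel`, `ToyClient` and the message dictionaries are invented for illustration and are not IPython's actual classes or message specification. The point is only that one kernel owns the namespace, while any number of front ends talk to it through the same kind of message.

```python
import contextlib
import io


class ToyKernel:
    """Owns one namespace and answers execution requests (toy model)."""

    def __init__(self):
        self.namespace = {}
        self.execution_count = 0

    def handle(self, message):
        """Run message["code"] and return a reply dictionary."""
        self.execution_count += 1
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(message["code"], self.namespace)
            status = "ok"
        except Exception as exc:
            status = "error: " + type(exc).__name__
        return {
            "status": status,
            "execution_count": self.execution_count,
            "stdout": buffer.getvalue(),
        }


class ToyClient:
    """A front end; it only knows how to send messages to a kernel."""

    def __init__(self, name, kernel):
        self.name = name
        self.kernel = kernel

    def run(self, code):
        return self.kernel.handle({"sender": self.name, "code": code})


kernel = ToyKernel()
console = ToyClient("console", kernel)
notebook_client = ToyClient("notebook", kernel)

console.run("x = 21")                      # state is created from one client
reply = notebook_client.run("print(x * 2)")  # and used from another
print(reply)
```

In this toy the clients call the kernel directly; a real system would put a network transport between them, which is what lets the same kernel serve local, remote and parallel use.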
