foreach + iterators

Bryan Lewis, Steve Weston

Revolution Computing, New Haven, CT, USA

Rmetrics 2009

Outline

iterators

foreach

Experimenting with existing packages

iterators

An S3 class with tools for iterating over various R data structures:
- Conceptually like while loops
- Defined by a nextElem function
- Like iterators in Java and other languages

Simple Examples

it <- iter (1:3)

it <- icount (3)
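As a minimal sketch of how these two iterators behave, the snippet below pulls values with nextElem until the iterator is exhausted. It assumes the iterators package is installed; the variable names are illustrative only.

```r
library(iterators)

it <- iter(1:3)
first  <- nextElem(it)
second <- nextElem(it)
third  <- nextElem(it)

# One more call signals an error; its message marks the end of iteration
msg <- tryCatch(nextElem(it), error = function(e) conditionMessage(e))

# icount(3) counts upward from 1 and stops after 3
ic <- icount(3)
counts <- sapply(1:3, function(i) nextElem(ic))
```

The error raised at the end is how a consumer learns that no elements remain, which is the same convention the custom iterator on the next slide relies on.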

Another example

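    library (DBI)  # provides dbSendQuery, fetch and dbClearResult

    # Iterator over a query result, returning n rows per call
    iquery <- function (con, statement, ..., n=1) {
      rs <- dbSendQuery (con, statement, ...)
      nextElem <- function() {
        d <- fetch (rs, n)
        if (nrow (d) == 0) {
          # No rows left: release the result set and signal the end
          dbClearResult (rs)
          stop ('StopIteration')
        }
        d
      }
      structure (list (nextElem=nextElem), class=c ('iquery', 'iter'))
    }

    # S3 method so that nextElem (it) dispatches to the closure above
    nextElem.iquery <- function(obj) obj$nextElem()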
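The same closure pattern can be tried without a database connection. The sketch below is an assumption-laden stand-in: ichunk_rows is a hypothetical helper (not part of iterators) that hands out a data frame n rows at a time and signals StopIteration when the rows run out.

```r
library(iterators)

# Hypothetical helper: iterate over a data frame in chunks of n rows
ichunk_rows <- function(df, n = 1) {
  pos <- 0
  nextEl <- function() {
    if (pos >= nrow(df)) stop("StopIteration", call. = FALSE)
    rows <- seq(pos + 1, min(pos + n, nrow(df)))
    pos <<- pos + length(rows)   # remember how far we have read
    df[rows, , drop = FALSE]
  }
  structure(list(nextElem = nextEl), class = c("ichunk_rows", "iter"))
}

# S3 method mirroring nextElem.iquery
nextElem.ichunk_rows <- function(obj, ...) obj$nextElem()

it <- ichunk_rows(mtcars[1:5, ], n = 2)
chunk1 <- nextElem(it)   # rows 1-2
chunk2 <- nextElem(it)   # rows 3-4
chunk3 <- nextElem(it)   # row 5 only
end_msg <- tryCatch(nextElem(it), error = function(e) conditionMessage(e))
```

The state (pos here, the result set rs in iquery) lives in the closure, so the object itself is just a list holding one function plus a class attribute.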

foreach

- New looping methods for R
- An abstract interface to parallel computing
- Python/Haskell-like list comprehensions

Foreach Syntax

foreach(iterator,...) %dopar% { statements }

Example

    > foreach (j=1:4) %dopar% { j }
    [[1]]
    [1] 1

    [[2]]
    [1] 2

    [[3]]
    [1] 3

    [[4]]
    [1] 4

Examples

    > foreach (j=1:4, .combine=c) %dopar% { j }
    [1] 1 2 3 4
    > foreach (j=icount(4), .combine='+') %dopar% { j }
    [1] 10

Note the difference from sum (1:4).
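One way to read the note about sum: sum receives a complete vector, whereas a binary .combine function folds the results together pairwise, roughly like Reduce. The sketch below illustrates that reading; it assumes the foreach package is installed and uses %do% so that no parallel backend is needed.

```r
library(foreach)

# Results concatenated into a vector
v <- foreach(j = 1:4, .combine = c) %do% j

# Results folded together with the binary operator "+"
s <- foreach(j = 1:4, .combine = "+") %do% j

# Base R analogue of the fold
r <- Reduce(`+`, 1:4)

# sum() works on the whole vector at once but gives the same total
total <- sum(1:4)
```

Because the fold never needs the full vector of results, the same idea carries over to combine functions that merge larger objects, such as combine from randomForest on the next slide.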

Another Example

    > library (randomForest)
    > x <- matrix (runif (500), 100)
    > y <- gl (2, 50)
    > rf <- foreach (ntree=rep (250, 4), .combine=combine) %dopar%
    +   randomForest (x, y, ntree=ntree)

The %dopar% operator

%dopar% is a registration API for parallel back-ends:
- doSEQ (the default backend)
- doMC (multicore package)
- doNWS
- doSNOW
- doRHIPE?
- doRMPI?
- ...
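A minimal sketch of the registration step, assuming only the foreach package: the sequential backend is registered explicitly, the current registration is queried, and the same %dopar% loop would run unchanged under any other registered backend.

```r
library(foreach)

# Register the sequential backend explicitly
registerDoSEQ()

# Ask foreach what is currently registered
backend <- getDoParName()
workers <- getDoParWorkers()

# The loop body does not mention the backend at all
squares <- foreach(j = 1:3, .combine = c) %dopar% j^2
```

Swapping in a parallel backend would mean replacing only the registerDoSEQ() line with that backend's own registration call; the loop itself stays the same.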

Foreach tries to parse R syntax reasonably

    > z <- 2
    > f <- function (x) { sqrt (x + z) }
    > foreach (j=1:4, .combine=c) %dopar% { f (j) }
    [1] 1.732051 2.000000 2.236068 2.449490

List comprehension

    > foreach (j=-2:2, .combine=c) %:% when (j>=0) %dopar%
    +   sqrt (j)
    [1] 0.000000 1.000000 1.414214

Nesting

Foreach loops can be nested. Nesting admits at least two interesting cases:
- Easy loop unrolling
- Easy multi-paradigm parallelism

Loop unrolling

Compare (100 iterations of 5 parallel tasks):

    x <- foreach (j=1:100, .combine=sum) %do% {
      foreach (k=1:5, .combine=c) %dopar% { j*k }
    }

with an unrolled version (500 parallel tasks):

    y <- foreach (j=1:100, .combine=sum) %:%
      foreach (k=1:5, .combine=c) %dopar% { j*k }

The unrolled approach is better load-balanced on a cluster.
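The two forms compute the same value and differ only in how the work is divided into tasks. The sketch below checks that, assuming the foreach package and using %do% throughout so that it runs without a parallel backend.

```r
library(foreach)

# Nested form: the outer loop runs 100 times, each with 5 inner tasks
x <- foreach(j = 1:100, .combine = sum) %do% {
  foreach(k = 1:5, .combine = c) %do% { j * k }
}

# Unrolled form: %:% merges both loops into one stream of 500 tasks
y <- foreach(j = 1:100, .combine = sum) %:%
  foreach(k = 1:5, .combine = c) %do% { j * k }

# Closed form for comparison: sum over all j * k
expected <- sum(outer(1:100, 1:5))
```

With a real backend, the unrolled form lets the scheduler hand out all 500 tasks at once instead of waiting for each batch of 5 to finish.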

Multi-paradigm parallelism

    > require (doSNOW)
    > cl <- makeCluster (c ('n1', 'n2', 'n3', 'n4'))
    > registerDoSNOW (cl)
    > foreach (j=<iterator>, .packages='doMC') %dopar% {
    +   registerDoMC ()   # register the multicore backend on this node first
    +   foreach (k=<iterator>) %dopar% {
    +     ...
    +   }
    + }

Example: Very simple backtesting

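    library (TTR)  # MACD
    library (xts)  # xts, index

    # Hold the instrument while MACD is above its signal line,
    # otherwise hold the benchmark
    simpleRule <- function (z, fast=12, slow=26, signal=9, instr, benchmark) {
      x <- MACD (z, nFast=fast, nSlow=slow, nSig=signal, maType="EMA")
      position <- sign (x[,1] - x[,2])
      s <- xts (position, order.by=index (z))
      return (instr*(s>0) + benchmark*(s<=0))
    }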

Brute-force parameter optimization


    # Define a return series Ra for the instrument
    # (below we use the closing price of MSFT), and
    # benchmark series Rb
    # Cl () is provided by the quantmod package
    M <- 100
    S <- matrix (0, M, M)
    for (j in 1:(M-1)) {
      for (k in min ((j+2), M):M) {
        R <- simpleRule (Cl (MSFT), j, k, 9, Ra, Rb)
        Dt <- na.omit (R - Rb)
        S[j,k] <- mean (Dt)/sd (Dt)
      }
    }

Now in parallel, by rows...

    M <- 100
    S <- foreach (j=1:(M-1), .combine=rbind, .packages=c ('xts', 'TTR')) %dopar% {
      x <- rep (0, M)
      for (k in min ((j+2), M):M) {
        R <- simpleRule (Cl (MSFT), j, k, 9, Ra, Rb)
        Dt <- na.omit (R - Rb)
        x[k] <- mean (Dt)/sd (Dt)
      }
      x   # one row of S per task
    }
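The row-wise rewrite can be checked on a toy problem without market data. In the sketch below, score is a hypothetical stand-in for the backtest statistic (any function of j and k without side effects would do), and %do% is used so that no backend is required.

```r
library(foreach)

# Hypothetical stand-in for mean(Dt)/sd(Dt)
score <- function(j, k) (k - j) / (k + j)

M <- 6

# Sequential version: fill an M x M matrix in place
S_seq <- matrix(0, M, M)
for (j in 1:(M - 1)) {
  for (k in min(j + 2, M):M) {
    S_seq[j, k] <- score(j, k)
  }
}

# foreach version: each iteration returns one row, rbind stacks the rows
S_par <- foreach(j = 1:(M - 1), .combine = rbind) %do% {
  x <- rep(0, M)
  for (k in min(j + 2, M):M) {
    x[k] <- score(j, k)
  }
  x
}

# S_par has M - 1 rows because j stops at M - 1
same <- all(S_seq[1:(M - 1), ] == S_par)
```

The key change is that the loop body no longer writes into a shared matrix; it returns its row as a value, which is what makes the iterations independent.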

Parallelizing parts of an existing package

Basic idea:
- Profile code with Rprof (profr is a nice wrapper that visualizes the results)
- Examine bottlenecks for apply-like statements and for loops with independent code blocks
- Rewrite for loops to be free of side effects as required (may require a custom combine function)
- Unlock the namespace, provisionally replace target function(s) and experiment (a nice trick)
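The last step can be sketched with base R binding functions. To stay self-contained, the example below uses a plain environment as a stand-in for a package namespace, and col_means is a hypothetical target function; with a real package one would presumably start from asNamespace() on that package instead.

```r
# Stand-in for a package namespace (an assumption for illustration)
ns <- new.env()
assign("col_means",
       function(m) sapply(seq_len(ncol(m)), function(i) mean(m[, i])),
       envir = ns)
lockBinding("col_means", ns)   # locked, as namespace bindings are

# Keep the original so the experiment can be undone
original <- get("col_means", envir = ns)

# Unlock, swap in the experimental replacement, lock again
unlockBinding("col_means", ns)
assign("col_means", function(m) unname(colMeans(m)), envir = ns)
lockBinding("col_means", ns)

m <- matrix(1:6, nrow = 2)
before <- original(m)
after  <- get("col_means", envir = ns)(m)

# Restore the original definition when done
unlockBinding("col_means", ns)
assign("col_means", original, envir = ns)
lockBinding("col_means", ns)
```

Comparing before and after on the same input is the quick sanity check that the replacement preserves behaviour before timing it.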

Example: ipred

(Work through the ipred replacement functions in the lecvx.R.)

Appendix: Fun map/reduce examples


Succinct map/reduce, from the mapReduce package by Christopher Brown:

    mapReduce <- function (map, ..., data=NULL, applyfun=sapply) {
      innerFun <- function (my.data, expr) eval (expr, my.data)
      outerFun <- function (expr, split.data) sapply (split.data, innerFun, expr)
      attach (data)
      map <- eval (substitute (map, data))
      detach (data)
      expr = substitute (c ( ... ))[-1]
      split.data <- split (data, map)
      applyfun (expr, outerFun, split.data)
    }

mapReduce sequential and parallel examples


    # An example
    mapReduce (cyl, mean(mpg), mean(hp), data=mtcars, applyfun=sapply)

    # With multicore:
    require (multicore)
    mapReduce (cyl, mean(mpg), mean(hp), data=mtcars, applyfun=mclapply)

    # With SNOW parSapply:
    require (snow)
    cl <- makeSOCKcluster (c ("localhost", "localhost"))
    ssapply <- function (A, B, C) { parSapply (cl, A, B, C) }
    mapReduce (cyl, mean(mpg), mean(hp), data=mtcars, applyfun=ssapply)

mapReduce parallel examples

    # With Rmpi mpi.parSapply:
    require (Rmpi)
    x <- mapReduce (cyl, mean(mpg), mean(hp), data=mtcars, applyfun=mpi.parSapply)

    # With foreach:
    require (foreach)
    fapply <- function (A, B, C) {
      foreach (j=A, .combine=cbind) %dopar% B (j, C)
    }
    mapReduce (cyl, mean(mpg), mean(hp), data=mtcars, applyfun=fapply)
