Rmetrics 2009
Outline
iterators
foreach
iterators
An S3 class with tools for iterating over various R data structures:
- Conceptually like while loops
- Defined by a nextElem function
- Like iterators in Java and other languages
Simple Examples
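A minimal sketch of the iterators API, assuming the iterators package is installed; icount() and iter() are standard constructors from that package:

```r
library(iterators)

# Count from 1 to 3; each nextElem() call advances the iterator
it <- icount(3)
nextElem(it)   # [1] 1
nextElem(it)   # [1] 2
nextElem(it)   # [1] 3
# One more call signals the 'StopIteration' condition

# Iterators also wrap existing structures, e.g. the columns of a matrix
m   <- matrix(1:6, nrow = 2)
cit <- iter(m, by = "col")
nextElem(cit)  # the first column: 1 2
```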
Another example
iquery <- function(con, statement, ..., n = 1) {
  rs <- dbSendQuery(con, statement, ...)
  nextElem <- function() {
    d <- fetch(rs, n)
    if (nrow(d) == 0) {
      dbClearResult(rs)
      stop("StopIteration")
    }
    d
  }
  structure(list(nextElem = nextElem), class = c("iquery", "iter"))
}

nextElem.iquery <- function(obj) obj$nextElem()
foreach
- New looping methods for R
- An abstract interface to parallel computing
- Python/Haskell-like list comprehensions
Foreach Syntax
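A sketch of the general form: %do% evaluates the body sequentially, %dopar% hands it to the registered parallel back-end.

```r
library(foreach)

# foreach(var = iterable, .combine = fn) %do%    { body }   sequential
# foreach(var = iterable, .combine = fn) %dopar% { body }   parallel back-end
result <- foreach(i = 1:3, .combine = c) %do% { i^2 }
result   # [1] 1 4 9

# Without .combine, the per-iteration results come back as a list
foreach(i = 1:2) %do% { i }
```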
Example
> foreach (j=1:4) %dopar% { j }
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4
Examples
> foreach (j=1:4, .combine=c) %dopar% { j }
[1] 1 2 3 4
> foreach (j=icount(4), .combine="+") %dopar% { j }
[1] 10
Note the difference with sum (1:4): the result is the same, but .combine="+" reduces the results pairwise as they arrive.
Another Example
library (randomForest)
x <- matrix (runif (500), 100)
y <- gl (2, 50)
rf <- foreach (ntree=rep (250, 4), .combine=combine) %dopar%
  randomForest (x, y, ntree=ntree)
%dopar% is a registration API for parallel back-ends:
- doSEQ (the default back-end)
- doMC (multicore package)
- doNWS
- doSNOW
- doRHIPE?
- doRMPI?
- ...
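Registration is one call per back-end. A sketch with doMC (Unix-alikes only; the core count here is illustrative):

```r
library(foreach)
library(doMC)

registerDoMC(cores = 2)   # all later %dopar% loops use this back-end
getDoParName()            # reports the currently registered back-end
getDoParWorkers()         # reports the number of workers

foreach(i = 1:4, .combine = c) %dopar% { i * 10 }   # [1] 10 20 30 40
```

If no back-end is registered, %dopar% falls back to sequential execution with a warning; registerDoSEQ() makes that choice explicit.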
> z <- 2
> f <- function (x) { sqrt (x + z) }
> foreach (j=1:4, .combine=c) %dopar% { f (j) }
[1] 1.732051 2.000000 2.236068 2.449490
List comprehension
> foreach (j=-2:2, .combine=c) %:% when (j>=0) %dopar% sqrt (j)
[1] 0.000000 1.000000 1.414214
Nesting
Foreach loops can be nested. Nesting admits at least two interesting cases:
- Easy loop unrolling
- Easy multi-paradigm parallelism
Loop unrolling
Compare (100 iterations of 5 parallel tasks):

x <- foreach (j=1:100, .combine=sum) %do% {
  foreach (k=1:5, .combine=c) %dopar% { j*k }
}

With an unrolled version (500 parallel tasks):

y <- foreach (j=1:100, .combine=sum) %:%
  foreach (k=1:5, .combine=c) %dopar% { j*k }

The unrolled approach is better load-balanced on a cluster.
Multi-paradigm parallelism
require (doSNOW)
cl <- makeCluster (c (n1, n2, n3, n4))
registerDoSNOW (cl)
foreach (j=<iterator>, .packages="doMC") %dopar% {
  registerDoMC ()
  foreach (k=<iterator>) %dopar% {
    ...
  }
}
simpleRule <- function (z, fast=12, slow=26, signal=9, instr, benchmark) {
  x <- MACD (z, nFast=fast, nSlow=slow, nSig=signal, maType="EMA")
  position <- sign (x[,1] - x[,2])
  s <- xts (position, order.by=index (z))
  return (instr*(s>0) + benchmark*(s<=0))
}
M <- 100
S <- foreach (j=1:(M-1), .combine=rbind, .packages=c ("xts", "TTR")) %dopar% {
  x <- rep (0, M)
  for (k in min ((j+2), M):M) {
    R <- simpleRule (Cl (MSFT), j, k, 9, Ra, Rb)
    Dt <- na.omit (R - Rb)
    x[k] <- mean (Dt) / sd (Dt)
  }
  x
}
Basic idea:
- Profile code with Rprof (profr is a nice wrapper that visualizes the results)
- Examine bottlenecks for apply-like statements and for loops with independent code blocks
- Rewrite for loops without side-effects as required (may require a custom combine function)
- Unlock the namespace, provisionally replace target function(s) and experiment (a nice trick)
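The profiling step above can be sketched with base R's Rprof sampling profiler (the file name and workload here are illustrative); profr wraps and visualizes the same output:

```r
# Start the sampling profiler, run the code under study, then stop it
Rprof ("profile.out")
x <- replicate (50, sum (sort (runif (1e5))))
Rprof (NULL)

# Tabulate where the time went; scan for apply-like calls and hot loops
head (summaryRprof ("profile.out")$by.total)
```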
Example: ipred
# With Rmpi mpi.parSapply:
require (Rmpi)
x <- mapReduce (cyl, mean (mpg), mean (hp), data=mtcars,
                applyfun=mpi.parSapply)

# With foreach:
require (foreach)
fapply <- function (A, B, C) {
  foreach (j=A, .combine=cbind) %dopar% B (j, C)
}
mapReduce (cyl, mean (mpg), mean (hp), data=mtcars, applyfun=fapply)