
Apache Crunch

What is it?
How does it work?
Why use it?
Hadoop MapReduce pipelines
Scrunch
Joins

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz

Apache Crunch Pipeline

Crunch is based on Google's FlumeJava
Provides a Java based API for M/R pipelines
It uses an MST (multiple serializable type) data model
Good for processing complex data types
Better for "non tuple" data types, i.e.
    Images
    Audio
    Seismic data


Apache Crunch Pipeline

What is a MapReduce pipeline?
    Map
    Shuffle
    Reduce
    Combine

Arranged in sequence and / or in parallel
Potentially very long chains
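The stages named above can be sketched in a few lines of plain Python. This is only a conceptual illustration, not the Crunch or Hadoop API: the function names (`map_stage`, `combine_stage`, `shuffle_stage`, `reduce_stage`, `word_count`, `frequency_histogram`) are hypothetical, and each inner list of lines stands in for the input of one map task. The combine step is shown running on each map task's output before the shuffle, which is where a combiner normally sits.

```python
from functools import reduce


def map_stage(records, mapper):
    """Map: the mapper turns each record into zero or more (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def combine_stage(pairs, combiner):
    """Combine: pre-aggregate values per key within one map task's output."""
    partial = {}
    for key, value in pairs:
        partial[key] = combiner(partial[key], value) if key in partial else value
    return list(partial.items())


def shuffle_stage(pairs):
    """Shuffle: group all values that share a key."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped


def reduce_stage(grouped, reducer):
    """Reduce: fold each key's list of values into a single result."""
    return {key: reduce(reducer, values) for key, values in grouped.items()}


def add(a, b):
    return a + b


def word_count(splits):
    """One job: map and combine per split, then shuffle and reduce."""
    combined = []
    for split in splits:  # each split stands in for one map task
        mapped = map_stage(split, lambda line: ((w, 1) for w in line.split()))
        combined.extend(combine_stage(mapped, add))
    return reduce_stage(shuffle_stage(combined), add)


def frequency_histogram(splits):
    """A chain of two jobs: the output of word_count feeds a second job."""
    counts = word_count(splits)
    mapped = map_stage(counts.items(), lambda kv: [(kv[1], 1)])
    return reduce_stage(shuffle_stage(mapped), add)
```

Here `frequency_histogram` shows stages arranged in sequence: the first job counts words, and the second counts how many words share each count. Longer chains follow the same pattern, with each job consuming the previous job's output.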


Apache Crunch Scala

Scrunch is a Scala wrapper for Apache Crunch
Reduced code
Functional and OO styles
Uses type inference for Map / Reduce
Incorporates Java Materialize functionality
Includes REPL (read eval print loop)


Apache Crunch Joins

Details of Joins available in Crunch

Inner / Outer like SQL joins
Same with Left / Right / Full joins
MapSide join is an in memory join
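The join types listed above behave like their SQL counterparts. The following plain-Python sketch illustrates the semantics on lists of (key, value) pairs; it is not the Crunch join API, and the names `join`, `map_side_join` and `_index` are hypothetical. It assumes keys are sortable and uses `None` to mark a missing side. The map-side variant is shown as an inner join in which the small input is held in memory while the large input is streamed past it.

```python
def _index(pairs):
    """Group values by key so each side can be looked up by key."""
    index = {}
    for key, value in pairs:
        index.setdefault(key, []).append(value)
    return index


def join(left, right, how="inner"):
    """Join two lists of (key, value) pairs; a missing side appears as None."""
    l, r = _index(left), _index(right)
    if how == "inner":
        keys = l.keys() & r.keys()
    elif how == "left":
        keys = l.keys()
    elif how == "right":
        keys = r.keys()
    elif how == "full":
        keys = l.keys() | r.keys()
    else:
        raise ValueError(f"unknown join type: {how}")
    out = []
    for key in sorted(keys):  # sorted only to make the output order stable
        for lv in l.get(key, [None]):
            for rv in r.get(key, [None]):
                out.append((key, (lv, rv)))
    return out


def map_side_join(large, small):
    """Inner join with the small side in memory and the large side streamed."""
    lookup = _index(small)  # the whole small input must fit in memory
    for key, large_value in large:
        for small_value in lookup.get(key, []):
            yield key, (large_value, small_value)
```

For example, with `users = [(1, "ann"), (2, "bob")]` and `orders = [(1, "book"), (4, "ink")]`, an inner join keeps only key 1, a left join also keeps key 2 with no order, and a full join additionally keeps key 4 with no user. The map-side version avoids grouping both inputs by key, at the cost of needing the small input to fit in memory.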


Apache Crunch Performance

A lightweight API that runs efficiently
Crunch is a thin veneer on top of MapReduce
Two implementations available
    Hadoop Writables
    Avro
Avro implementation much faster


Apache Crunch API

Data Model
    Pipeline
    MRPipeline
    MemPipeline
    PCollection
    PTable
    PGroupedTable
    Source
    Target
    Emitter
    PType

Operators
    DoFn
    CombineFn
    FilterFn
    Joins
    Cartesian
    Sort
    Secondary Sort
    PObject
    BloomFilters


Contact Us

Feel free to contact us at
    www.semtech-solutions.co.nz
    info@semtech-solutions.co.nz

We offer IT project consultancy
We are happy to hear about your problems
You can just pay for those hours that you need to solve your problems
