What is it?
How does it work?
Why use it?
Hadoop MapReduce pipelines
Scrunch
Joins
www.semtech-solutions.co.nz
info@semtech-solutions.co.nz
Crunch is based on Google's FlumeJava
Provides a Java based API for M/R pipelines
It uses an MST (multiple serializable type) data model
Good for processing complex data types
Better for "non tuple" data types i.e.
Scrunch is a Scala wrapper for Apache Crunch
Reduced code
Functional and OO styles
Uses type inferencing for Map / Reduce
Incorporates Java Materialize functionality
Includes REPL (read eval print loop)
Inner / Outer like SQL joins
Same with Left / Right / Full joins
MapSide join is an in memory join
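The in-memory idea behind a map-side join can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Crunch's own join code: the class MapSideJoinSketch, the method join and the sample data are all hypothetical. The small side is held in a HashMap and each record from the large side is matched by a hash lookup, so no shuffle or sort is needed. The keepUnmatched flag switches between inner and left outer behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain Java, no Crunch classes involved.
public class MapSideJoinSketch {

    // Joins each streamed {key, value} record against a small in-memory table.
    // keepUnmatched = false gives an inner join; true gives a left outer join.
    public static List<String> join(List<String[]> streamed,
                                    Map<String, String> inMemory,
                                    boolean keepUnmatched) {
        List<String> out = new ArrayList<>();
        for (String[] record : streamed) {
            String key = record[0];
            String leftValue = record[1];
            String rightValue = inMemory.get(key); // hash lookup on the small side
            if (rightValue != null || keepUnmatched) {
                // An unmatched right side is rendered as the text "null".
                out.add(key + "," + leftValue + "," + rightValue);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Small side: assumed to fit in memory.
        Map<String, String> countries = new HashMap<>();
        countries.put("nz", "New Zealand");
        countries.put("au", "Australia");

        // Large side: streamed record by record.
        List<String[]> visits = new ArrayList<>();
        visits.add(new String[] {"nz", "visit-1"});
        visits.add(new String[] {"uk", "visit-2"});

        System.out.println(join(visits, countries, false)); // inner join
        System.out.println(join(visits, countries, true));  // left outer join
    }
}
```

With the sample data, the inner join keeps only the "nz" record, while the left outer join also keeps "uk" with no right-hand value. The approach only works when one side is small enough to hold in memory.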
A lightweight API that runs efficiently
Crunch is a thin veneer on top of MapReduce
Two implementations available
Data Model: Pipeline, MRPipeline, MemPipeline, PCollection, PTable, PGroupedTable, Source, Target, Emitter, PType
Operators: DoFn, CombineFn, FilterFn, Joins, Cartesian, Sort, Secondary Sort, PObject, BloomFilters
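The relationship between a DoFn and an Emitter can be illustrated with a small plain-Java sketch. The interfaces below are simplified, hypothetical stand-ins written for this example, not Crunch's real classes, and the parallelDo helper here runs sequentially in memory rather than as a MapReduce job. The point it shows is that a DoFn receives one input at a time and may emit zero, one or many outputs through the Emitter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: stand-in types only, no Crunch dependency.
public class DoFnSketch {

    // Stand-in for the Emitter idea: a sink that receives output records.
    public interface Emitter<T> {
        void emit(T value);
    }

    // Stand-in for the DoFn idea: one input may yield any number of outputs.
    public interface DoFn<S, T> {
        void process(S input, Emitter<T> emitter);
    }

    // Applies fn to every element and collects whatever it emits.
    public static <S, T> List<T> parallelDo(List<S> input, DoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S element : input) {
            fn.process(element, out::add);
        }
        return out;
    }

    // Splits each line into words; a blank line emits nothing.
    public static List<String> splitWords(List<String> lines) {
        return DoFnSketch.<String, String>parallelDo(lines, (line, emitter) -> {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    emitter.emit(word);
                }
            }
        });
    }

    public static void main(String[] args) {
        // Prints [thin, veneer, on, MapReduce]
        System.out.println(splitWords(Arrays.asList("thin veneer", "", "on MapReduce")));
    }
}
```

A FilterFn can be thought of as the special case in which each input is emitted either once or not at all.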
Contact Us
www.semtech-solutions.co.nz
info@semtech-solutions.co.nz
We offer IT project consultancy
We are happy to hear about your problems
You can just pay for those hours that you need to solve your problems