This is the documentation for Cloudera 5.3.x. Documentation for other versions is available at Cloudera Documentation.
Tuning the Cluster for MapReduce v2 (YARN)

This topic applies to YARN clusters only, and describes how to tune and optimize YARN for your cluster. It introduces the following terms:

ResourceManager: A master daemon that authorizes submitted jobs to run, assigns an ApplicationMaster to them, and enforces resource limits.
NodeManager: A worker daemon that launches ApplicationMaster and task containers.
ApplicationMaster: A supervisory task that requests the resources needed for executor tasks. An ApplicationMaster runs on a different NodeManager for each application. The ApplicationMaster requests containers, which are sized by the resources a task requires to run.
vcore: Virtual CPU core, a logical unit of processing power. In a basic case, it is equivalent to a physical CPU core or a hyperthreaded virtual CPU core.
Container: A resource bucket and process space for a task. A container's resources consist of vcores and memory.
Identifying Hardware Resources and Service Demand

Begin YARN tuning by comparing the hardware resources on the worker node to the summed demand of the worker services you intend to run. First, determine how many vcores, how much memory, and how many spindles are available for Hadoop operations on each worker node. Then, estimate service demand, or the resources needed to run a YARN NodeManager and HDFS DataNode process. There may be other Hadoop services that do not subscribe to YARN, including:

Impalad
HBase RegionServer
Solr

Worker nodes also run system support services, including the Linux operating system, and possibly third-party monitoring or asset management services.
Estimating and Configuring Resource Requirements

After identifying hardware and software services, you can estimate the CPU cores and memory each service requires. The difference between the hardware complement and this sum is the amount of resources you can assign to YARN without creating contention. Cloudera recommends starting with these estimates:

10-20% of RAM for Linux and its daemon services
At least 16 GB RAM for an Impalad process
No more than 12-16 GB RAM for an HBase RegionServer process

In addition, you must allow resources for task buffers, such as the HDFS Sort I/O buffer. For vcore demand, consider the number of concurrent processes or tasks each service runs as an initial guide. For the operating system, start with a count of two.

The following table shows example demand estimates for a worker node with 24 vcores and 256 GB of memory. Services that are not expected to run are allocated zero resources.
Table 1. Resource Demand Estimates: 24 vcores, 256 GB RAM

Service                    vcores    Memory (MB)
Operating system           2         52,429
YARN NodeManager           1         1,024
HDFS DataNode              1         1,024
Impala Daemon              1         16,384
HBase RegionServer         0         0
Solr Server                0         0
Cloudera Manager agent     1         1,024
Task overhead              0         52,429
YARN containers            18        137,830
Total                      24        262,144
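
The YARN containers row is derived rather than estimated: it is the node total minus every other reservation in the table. As a quick check of that arithmetic, here is a minimal shell sketch (all inputs are the Table 1 estimates; the variable names are illustrative):

#!/bin/sh
# Worked check of Table 1: memory left over for YARN containers on a
# 262,144 MB (256 GB) worker node, after all non-YARN reservations.
TOTAL_MB=262144
OS_MB=52429          # operating system (20% of RAM)
NM_MB=1024           # YARN NodeManager
DN_MB=1024           # HDFS DataNode
IMPALA_MB=16384      # Impala Daemon (16 GB)
CM_MB=1024           # Cloudera Manager agent
OVERHEAD_MB=52429    # task overhead (for example, the HDFS Sort I/O buffer)

CONTAINERS_MB=`expr $TOTAL_MB - $OS_MB - $NM_MB - $DN_MB - $IMPALA_MB - $CM_MB - $OVERHEAD_MB`
echo "Memory available for YARN containers: $CONTAINERS_MB MB"   # prints 137830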
You can now configure YARN to use the remaining resources for its supervisory processes and task containers. Start with the NodeManager, which has the following settings:
Table 2. NodeManager Properties

yarn.nodemanager.resource.cpu-vcores
    Number of virtual CPU cores that can be allocated for containers.
    Default: 8

yarn.nodemanager.resource.memory-mb
    Amount of physical memory, in MB, that can be allocated for containers.
    Default: 8 GB
Hadoop is a disk I/O-centric platform by design. The number of independent physical drives (spindles) dedicated to DataNode use limits how much concurrent processing a node can sustain. As a result, the number of vcores allocated to the NodeManager should be the lesser of either:

(total vcores) - (number of vcores reserved for non-YARN use), or
2 x (number of physical disks used for DataNode storage)

The amount of RAM allotted to a NodeManager for spawning containers should be the node's physical RAM minus all non-YARN memory demand:

yarn.nodemanager.resource.memory-mb = total memory on the node - (sum of all memory allocations to other processes, such as DataNode, NodeManager, and RegionServer)

For the example node, assuming the DataNode has 10 physical drives, the calculation is:
Table 3. NodeManager Calculations

Property                                Value
yarn.nodemanager.resource.cpu-vcores    min(24 - 6, 2 x 10) = 18
yarn.nodemanager.resource.memory-mb     137,830 MB
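
Expressed as a shell sketch, the vcore rule above reduces to taking the lesser of the CPU figure and the spindle figure; the disk count of 10 and the 6 reserved vcores are specific to the example node, not universal values:

#!/bin/sh
# NodeManager vcore sizing: the lesser of (total - reserved) and 2 x spindles.
TOTAL_VCORES=24       # hardware vcores on the worker node
RESERVED_VCORES=6     # non-YARN vcores from Table 1 (OS, NodeManager, DataNode, Impala, CM agent)
DATANODE_DISKS=10     # physical drives dedicated to DataNode storage

CPU_RULE=`expr $TOTAL_VCORES - $RESERVED_VCORES`    # 18
DISK_RULE=`expr 2 \* $DATANODE_DISKS`               # 20
if [ $CPU_RULE -lt $DISK_RULE ]
then
  NM_VCORES=$CPU_RULE
else
  NM_VCORES=$DISK_RULE
fi
echo "yarn.nodemanager.resource.cpu-vcores = $NM_VCORES"   # min(18, 20) = 18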
Sizing the ResourceManager

The ResourceManager enforces limits on YARN container resources and can reject requests for NodeManager containers when required. The ResourceManager has six properties that specify the minimum, maximum, and incremental allotments of vcores and memory available for a request.
Table 4. ResourceManager Properties

yarn.scheduler.minimum-allocation-vcores
    The smallest number of virtual CPU cores that can be requested for a container.
    Default: 1

yarn.scheduler.maximum-allocation-vcores
    The largest number of virtual CPU cores that can be requested for a container.
    Default: 32

yarn.scheduler.increment-allocation-vcores
    If you are using the Fair Scheduler, virtual core requests are rounded up to the nearest multiple of this number.
    Default: 1

yarn.scheduler.minimum-allocation-mb
    The smallest amount of physical memory, in MB, that can be requested for a container.
    Default: 1 GB

yarn.scheduler.maximum-allocation-mb
    The largest amount of physical memory, in MB, that can be requested for a container.
    Default: 64 GB

yarn.scheduler.increment-allocation-mb
    If you are using the Fair Scheduler, memory requests are rounded up to the nearest multiple of this number.
    Default: 512 MB
If a NodeManager has 50 GB or more of RAM available for containers, consider increasing the minimum allocation to 2 GB. The default memory increment is 512 MB; with a minimum of 1 GB, a container that requires 1.2 GB therefore receives 1.5 GB. You can set the maximum memory allocation equal to yarn.nodemanager.resource.memory-mb.

The default minimum and increment value for vcores is 1. Because application tasks are not commonly multithreaded, you generally do not need to change these values. The maximum is usually set equal to yarn.nodemanager.resource.cpu-vcores; reduce it to limit the number of containers running concurrently on one node.
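
The increment rounding is easy to verify by hand. The following sketch implements the rule as stated above (round a request up to the next multiple of the increment, never below the minimum) and reproduces the 1.2 GB-to-1.5 GB example:

#!/bin/sh
# Rounding sketch: a 1.2 GB (1,229 MB) request with a 512 MB increment is
# granted 1.5 GB (1,536 MB); a request below the 1 GB minimum is raised to it.
REQUEST_MB=1229
MIN_MB=1024       # yarn.scheduler.minimum-allocation-mb
INC_MB=512        # yarn.scheduler.increment-allocation-mb

GRANT_MB=`expr \( \( $REQUEST_MB + $INC_MB - 1 \) / $INC_MB \) \* $INC_MB`
if [ $GRANT_MB -lt $MIN_MB ]; then GRANT_MB=$MIN_MB; fi
echo "Granted: $GRANT_MB MB"   # prints 1536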
The example node leaves more than 50 GB of RAM available for containers, which accommodates the following settings:
Table 5. ResourceManager Calculations

Property                                    Value
yarn.scheduler.minimum-allocation-mb        2,048 MB
yarn.scheduler.maximum-allocation-mb        137,830 MB
yarn.scheduler.maximum-allocation-vcores    18
Configuring YARN Settings

You can change the YARN settings that control MapReduce applications. A client can override these values if required, up to the constraints enforced by the ResourceManager or NodeManager. There are nine task settings, three each for mappers, reducers, and the ApplicationMaster itself:
Table 6. Gateway/Client Properties

mapreduce.map.memory.mb
    The amount of physical memory, in MB, allocated for each map task of a job.
    Default: 1 GB

mapreduce.map.java.opts.max.heap
    The maximum Java heap size, in bytes, of the map processes.
    Default: 800 MB

mapreduce.map.cpu.vcores
    The number of virtual CPU cores allocated for each map task of a job.
    Default: 1

mapreduce.reduce.memory.mb
    The amount of physical memory, in MB, allocated for each reduce task of a job.
    Default: 1 GB

mapreduce.reduce.java.opts.max.heap
    The maximum Java heap size, in bytes, of the reduce processes.
    Default: 800 MB

mapreduce.reduce.cpu.vcores
    The number of virtual CPU cores for each reduce task of a job.
    Default: 1

yarn.app.mapreduce.am.resource.mb
    The physical memory requirement, in MB, for the ApplicationMaster.
    Default: 1 GB

ApplicationMaster Java maximum heap size
    The maximum heap size, in bytes, of the Java MapReduce ApplicationMaster. Exposed in Cloudera Manager as part of the YARN service configuration; this value is folded into the property yarn.app.mapreduce.am.command-opts.
    Default: 800 MB

yarn.app.mapreduce.am.resource.cpu-vcores
    The virtual CPU cores requirement for the ApplicationMaster.
    Default: 1
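
Because these are gateway/client defaults, an individual job can override them at submission time, subject to the ResourceManager limits above. A hypothetical example, using the values derived for the example node in Table 7 below (the jar and class names are placeholders, and the job is assumed to use ToolRunner so that -D options are honored):

# Hypothetical per-job override of the gateway/client defaults.
hadoop jar my-app.jar MyJobClass \
  -Dmapreduce.map.memory.mb=2048 \
  -Dmapreduce.map.java.opts.max.heap=1638 \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.reduce.java.opts.max.heap=3277 \
  input_path output_path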
The settings for mapreduce.[map|reduce].java.opts.max.heap specify the default memory allotted for mapper and reducer heap size, respectively. The mapreduce.[map|reduce].memory.mb settings specify the memory allotted to their containers, and the value assigned should allow overhead beyond the task heap size; Cloudera recommends applying a factor of 1.2 to the mapreduce.[map|reduce].java.opts.max.heap setting. The optimal value depends on the actual tasks. Cloudera also recommends setting mapreduce.map.memory.mb to 1-2 GB and setting mapreduce.reduce.memory.mb to twice the mapper value. The ApplicationMaster's container memory is 1 GB by default, and can be increased if your jobs contain many concurrent tasks. Using these guides, size the example worker node as follows:
Table 7. Gateway/Client Calculations

Property                               Value
mapreduce.map.memory.mb                2,048 MB
mapreduce.reduce.memory.mb             4,096 MB
mapreduce.map.java.opts.max.heap       0.8 x 2,048 = 1,638 MB
mapreduce.reduce.java.opts.max.heap    0.8 x 4,096 = 3,277 MB
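
The heap rows in Table 7 are the container sizes scaled by 0.8 (the inverse of the 1.2 overhead factor), and can be derived with the same bc idiom used by the test script at the end of this topic:

#!/bin/sh
# Derive task heap sizes from container memory with the 0.8 factor.
for MB in 2048 4096
do
  HEAP=`echo "($MB * 0.8)/1" | bc`   # truncating division, as in the test script
  echo "container $MB MB -> heap $HEAP MB"
done
# prints 1638 and 3276; Table 7 rounds 4,096 x 0.8 = 3,276.8 up to 3,277.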
Defining Containers

With YARN worker resources configured, you can determine how many containers best support a MapReduce application, based on job type and system resources. For example, a CPU-bound workload such as a Monte Carlo simulation requires very little data but complex, iterative processing, so its ratio of concurrent containers to spindles is likely greater than for an ETL workload, which tends to be I/O-bound. For applications that use a lot of memory in the map or reduce phase, the number of containers that can be scheduled is limited by the RAM available to the container and the RAM required by the task. Other applications may be limited by the vcores not in use by other YARN applications, or by the rules employed by dynamic resource pools (if used).

To calculate the number of containers for mappers and reducers based on actual system constraints, start with the following formulas:
Table 8. Container Formulas

mapreduce.job.maps
    MIN(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb,
        yarn.nodemanager.resource.cpu-vcores / mapreduce.map.cpu.vcores,
        number of physical drives x workload factor)
    x number of worker nodes

mapreduce.job.reduces
    MIN(yarn.nodemanager.resource.memory-mb / mapreduce.reduce.memory.mb,
        yarn.nodemanager.resource.cpu-vcores / mapreduce.reduce.cpu.vcores,
        number of physical drives x workload factor)
    x number of worker nodes
The workload factor can be set to 2.0 for most workloads. Consider a higher setting for CPU-bound workloads.
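
Plugging the example node's settings into the mapper formula, with a workload factor of 2.0 and a hypothetical 20-node cluster (the node count is illustrative only, not from the tables above):

#!/bin/sh
# mapreduce.job.maps per the Table 8 formula, for the example node.
NM_MB=137830; MAP_MB=2048      # NodeManager memory / mapper container memory
NM_VCORES=18; MAP_VCORES=1     # NodeManager vcores / mapper vcores
DISKS=10; FACTOR=2             # physical drives x workload factor
NODES=20                       # hypothetical number of worker nodes

BY_MEM=`expr $NM_MB / $MAP_MB`            # 67
BY_CPU=`expr $NM_VCORES / $MAP_VCORES`    # 18
BY_DISK=`expr $DISKS \* $FACTOR`          # 20
MIN=$BY_MEM
[ $BY_CPU -lt $MIN ] && MIN=$BY_CPU
[ $BY_DISK -lt $MIN ] && MIN=$BY_DISK
echo "mapreduce.job.maps = `expr $MIN \* $NODES`"    # 18 x 20 = 360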
Many other factors can influence the performance of a MapReduce application, including:

Configured rack awareness
Skewed or imbalanced data
Network throughput
Co-tenancy demand (other services or applications using the cluster)
Dynamic resource pooling

You may also have to maximize or minimize cluster utilization for your workload, or meet Service Level Agreements (SLAs). To find the best resource configuration for an application, try various container and gateway/client settings and record the results.

For example, the following TeraGen/TeraSort script supports throughput testing with a 10 GB data load across a loop of varying YARN container and gateway/client settings, so you can observe which configuration yields the best results.
#!/bin/sh
HADOOP_PATH=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce
for i in 2 4 8 16 32 64         # Number of mapper containers to test
do
  for j in 2 4 8 16 32 64       # Number of reducer containers to test
  do
    for k in 1024 2048          # Container memory for mappers/reducers to test
    do
      MAP_MB=`echo "($k*0.8)/1" | bc`   # JVM heap size for mappers
      RED_MB=`echo "($k*0.8)/1" | bc`   # JVM heap size for reducers
      hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen \
        -Dmapreduce.job.maps=$i -Dmapreduce.map.memory.mb=$k \
        -Dmapreduce.map.java.opts.max.heap=$MAP_MB 100000000 \
        /results/tg-10GB-${i}-${j}-${k} 1>tera_${i}_${j}_${k}.out 2>tera_${i}_${j}_${k}.err
      hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort \
        -Dmapreduce.job.maps=$i -Dmapreduce.job.reduces=$j -Dmapreduce.map.memory.mb=$k \
        -Dmapreduce.map.java.opts.max.heap=$MAP_MB -Dmapreduce.reduce.memory.mb=$k \
        -Dmapreduce.reduce.java.opts.max.heap=$RED_MB \
        /results/tg-10GB-${i}-${j}-${k} /results/ts-10GB-${i}-${j}-${k} \
        1>>tera_${i}_${j}_${k}.out 2>>tera_${i}_${j}_${k}.err
      hadoop fs -rm -r -skipTrash /results/tg-10GB-${i}-${j}-${k}
      hadoop fs -rm -r -skipTrash /results/ts-10GB-${i}-${j}-${k}
    done
  done
done