
Tuning the Cluster for MapReduce v2 (YARN)

This topic applies to YARN clusters only, and describes how to tune and optimize YARN for your cluster. It introduces the following terms:
ResourceManager: A master daemon that authorizes submitted jobs to run, assigns an ApplicationMaster to them, and enforces resource limits.
NodeManager: A worker daemon that launches ApplicationMaster and task containers.
ApplicationMaster: A supervisory task that requests the resources needed for executor tasks. An ApplicationMaster runs on a different NodeManager for each application. The ApplicationMaster requests containers, which are sized by the resources a task requires to run.
vcore: Virtual CPU core, a logical unit of processing power. In a basic case, it is equivalent to a physical CPU core or a hyperthreaded virtual CPU core.
Container: A resource bucket and process space for a task. A container's resources consist of vcores and memory.

Identifying Hardware Resources and Service Demand

Begin YARN tuning by comparing hardware resources on the worker node to the total demand of the worker services you intend to run. First, determine how many vcores, how much memory, and how many spindles are available for Hadoop operations on each worker node. Then, estimate service demand, or the resources needed to run a YARN NodeManager and HDFS DataNode process. There may be other Hadoop services that do not subscribe to YARN, including:
Impalad
HBase RegionServer
Solr
Worker nodes also run system support services, including the Linux operating system itself, and possibly third-party monitoring or asset management services.
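
A quick way to take stock of these resources is to run a few standard Linux commands on a worker node. The sketch below is only a starting point: it assumes the node's data drives are exposed as whole disks to lsblk, which may not match your storage layout.

#!/bin/sh
# Inventory the hardware available on a worker node (rough starting point)
nproc                                    # logical CPU cores (vcores in the basic case)
free -m | awk '/^Mem:/ {print $2}'       # total physical memory, in MB
lsblk -d -o NAME,TYPE | grep -c disk     # physical disks; adjust if DataNode dirs share devices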

Estimating and Configuring Resource Requirements

After identifying hardware and software services, you can estimate the CPU cores and memory each service requires. The difference between the hardware complement and this sum is the amount of resources you can assign to YARN without creating contention. Cloudera recommends starting with these estimates:
10-20% of RAM for Linux and its daemon services
At least 16 GB of RAM for an Impalad process
No more than 12-16 GB of RAM for an HBase RegionServer process

In addition, you must allow resources for task buffers, such as the HDFS Sort I/O buffer. For vcore demand, consider the number of concurrent processes or tasks each service runs as an initial guide. For the operating system, start with a count of two.
The following table shows example demand estimates for a worker node with 24 vcores and 256 GB of memory. Services that are not expected to run are allocated zero resources.
Table 1. Resource Demand Estimates: 24 vcores, 256 GB RAM

Service                  | vcores | Memory (MB)
Operating system         | 2      | 52,429
YARN NodeManager         | 1      | 1,024
HDFS DataNode            | 1      | 1,024
Impala Daemon            | 1      | 16,384
HBase RegionServer       | 0      | 0
Solr Server              | 0      | 0
Cloudera Manager agent   | 1      | 1,024
Task overhead            | 0      | 52,429
YARN containers          | 18     | 137,830
Total                    | 24     | 262,144
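
For reference, the arithmetic behind the YARN containers row can be reproduced with ordinary shell arithmetic. The figures below come from Table 1 and the guidelines above (roughly 20% of RAM each for the operating system and task overhead); they are estimates, not required values.

#!/bin/sh
# Worked arithmetic for the example node in Table 1
TOTAL_MB=262144       # 256 GB of RAM
TOTAL_VCORES=24
NON_YARN_MB=$((52429 + 1024 + 1024 + 16384 + 1024 + 52429))   # OS, NodeManager, DataNode, Impala, CM agent, task overhead
NON_YARN_VCORES=$((2 + 1 + 1 + 1 + 1))                        # OS (2) plus one vcore per non-YARN daemon
echo "YARN container memory: $((TOTAL_MB - NON_YARN_MB)) MB"          # 137,830 MB
echo "YARN container vcores: $((TOTAL_VCORES - NON_YARN_VCORES))"     # 18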

You can now configure YARN to use the remaining resources for its supervisory processes and task containers. Start with the NodeManager, which has the following settings:
Table 2. NodeManager Properties

Property                               | Description                                                          | Default
yarn.nodemanager.resource.cpu-vcores   | Number of virtual CPU cores that can be allocated for containers.   | 8
yarn.nodemanager.resource.memory-mb    | Amount of physical memory, in MB, that can be allocated for containers. | 8 GB

Hadoop is a disk I/O-centric platform by design. The number of independent physical drives (spindles) dedicated to DataNode use limits how much concurrent processing a node can sustain. As a result, the number of vcores allocated to the NodeManager should be the lesser of either:

(total vcores) - (number of vcores reserved for non-YARN use), or
2 x (number of physical disks used for DataNode storage)

The amount of RAM allotted to a NodeManager for spawning containers should be the node's physical RAM minus all non-YARN memory demand:
yarn.nodemanager.resource.memory-mb = total memory on the node - (sum of all memory allocations to other processes, such as DataNode, NodeManager, RegionServer, etc.)
For the example node, assuming the DataNode has 10 physical drives, the calculation is:
Table 3. NodeManager RAM Calculation

Property                               | Value
yarn.nodemanager.resource.cpu-vcores   | min(24 - 6, 2 x 10) = 18
yarn.nodemanager.resource.memory-mb    | 137,830 MB
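
The "lesser of" rule for vcores can be written out the same way. The sketch below uses the example node's numbers (24 vcores, 6 reserved for non-YARN use, 10 DataNode drives); the variable names are illustrative only.

#!/bin/sh
# Lesser-of rule for yarn.nodemanager.resource.cpu-vcores (example node)
TOTAL_VCORES=24
RESERVED_VCORES=6          # non-YARN demand from Table 1
DATANODE_DISKS=10
BY_CPU=$((TOTAL_VCORES - RESERVED_VCORES))    # 18
BY_DISK=$((2 * DATANODE_DISKS))               # 20
if [ "$BY_CPU" -lt "$BY_DISK" ]; then
  echo "yarn.nodemanager.resource.cpu-vcores = $BY_CPU"
else
  echo "yarn.nodemanager.resource.cpu-vcores = $BY_DISK"
fi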

Sizing the ResourceManager

The ResourceManager enforces limits on YARN container resources and can reject NodeManager container requests when required. The ResourceManager has six properties to specify the minimum, maximum, and incremental allotments of vcores and memory available for a request.
Table 4. ResourceManager Properties

Property                                     | Description                                                                                     | Default
yarn.scheduler.minimum-allocation-vcores     | The smallest number of virtual CPU cores that can be requested for a container.                | 1
yarn.scheduler.maximum-allocation-vcores     | The largest number of virtual CPU cores that can be requested for a container.                 | 32
yarn.scheduler.increment-allocation-vcores   | If using the Fair Scheduler, virtual core requests are rounded up to the nearest multiple of this number. | 1
yarn.scheduler.minimum-allocation-mb         | The smallest amount of physical memory, in MB, that can be requested for a container.          | 1 GB
yarn.scheduler.maximum-allocation-mb         | The largest amount of physical memory, in MB, that can be requested for a container.           | 64 GB
yarn.scheduler.increment-allocation-mb       | If you are using the Fair Scheduler, memory requests are rounded up to the nearest multiple of this number. | 512 MB

If a NodeManager has 50 GB or more RAM available for containers, consider increasing the minimum allocation to 2 GB. The default memory increment is 512 MB. For a minimum memory of 1 GB, a container that requires 1.2 GB receives 1.5 GB. You can set the maximum memory allocation equal to yarn.nodemanager.resource.memory-mb.
The default minimum and increment value for vcores is 1. Because application tasks are not commonly multithreaded, you generally do not need to change this value. The maximum value is usually equal to yarn.nodemanager.resource.cpu-vcores. Reduce this value to limit the number of containers running concurrently on one node.
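
As an illustration of this rounding, the sketch below shows how a request of roughly 1.2 GB becomes 1.5 GB when the minimum allocation is 1 GB and the increment is 512 MB. The scheduler's actual normalization logic is more involved; this only reproduces the arithmetic described above.

#!/bin/sh
# Round a memory request up to the nearest increment, then enforce the minimum
REQUEST_MB=1229     # a task asking for roughly 1.2 GB
MIN_MB=1024         # yarn.scheduler.minimum-allocation-mb
INCR_MB=512         # yarn.scheduler.increment-allocation-mb
ROUNDED=$(( (REQUEST_MB + INCR_MB - 1) / INCR_MB * INCR_MB ))   # 1536 MB = 1.5 GB
if [ "$ROUNDED" -lt "$MIN_MB" ]; then
  ROUNDED=$MIN_MB
fi
echo "allocated: $ROUNDED MB"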
The example leaves more than 50 GB of RAM available for containers, which accommodates the following settings:
Table 5. ResourceManager Calculations

Property                                   | Value
yarn.scheduler.minimum-allocation-mb       | 2,048 MB
yarn.scheduler.maximum-allocation-mb       | 137,830 MB
yarn.scheduler.maximum-allocation-vcores   | 18

Configuring YARN Settings

You can change the YARN settings that control MapReduce applications. A client can override these values if required, up to the constraints enforced by the ResourceManager or NodeManager. There are nine task settings, three each for mappers, reducers, and the ApplicationMaster itself:
Table 6. Gateway/Client Properties

Property                                   | Description                                                                   | Default
mapreduce.map.memory.mb                    | The amount of physical memory, in MB, allocated for each map task of a job.  | 1 GB
mapreduce.map.java.opts.max.heap           | The maximum Java heap size, in bytes, of the map processes.                  | 800 MB
mapreduce.map.cpu.vcores                   | The number of virtual CPU cores allocated for each map task of a job.        | 1
mapreduce.reduce.memory.mb                 | The amount of physical memory, in MB, allocated for each reduce task of a job. | 1 GB
mapreduce.reduce.java.opts.max.heap        | The maximum Java heap size, in bytes, of the reduce processes.               | 800 MB
mapreduce.reduce.cpu.vcores                | The number of virtual CPU cores for each reduce task of a job.               | 1
yarn.app.mapreduce.am.resource.mb          | The physical memory requirement, in MB, for the ApplicationMaster.           | 1 GB
ApplicationMaster Java maximum heap size   | The maximum heap size, in bytes, of the Java MapReduce ApplicationMaster. Exposed in Cloudera Manager as part of the YARN service configuration. This value is folded into the property yarn.app.mapreduce.am.command-opts. | 800 MB
yarn.app.mapreduce.am.resource.cpu-vcores  | The virtual CPU cores requirement for the ApplicationMaster.                 | 1

The settings for mapreduce.[map|reduce].java.opts.max.heap specify the default memory allotted for mapper and reducer heap size, respectively. The mapreduce.[map|reduce].memory.mb settings specify the memory allotted to their containers, and the value assigned should allow overhead beyond the task heap size. Cloudera recommends applying a factor of 1.2 to the mapreduce.[map|reduce].java.opts.max.heap setting. The optimal value depends on the actual tasks. Cloudera also recommends setting mapreduce.map.memory.mb to 1-2 GB and setting mapreduce.reduce.memory.mb to twice the mapper value. The ApplicationMaster heap size is 1 GB by default, and can be increased if your jobs contain many concurrent tasks. Using these guides, size the example worker node as follows:
Table 7. Gateway/Client Calculations

Property                              | Value
mapreduce.map.memory.mb               | 2,048 MB
mapreduce.reduce.memory.mb            | 4,096 MB
mapreduce.map.java.opts.max.heap      | 0.8 x 2,048 = 1,638 MB
mapreduce.reduce.java.opts.max.heap   | 0.8 x 4,096 = 3,277 MB
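
Because these are gateway/client settings, a job can also supply them on the command line, as the TeraSort script later in this topic does. The sketch below passes the Table 7 values to a hypothetical job; example.jar, ExampleJob, and the paths are placeholders, and the -D options take effect only if the job's driver uses ToolRunner.

#!/bin/sh
# Per-job override of the Table 7 values (example.jar, ExampleJob, and paths are placeholders)
hadoop jar example.jar ExampleJob \
  -Dmapreduce.map.memory.mb=2048 \
  -Dmapreduce.map.java.opts.max.heap=1638 \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.reduce.java.opts.max.heap=3277 \
  /input/path /output/path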

Defining Containers

With YARN worker resources configured, you can determine how many containers best support a MapReduce application, based on job type and system resources. For example, a CPU-bound workload such as a Monte Carlo simulation requires very little data but complex, iterative processing. The ratio of concurrent containers to spindles is likely greater than for an ETL workload, which tends to be I/O bound. For applications that use a lot of memory in the map or reduce phase, the number of containers that can be scheduled is limited by the RAM available to the container and the RAM required by the task. Other applications may be limited based on vcores not in use by other YARN applications or the rules employed by dynamic resource pools (if used).
To calculate the number of containers for mappers and reducers based on actual system constraints, start with the following formulas:
Table 8. Container Formulas

Property              | Value
mapreduce.job.maps    | MIN(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.map.cpu.vcores, number of physical drives x workload factor) x number of worker nodes
mapreduce.job.reduces | MIN(yarn.nodemanager.resource.memory-mb / mapreduce.reduce.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.reduce.cpu.vcores, number of physical drives x workload factor) x number of worker nodes

The workload factor can be set to 2.0 for most workloads. Consider a higher setting for CPU-bound workloads.
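
Plugging the example node's numbers into the Table 8 formulas (a single worker node, 10 drives, a workload factor of 2.0, and the Table 7 memory settings) gives a starting point like the following sketch. The resulting values are illustrative, not prescriptive.

#!/bin/sh
# Table 8 formulas applied to the example node (illustrative only)
NM_MB=137830;   NM_VCORES=18
MAP_MB=2048;    MAP_VCORES=1
RED_MB=4096;    RED_VCORES=1
DRIVES=10;      WORKLOAD_FACTOR=2;   NODES=1      # workload factor 2.0; integer arithmetic here

BY_DISK=$((DRIVES * WORKLOAD_FACTOR))             # 20

MAPS=$((NM_MB / MAP_MB))                                                          # 67 by memory
[ $((NM_VCORES / MAP_VCORES)) -lt "$MAPS" ] && MAPS=$((NM_VCORES / MAP_VCORES))   # 18 by vcores
[ "$BY_DISK" -lt "$MAPS" ] && MAPS=$BY_DISK
echo "mapreduce.job.maps    = $((MAPS * NODES))"                                  # min(67, 18, 20) x 1 = 18

REDUCES=$((NM_MB / RED_MB))                                                                # 33 by memory
[ $((NM_VCORES / RED_VCORES)) -lt "$REDUCES" ] && REDUCES=$((NM_VCORES / RED_VCORES))      # 18 by vcores
[ "$BY_DISK" -lt "$REDUCES" ] && REDUCES=$BY_DISK
echo "mapreduce.job.reduces = $((REDUCES * NODES))"                                        # min(33, 18, 20) x 1 = 18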
Many other factors can influence the performance of a MapReduce application, including:
Configured rack awareness
Skewed or imbalanced data
Network throughput
Co-tenancy demand (other services or applications using the cluster)
Dynamic resource pooling
You may also have to maximize or minimize cluster utilization for your workload or to meet Service Level Agreements (SLAs). To find the best resource configuration for an application, try various container and gateway/client settings and record the results.
For example, the following TeraGen/TeraSort script supports throughput testing with a 10 GB data load and a loop of varying YARN container and gateway/client settings. You can observe which configuration yields the best results.
#!/bin/sh
HADOOP_PATH=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce
for i in 2 4 8 16 32 64              # Number of mapper containers to test
do
  for j in 2 4 8 16 32 64            # Number of reducer containers to test
  do
    for k in 1024 2048               # Container memory for mappers/reducers to test
    do
      MAP_MB=`echo "($k*0.8)/1" | bc`    # JVM heap size for mappers
      RED_MB=`echo "($k*0.8)/1" | bc`    # JVM heap size for reducers
      hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen \
        -Dmapreduce.job.maps=$i -Dmapreduce.map.memory.mb=$k \
        -Dmapreduce.map.java.opts.max.heap=$MAP_MB 100000000 \
        /results/tg-10GB-${i}-${j}-${k} 1>tera_${i}_${j}_${k}.out 2>tera_${i}_${j}_${k}.err
      hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort \
        -Dmapreduce.job.maps=$i -Dmapreduce.job.reduces=$j -Dmapreduce.map.memory.mb=$k \
        -Dmapreduce.map.java.opts.max.heap=$MAP_MB -Dmapreduce.reduce.memory.mb=$k \
        -Dmapreduce.reduce.java.opts.max.heap=$RED_MB \
        /results/tg-10GB-${i}-${j}-${k} /results/ts-10GB-${i}-${j}-${k} \
        1>>tera_${i}_${j}_${k}.out 2>>tera_${i}_${j}_${k}.err
      hadoop fs -rm -r -skipTrash /results/tg-10GB-${i}-${j}-${k}
      hadoop fs -rm -r -skipTrash /results/ts-10GB-${i}-${j}-${k}
    done
  done
done
