03/07/2016

Top 100 Hadoop Interview Questions and Answers 2016
Top 100 Hadoop Interview Questions and Answers 2016
21 Aug 2015
Latest update made on July 1st, 2016.
Big Data and Hadoop is a constantly changing field which requires people to quickly upgrade their skills to fit the requirements for Hadoop-related jobs. If you are applying for a Hadoop job role, it is best to be prepared to answer any Hadoop interview question that might come your way. We will keep updating this list of Hadoop interview questions to suit the current industry standards.
If you would like more information about Big Data careers, please click the orange "Request Info" button on top of this page.
With more than 30,000 open Hadoop developer jobs, professionals must familiarize themselves with each and every component of the Hadoop ecosystem to make sure they have a deep understanding of what Hadoop is, so that they can form an effective approach to a given big data problem. To help you get started, DeZyre has presented a comprehensive list of the Top 50 Hadoop Developer Interview Questions asked during recent Hadoop job interviews.
With the help of DeZyre's Hadoop instructors, we have put together a detailed list of the latest Hadoop interview questions based on the different components of the Hadoop ecosystem such as MapReduce, Hive, HBase, Pig, YARN, Flume, Sqoop, HDFS, etc.

We had to spend lots of hours researching and deliberating on what are the best possible answers to these interview questions. We would love to invite people from the industry (Hadoop developers, Hadoop admins and architects) to kindly help us and everyone else with answering the unanswered questions.

https://www.dezyre.com/article/top100hadoopinterviewquestionsandanswers2016/159

Here are top Hadoop Developer Interview Questions and Answers based on different components of the Hadoop ecosystem:

1) Hadoop Basic Interview Questions
2) Hadoop HDFS Interview Questions
3) MapReduce Interview Questions
4) Hadoop HBase Interview Questions
5) Hadoop Sqoop Interview Questions
6) Hadoop Flume Interview Questions
7) Hadoop Zookeeper Interview Questions
8) Pig Interview Questions
9) Hive Interview Questions
10) Hadoop YARN Interview Questions

Big Data Hadoop Interview Questions and Answers
These are Hadoop Basic Interview Questions and Answers for freshers and experienced.
1. What is Big Data?
Big data is defined as the voluminous amount of structured, unstructured or semi-structured data that has huge potential for mining, but is so large that it cannot be processed using traditional database systems. Big data is characterized by its high velocity, volume and variety, which require cost-effective and innovative methods for information processing to draw meaningful business insights. More than the volume of the data, it is the nature of the data that defines whether it is considered Big Data or not.
Here is an interesting and explanatory visual on "What is Big Data?":

Hadoop Training - What is Big Data, by DeZyre.com

2. What do the four V's of Big Data denote?
IBM has a nice, simple explanation for the four critical features of big data:
a) Volume - Scale of data
b) Velocity - Analysis of streaming data
c) Variety - Different forms of data
d) Veracity - Uncertainty of data
Here is an explanatory video on the four V's of Big Data:

Hadoop Training - Four V's of Big Data, by DeZyre.com

3. How does big data analysis help businesses increase their revenue? Give an example.
Big data analysis is helping businesses differentiate themselves. For example, Walmart, the world's largest retailer in 2014 in terms of revenue, is using big data analytics to increase its sales through better predictive analytics, providing customized recommendations and launching new products based on customer preferences and needs. Walmart observed a significant 10% to 15% increase in online sales, for $1 billion in incremental revenue. There are many more companies like Facebook, Twitter, LinkedIn, Pandora, JPMorgan Chase, Bank of America, etc. using big data analytics to boost their revenue.

Here is an interesting video that explains how various industries are leveraging big data analysis to increase their revenue:

Hadoop Training - Top 10 industries using Big Data by...

4. Name some companies that use Hadoop.
Yahoo (one of the biggest users and a more than 80% code contributor to Hadoop)
Facebook
Netflix
Amazon
Adobe
eBay
Hulu
Spotify
Rubikloud
Twitter


Click on this link to view a detailed list of some of the top companies using Hadoop.
5. Differentiate between structured and unstructured data.
Data which can be stored in traditional database systems in the form of rows and columns, for example online purchase transactions, can be referred to as structured data. Data which can be stored only partially in traditional database systems, for example data in XML records, can be referred to as semi-structured data. Unorganized and raw data that cannot be categorized as semi-structured or structured data is referred to as unstructured data. Facebook updates, tweets on Twitter, reviews, web logs, etc. are all examples of unstructured data.
6. On what concept does the Hadoop framework work?
The Hadoop framework works on the following two core components:
1) HDFS - The Hadoop Distributed File System is the Java-based file system for scalable and reliable storage of large datasets. Data in HDFS is stored in the form of blocks, and it operates on the master-slave architecture.


2) Hadoop MapReduce - This is a Java-based programming paradigm of the Hadoop framework that provides scalability across various Hadoop clusters. MapReduce distributes the workload into various tasks that can run in parallel. Hadoop jobs perform two separate tasks. The map job breaks down the datasets into key-value pairs or tuples. The reduce job then takes the output of the map job and combines the data tuples into a smaller set of tuples. The reduce job is always performed after the map job is executed.
Here is a visual that clearly explains the HDFS and Hadoop MapReduce concepts:

Hadoop Training - Definition of Hadoop Ecosystem, H...

7) What are the main components of a Hadoop application?
Hadoop applications have a wide range of technologies that provide great advantage in solving complex business problems.
Core components of a Hadoop application are:
1) Hadoop Common
2) HDFS
3) Hadoop MapReduce
4) YARN
Data access components are Pig and Hive.
Data storage component is HBase.
Data integration components are Apache Flume, Sqoop and Chukwa.
Data management and monitoring components are Ambari, Oozie and Zookeeper.
Data serialization components are Thrift and Avro.
Data intelligence components are Apache Mahout and Drill.
8. What is Hadoop Streaming?

The Hadoop distribution has a generic application programming interface for writing map and reduce jobs in any desired programming language like Python, Perl, Ruby, etc. This is referred to as Hadoop Streaming. Users can create and run jobs with any kind of shell script or executable as the mapper or reducer.
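To make the idea concrete, here is a minimal word-count sketch in the style of a streaming job: the mapper emits tab-separated key-value lines and the reducer sums counts per key, assuming its input is sorted by key (which Hadoop's shuffle guarantees). This is a local simulation, not Hadoop itself.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' pairs, one per word, as a streaming mapper would print to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    """Sum counts per word; assumes input sorted by key, which Hadoop's shuffle guarantees."""
    split = (p.split("\t") for p in pairs)
    for word, group in groupby(split, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_local(lines):
    """Chain mapper -> sort -> reducer to simulate the framework's shuffle locally."""
    return list(reducer(sorted(mapper(lines))))
```

On a real cluster these two functions would live in separate scripts passed via the streaming jar's -mapper and -reducer options, with print statements replacing the generators.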
9. What is the best hardware configuration to run Hadoop?
The best configuration for executing Hadoop jobs is dual-core machines or dual processors with 4GB or 8GB RAM that use ECC memory. Hadoop benefits highly from using ECC memory, though it is not low-end. ECC memory is recommended for running Hadoop because most Hadoop users have experienced various checksum errors from using non-ECC memory. However, the hardware configuration also depends on the workflow requirements and can change accordingly.
10. What are the most commonly defined input formats in Hadoop?
The most common input formats defined in Hadoop are:
TextInputFormat - This is the default input format defined in Hadoop.
KeyValueInputFormat - This input format is used for plain text files wherein the files are broken down into lines.
SequenceFileInputFormat - This input format is used for reading files in sequence.
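To make the difference between the first two formats concrete, here is a small Python simulation (not Hadoop code) of the records each would hand to a mapper; the tab default for the key-value split matches Hadoop's KeyValueTextInputFormat:

```python
def text_input_format(data):
    """Mimic TextInputFormat: key = byte offset of the line, value = the line text."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line)

def key_value_input_format(data, sep="\t"):
    """Mimic KeyValueTextInputFormat: split each line at the first separator into key and value."""
    for _, line in text_input_format(data):
        key, _, value = line.partition(sep)
        yield key, value
```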
We have further categorized Big Data interview questions for freshers and experienced:
Hadoop Interview Questions and Answers for Freshers - Q.Nos 1, 2, 4, 5, 6, 7, 8, 9
Hadoop Interview Questions and Answers for Experienced - Q.Nos 3, 8, 9, 10
For a detailed PDF report on Hadoop salaries, CLICK HERE.

Hadoop HDFS Interview Questions and Answers
1. What is a block and block scanner in HDFS?
Block - The minimum amount of data that can be read or written is generally referred to as a block in HDFS. The default size of a block in HDFS is 64MB.
Block Scanner - The Block Scanner tracks the list of blocks present on a DataNode and verifies them to find any kind of checksum errors. Block Scanners use a throttling mechanism to reserve disk bandwidth on the DataNode.
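Block-size questions often come with a quick follow-up calculation (how many blocks a given file occupies), so here is that arithmetic as a one-liner:

```python
import math

def num_blocks(file_size_mb, block_size_mb=64):
    """A file is split into ceil(size / block_size) blocks; the last block may be partial
    and consumes only as much disk as it actually holds."""
    return math.ceil(file_size_mb / block_size_mb)
```

A 200MB file, for instance, occupies 4 blocks: three full 64MB blocks and one partial 8MB block.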
2. Explain the difference between NameNode, Backup Node and Checkpoint NameNode.
NameNode: The NameNode is at the heart of the HDFS file system and manages the metadata, i.e. the data of the files is not stored on the NameNode; rather, it has the directory tree of all the files present in the HDFS file system on a Hadoop cluster. The NameNode uses two files for the namespace:
fsimage file - It keeps track of the latest checkpoint of the namespace.
edits file - It is a log of changes that have been made to the namespace since the checkpoint.
Checkpoint Node:
The Checkpoint Node keeps track of the latest checkpoint in a directory that has the same structure as that of the NameNode's directory. The Checkpoint Node creates checkpoints for the namespace at regular intervals by downloading the edits and fsimage files from the NameNode and merging them locally. The new image is then uploaded back to the active NameNode.
Backup Node:

The Backup Node also provides checkpointing functionality like that of the Checkpoint Node, but it also maintains an up-to-date in-memory copy of the file system namespace that is in sync with the active NameNode.
3. What is commodity hardware?
Commodity hardware refers to inexpensive systems that do not have high availability or high quality. Commodity hardware includes RAM because there are specific services that need to be executed on RAM. Hadoop can be run on any commodity hardware and does not require any supercomputers or high-end hardware configuration to execute jobs.
4. What is the port number for NameNode, Task Tracker and Job Tracker?
NameNode - 50070
JobTracker - 50030
TaskTracker - 50060
5. Explain the process of inter-cluster data copying.
HDFS provides a distributed data copying facility through DistCP from source to destination. When this data copying is between two Hadoop clusters, it is referred to as inter-cluster data copying. DistCP requires both source and destination to have a compatible or same version of Hadoop.
6. How can you overwrite the replication factors in HDFS?
The replication factor in HDFS can be modified or overwritten in 2 ways:
1) Using the Hadoop FS shell, the replication factor can be changed on a per-file basis using the below command:
$ hadoop fs -setrep -w 2 /my/test_file
(test_file is the filename whose replication factor will be set to 2)
2) Using the Hadoop FS shell, the replication factor of all files under a given directory can be modified using the below command:
$ hadoop fs -setrep -w 5 /my/test_dir
(test_dir is the name of the directory; all the files in this directory will have their replication factor set to 5)
7. Explain the difference between NAS and HDFS.
NAS runs on a single machine and thus there is no probability of data redundancy, whereas HDFS runs on a cluster of different machines, so there is data redundancy because of the replication protocol.
NAS stores data on dedicated hardware, whereas in HDFS all the data blocks are distributed across the local drives of the machines.
In NAS, data is stored independently of the computation, and hence Hadoop MapReduce cannot be used for processing, whereas HDFS works with Hadoop MapReduce as the computations in HDFS are moved to the data.


8. Explain what happens if, during the PUT operation, an HDFS block is assigned a replication factor of 1 instead of the default value 3.
Replication factor is a property of HDFS that can be set for the entire cluster to adjust the number of times the blocks are to be replicated, to ensure high data availability. For every block that is stored in HDFS, the cluster will have n-1 duplicated blocks. So, if the replication factor during the PUT operation is set to 1 instead of the default value 3, then there will be a single copy of the data. Under these circumstances, when the replication factor is set to 1, if the DataNode crashes for any reason, the only copy of the data would be lost.
9. What is the process to change the files at arbitrary locations in HDFS?
HDFS does not support modifications at arbitrary offsets in the file or multiple writers; files are written by a single writer in append-only format, i.e. writes to a file in HDFS are always made at the end of the file.
10. Explain the indexing process in HDFS.
The indexing process in HDFS depends on the block size. HDFS stores the last part of the data, which further points to the address where the next part of the data chunk is stored.
11. What is rack awareness and on what basis is data stored in a rack?
All the data nodes put together form a storage area, i.e. the physical location of the data nodes is referred to as a rack in HDFS. The rack information, i.e. the rack id of each data node, is acquired by the NameNode. The process of selecting closer data nodes depending on the rack information is known as rack awareness.
The contents of the file are divided into data blocks as soon as the client is ready to load the file into the Hadoop cluster. After consulting with the NameNode, the client allocates 3 data nodes for each data block. For each data block, there exist 2 copies in one rack and the third copy is present in another rack. This is generally referred to as the Replica Placement Policy.
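The placement rule described above can be sketched in a few lines of Python. This is a simulation only, with a hypothetical choose_replica_nodes helper rather than actual HDFS code (real HDFS additionally prefers the writer's local node for the first replica):

```python
import random

def choose_replica_nodes(racks):
    """Pick 3 nodes for a block: two on one rack, the third on a different rack,
    mirroring the replica placement policy described above.
    racks: dict mapping rack id -> list of node names (each rack needs >= 2 nodes)."""
    first_rack, second_rack = random.sample(list(racks), 2)
    two_same_rack = random.sample(racks[first_rack], 2)   # two copies share a rack
    remote = random.choice(racks[second_rack])            # third copy on another rack
    return two_same_rack + [remote]
```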
12. What happens to a NameNode that has no data?
There does not exist any NameNode without data. If it is a NameNode, then it should have some sort of data in it.
13. What happens when a user submits a Hadoop job when the NameNode is down - does the job get put on hold or does it fail?
The Hadoop job fails when the NameNode is down.
14. What happens when a user submits a Hadoop job when the JobTracker is down - does the job get put on hold or does it fail?
The Hadoop job fails when the JobTracker is down.
15. Whenever a client submits a Hadoop job, who receives it?
The NameNode receives the Hadoop job, then looks for the data requested by the client and provides the block information. The JobTracker takes care of resource allocation of the Hadoop job to ensure timely completion.
We have further categorized Hadoop HDFS interview questions for freshers and experienced:
Hadoop Interview Questions and Answers for Freshers - Q.Nos 2, 3, 7, 9, 10, 11, 13, 14
Hadoop Interview Questions and Answers for Experienced - Q.Nos 1, 2, 4, 5, 6, 7, 8, 12, 15
Here are a few more frequently asked Hadoop HDFS Interview Questions and Answers for Freshers and Experienced.
Click here to know more about our IBM Certified Hadoop Developer course.

Hadoop MapReduce Interview Questions and Answers

1. Explain the usage of the Context object.
The Context object is used to help the mapper interact with other Hadoop systems. The Context object can be used for updating counters, reporting progress and providing any application-level status updates. The Context object has the configuration details for the job, and also interfaces that help it generate the output.
2. What are the core methods of a Reducer?
The 3 core methods of a reducer are:
1) setup() - This method of the reducer is used for configuring various parameters like the input data size, distributed cache, heap size, etc.
Function definition: public void setup(Context context)
2) reduce() - It is the heart of the reducer, called once per key with the associated reduce task.
Function definition: public void reduce(Key key, Iterable<Value> values, Context context)
3) cleanup() - This method is called only once at the end of the reduce task, for clearing all the temporary files.
Function definition: public void cleanup(Context context)
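The call order of these three methods can be illustrated with a small Python analogue. This simulates only the lifecycle, not the Hadoop Java API; the SumReducer class and run_reduce_task driver below are hypothetical stand-ins:

```python
class SumReducer:
    """Mimics the Reducer lifecycle: setup() once, reduce() once per key, cleanup() once."""
    def setup(self):
        self.results = {}          # stands in for configuring parameters

    def reduce(self, key, values):
        self.results[key] = sum(values)

    def cleanup(self):
        pass                       # temporary-file cleanup would go here

def run_reduce_task(grouped):
    """Drive the lifecycle the way the framework would for one reduce task."""
    r = SumReducer()
    r.setup()                      # called once, before any keys
    for key, values in grouped.items():
        r.reduce(key, values)      # called once per key
    r.cleanup()                    # called once, after all keys
    return r.results
```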
3. Explain the partitioning, shuffle and sort phases.
Shuffle Phase - Once the first map tasks are completed, the nodes continue to perform several other map tasks and also exchange the intermediate outputs with the reducers as required. This process of moving the intermediate outputs of map tasks to the reducer is referred to as shuffling.
Sort Phase - Hadoop MapReduce automatically sorts the set of intermediate keys on a single node before they are given as input to the reducer.
Partitioning Phase - The process that determines which intermediate keys and values will be received by each reducer instance is referred to as partitioning. The destination partition is the same for any key, irrespective of the mapper instance that generated it.
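Hadoop's default partitioner (HashPartitioner) provides exactly this guarantee by hashing the key. A Python rendition of the same idea, using Python's built-in hash in place of Java's hashCode (note that Python salts string hashes per process, so the mapping is only stable within one run, unlike Java's):

```python
def get_partition(key, num_reduce_tasks):
    """Same key always maps to the same reducer, regardless of which mapper emitted it.
    Mirrors Java's (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks."""
    return (hash(key) & 0x7FFFFFFF) % num_reduce_tasks
```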
4. How do you write a custom partitioner for a Hadoop MapReduce job?
Steps to write a custom partitioner for a Hadoop MapReduce job:
A new class must be created that extends the predefined Partitioner class.
The getPartition method of the Partitioner class must be overridden.
The custom partitioner can be added to the job as a config file in the wrapper which runs Hadoop MapReduce, or it can be added to the job by using the set method of the partitioner class.
We have further categorized Hadoop MapReduce interview questions for freshers and experienced:
Hadoop Interview Questions and Answers for Freshers - Q.Nos 2
Hadoop Interview Questions and Answers for Experienced - Q.Nos 1, 3, 4
Here are a few more frequently asked Hadoop MapReduce Interview Questions and Answers.

Hadoop HBase Interview Questions and Answers
1. When should you use HBase and what are the key components of HBase?
HBase should be used when the big data application has:

1) A variable schema
2) Data stored in the form of collections
3) A demand for key-based access to data while retrieving.
Key components of HBase are:
Region - This component contains the memory data store and HFile.
Region Server - This monitors the Region.
HBase Master - It is responsible for monitoring the region server.
Zookeeper - It takes care of the coordination between the HBase Master component and the client.
Catalog Tables - The two important catalog tables are ROOT and META. The ROOT table tracks where the META table is, and the META table stores all the regions in the system.
2. What are the different operational commands in HBase at record level and table level?
Record-level operational commands in HBase are: put, get, increment, scan and delete.
Table-level operational commands in HBase are: describe, list, drop, disable and scan.
3. What is a Row Key?
Every row in an HBase table has a unique identifier known as the Row Key. It is used for grouping cells logically, and it ensures that all cells with the same Row Key are co-located on the same server. The Row Key is internally regarded as a byte array.
4. Explain the difference between the RDBMS data model and the HBase data model.
RDBMS is a schema-based database, whereas HBase has a schema-less data model.
RDBMS does not have support for in-built partitioning, whereas in HBase there is automated partitioning.
RDBMS stores normalized data, whereas HBase stores de-normalized data.
5. Explain the different catalog tables in HBase.
The two important catalog tables in HBase are ROOT and META. The ROOT table tracks where the META table is, and the META table stores all the regions in the system.
6. What are column families? What happens if you alter the block size of a column family on an already populated database?
The logical deviation of data is represented through a key known as the column family. Column families consist of the basic unit of physical storage, on which compression features can be applied. In an already populated database, when the block size of a column family is altered, the old data will remain within the old block size, whereas the new data that comes in will take the new block size. When compaction takes place, the old data will take the new block size so that the existing data is read correctly.
7. Explain the difference between HBase and Hive.
HBase and Hive are two completely different Hadoop-based technologies: Hive is a data warehouse infrastructure on top of Hadoop, whereas HBase is a NoSQL key-value store that runs on top of Hadoop. Hive helps SQL-savvy people run MapReduce jobs, whereas HBase supports 4 primary operations: put, get, scan and delete. HBase is ideal for real-time querying of big data, whereas Hive is an ideal choice for analytical querying of data collected over a period of time.

8. Explain the process of row deletion in HBase.
On issuing a delete command in HBase through the HBase client, data is not actually deleted from the cells; rather, the cells are made invisible by setting a tombstone marker. The deleted cells are removed at regular intervals during compaction.
9. What are the different types of tombstone markers in HBase for deletion?
There are 3 different types of tombstone markers in HBase for deletion:
1) Family Delete Marker - This marker marks all columns for a column family.
2) Version Delete Marker - This marker marks a single version of a column.
3) Column Delete Marker - This marker marks all the versions of a column.
10. Explain HLog and WAL in HBase.
All edits in the HStore are stored in the HLog. Every region server has one HLog. The HLog contains entries for edits of all regions performed by a particular Region Server. WAL stands for Write Ahead Log, in which all the HLog edits are written immediately. WAL edits remain in memory till the flush period in case of deferred log flush.
We have further categorized Hadoop HBase interview questions for freshers and experienced:
Hadoop Interview Questions and Answers for Freshers - Q.Nos 1, 2, 4, 5, 7
Hadoop Interview Questions and Answers for Experienced - Q.Nos 2, 3, 6, 8, 9, 10

Hadoop Sqoop Interview Questions and Answers
1. Explain some important Sqoop commands other than import and export.
Create Job (--create)
Here we are creating a job with the name myjob, which can import the table data from an RDBMS table to HDFS. The following command is used to create a job that imports data from the employee table in the db database to an HDFS file:
$ sqoop job --create myjob \
-- import \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee -m 1
Verify Job (--list)
The --list argument is used to verify the saved jobs. The following command is used to verify the list of saved Sqoop jobs:
$ sqoop job --list
Inspect Job (--show)
The --show argument is used to inspect or verify particular jobs and their details. The following command and sample output is used to verify a job called myjob:

$ sqoop job --show myjob
Execute Job (--exec)
The --exec option is used to execute a saved job. The following command is used to execute a saved job called myjob:
$ sqoop job --exec myjob
2. How can Sqoop be used in a Java program?
The Sqoop jar should be included in the classpath of the Java code. After this, the Sqoop.runTool() method must be invoked. The necessary parameters should be passed to Sqoop programmatically, just like on the command line.
3. What is the process to perform an incremental data load in Sqoop?
The process to perform an incremental data load in Sqoop is to synchronize the modified or updated data (often referred to as delta data) from the RDBMS to Hadoop. The delta data can be facilitated through the incremental load command in Sqoop.
Incremental load can be performed by using the Sqoop import command or by loading the data into Hive without overwriting it. The different attributes that need to be specified during incremental load in Sqoop are:
1) Mode (--incremental) - The mode defines how Sqoop will determine what the new rows are. The mode can have the value append or lastmodified.
2) Check column (--check-column) - This attribute specifies the column that should be examined to find out the rows to be imported.
3) Value (--last-value) - This denotes the maximum value of the check column from the previous import operation.
4. Is it possible to do an incremental import using Sqoop?
Yes, Sqoop supports two types of incremental imports:
1) append
2) lastmodified
To insert only new rows, append should be used in the import command, and for inserting rows and also updating them, lastmodified should be used in the import command.
5. What is the standard location or path for Hadoop Sqoop scripts?
/usr/bin/Hadoop Sqoop
6. How can you check all the tables present in a single database using Sqoop?
The command to check the list of all tables present in a single database using Sqoop is as follows:
$ sqoop list-tables --connect jdbc:mysql://localhost/user
7. How are large objects handled in Sqoop?
Sqoop provides the capability to store large-sized data in a single field based on the type of data. Sqoop supports the ability to store:
1) CLOBs - Character Large Objects

2) BLOBs - Binary Large Objects
Large objects in Sqoop are handled by importing the large objects into a file referred to as a LobFile, i.e. Large Object File. The LobFile has the ability to store records of huge size; thus each record in the LobFile is a large object.
8. Can free-form SQL queries be used with the Sqoop import command? If yes, then how can they be used?
Sqoop allows us to use free-form SQL queries with the import command. The import command should be used with the -e and --query options to execute free-form SQL queries. When using the -e and --query options with the import command, the --target-dir value must be specified.
9. Differentiate between Sqoop and distCP.
The DistCP utility can be used to transfer data between clusters, whereas Sqoop can be used to transfer data only between Hadoop and an RDBMS.
10. What are the limitations of importing RDBMS tables into Hcatalog directly?
There is an option to import RDBMS tables into Hcatalog directly by making use of the --hcatalog-database option with the --hcatalog-table option, but the limitation is that several arguments like --as-avrodatafile, --direct, --as-sequencefile, --target-dir and --export-dir are not supported.
We have further categorized Hadoop Sqoop interview questions for freshers and experienced:
Hadoop Interview Questions and Answers for Freshers - Q.Nos 4, 5, 6, 9
Hadoop Interview Questions and Answers for Experienced - Q.Nos 1, 2, 3, 6, 7, 8, 10
Here are a few more frequently asked Sqoop Interview Questions and Answers for Freshers and Experienced.

Hadoop Flume Interview Questions and Answers
1) Explain the core components of Flume.
The core components of Flume are:
Event - The single log entry or unit of data that is transported.
Source - This is the component through which data enters Flume workflows.
Sink - It is responsible for transporting data to the desired destination.
Channel - It is the duct between the Sink and Source.
Agent - Any JVM that runs Flume.
Client - The component that transmits the event to the source that operates with the agent.
2) Does Flume provide 100% reliability to the data flow?
Yes, Apache Flume provides end-to-end reliability because of its transactional approach to data flow.
3) How can Flume be used with HBase?
Apache Flume can be used with HBase using one of the two HBase sinks:
HBaseSink (org.apache.flume.sink.hbase.HBaseSink) supports secure HBase clusters and also the novel HBase IPC that was introduced in the version HBase 0.96.
AsyncHBaseSink (org.apache.flume.sink.hbase.AsyncHBaseSink) has better performance than HBaseSink as it can easily make non-blocking calls to HBase.
Working of the HBaseSink:
In HBaseSink, a Flume event is converted into HBase increments or puts. The serializer implements the HBaseEventSerializer, which is instantiated when the sink starts. For every event, the sink calls the initialize method in the serializer, which then translates the Flume event into HBase increments and puts to be sent to the HBase cluster.
Working of the AsyncHBaseSink:
AsyncHBaseSink implements the AsyncHBaseEventSerializer. The initialize method is called only once by the sink when it starts. The sink invokes the setEvent method and then makes calls to the getIncrements and getActions methods, similar to HBaseSink. When the sink stops, the cleanUp method is called by the serializer.
4) Explain the different channel types in Flume. Which channel type is faster?
The 3 different built-in channel types available in Flume are:
MEMORY Channel - Events are read from the source into memory and passed to the sink.
JDBC Channel - The JDBC Channel stores the events in an embedded Derby database.
FILE Channel - The File Channel writes the contents to a file on the file system after reading the event from a source. The file is deleted only after the contents are successfully delivered to the sink.
The MEMORY Channel is the fastest channel among the three, however it has the risk of data loss. The channel that you choose completely depends on the nature of the big data application and the value of each event.
5) Which is the reliable channel in Flume to ensure that there is no data loss?
The FILE Channel is the most reliable of the 3 channels (JDBC, FILE and MEMORY).
6) Explain the replicating and multiplexing selectors in Flume.
Channel selectors are used to handle multiple channels. Based on the Flume header value, an event can be written to just a single channel or to multiple channels. If a channel selector is not specified for the source, then by default it is the replicating selector. Using the replicating selector, the same event is written to all the channels in the source's channels list. The multiplexing channel selector is used when the application has to send different events to different channels.
7) How can a multi-hop agent be set up in Flume?
The Avro RPC bridge mechanism is used to set up a multi-hop agent in Apache Flume.
8) Does Apache Flume provide support for third-party plugins?
Apache Flume has a plugin-based architecture: it can load data from external sources and transfer it to external destinations, which is why most data analysts use it.
9) Is it possible to leverage real-time analysis on the big data collected by Flume directly? If yes, then explain how.
Data from Flume can be extracted, transformed and loaded in real time into Apache Solr servers using MorphlineSolrSink.
10) Differentiate between FileSink and FileRollSink.
The major difference between HDFS FileSink and FileRollSink is that HDFS FileSink writes the events into the Hadoop Distributed File System (HDFS), whereas FileRollSink stores the events in the local file system.
Hadoop Flume Interview Questions and Answers for Freshers - Q.Nos 1, 2, 4, 5, 6, 10
Hadoop Flume Interview Questions and Answers for Experienced - Q.Nos 3, 7, 8, 9

Hadoop Zookeeper Interview Questions and Answers
1) Can Apache Kafka be used without Zookeeper?
It is not possible to use Apache Kafka without Zookeeper, because if Zookeeper is down Kafka cannot serve client requests.
2) Name a few companies that use Zookeeper.
Yahoo, Solr, Helprace, Neo4j, Rackspace
3) What is the role of Zookeeper in HBase architecture?
In HBase architecture, ZooKeeper is the monitoring server that provides different services like tracking server failures and network partitions, maintaining the configuration information, establishing communication between the clients and region servers, and usability of ephemeral nodes to identify the available servers in the cluster.
4) Explain ZooKeeper in Kafka.
Apache Kafka uses ZooKeeper to be a highly distributed and scalable system. Zookeeper is used by Kafka to store various configurations and use them across the Hadoop cluster in a distributed manner. To achieve distributedness, configurations are distributed and replicated throughout the leader and follower nodes in the ZooKeeper ensemble. We cannot connect to Kafka directly by bypassing ZooKeeper, because if ZooKeeper is down it will not be able to serve the client requests.
5) Explain how Zookeeper works.
ZooKeeper is referred to as the King of Coordination, and distributed applications use ZooKeeper to store and facilitate important configuration information updates. ZooKeeper works by coordinating the processes of distributed applications. ZooKeeper is a robust replicated synchronization service with eventual consistency. A set of nodes is known as an ensemble, and persisted data is distributed between multiple nodes.
3 or more independent servers collectively form a ZooKeeper cluster and elect a master. One client connects to any of the specific servers and migrates if a particular node fails. The ensemble of ZooKeeper nodes is alive till the majority of nodes are working. The master node in ZooKeeper is dynamically selected by consensus within the ensemble, so if the master node fails, the role of master node will migrate to another node which is selected dynamically. Writes are linear and reads are concurrent in ZooKeeper.
6) List some examples of Zookeeper use cases.
Found by Elastic uses Zookeeper comprehensively for resource allocation, leader election, high-priority notifications and discovery. The entire service of Found is built up of various systems that read and write to Zookeeper.
Apache Kafka, which depends on ZooKeeper, is used by LinkedIn.
Storm, which relies on ZooKeeper, is used by popular companies like Groupon and Twitter.
7) How do you use the Apache Zookeeper command-line interface?
ZooKeeper has command-line client support for interactive use. The command-line interface of ZooKeeper is similar to the file and shell system of UNIX. Data in ZooKeeper is stored in a hierarchy of znodes, where each znode can contain data, similar to a file. Each znode can also have children, just like directories in the UNIX file system.
The zookeeper-client command is used to launch the command-line client. If the initial prompt is hidden by the log messages after entering the command, users can just hit ENTER to view the prompt.
8)WhatarethedifferenttypesofZnodes?
Thereare2typesofZnodesnamelyEphemeralandSequentialZnodes.
TheZnodesthatgetdestroyedassoonastheclientthatcreateditdisconnectsarereferredtoas
EphemeralZnodes.
SequentialZnodeistheoneinwhichsequentialnumberischosenbytheZooKeeperensembleandis
prefixedwhentheclientassignsnametotheznode.
9) What are watches?
Client disconnection can be a troublesome problem, especially when we need to keep track of the state of Znodes at regular intervals. ZooKeeper has an event system referred to as a watch, which can be set on a Znode to trigger an event whenever the Znode is removed or altered, or any new children are created below it.
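A key property of ZooKeeper watches is that they are one-shot: once a watch fires, it is cleared, and the client must set a new one to keep observing. Below is a small in-memory Python sketch of that behavior; it is illustrative only and not the ZooKeeper client API:

```python
class WatchedNode:
    """Toy znode holding data and one-shot watch callbacks."""

    def __init__(self, data):
        self.data = data
        self._watches = []

    def watch(self, callback):
        """Register a callback fired on the next change only."""
        self._watches.append(callback)

    def set_data(self, data):
        self.data = data
        # one-shot semantics: clear the watch list before firing
        pending, self._watches = self._watches, []
        for cb in pending:
            cb(data)

events = []
node = WatchedNode("v1")
node.watch(events.append)
node.set_data("v2")  # the watch fires once
node.set_data("v3")  # no watch is registered any more, nothing fires
print(events)  # ['v2']
```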
10) What problems can be addressed by using ZooKeeper?
In the development of distributed systems, creating your own protocols for coordinating the hadoop cluster often results in failure and frustration for the developers. The architecture of a distributed system can be prone to deadlocks, inconsistency and race conditions. This leads to various difficulties in making the hadoop cluster fast, reliable and scalable. To address all such problems, Apache ZooKeeper can be used as a coordination service to write correct distributed applications without having to reinvent the wheel from the beginning.
Hadoop ZooKeeper Interview Questions and Answers for Freshers - Q.Nos 1, 2, 8, 9
Hadoop ZooKeeper Interview Questions and Answers for Experienced - Q.Nos 3, 4, 5, 6, 7, 10

Hadoop Pig Interview Questions and Answers
1) What are the different modes of execution in Apache Pig?
Apache Pig runs in 2 modes: one is the Pig (Local Mode) Command Mode and the other is the Hadoop MapReduce (Java) Command Mode. Local Mode requires access to only a single machine, where all files are installed and executed on a localhost, whereas MapReduce mode requires access to a Hadoop cluster.
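The mode is selected with Pig's -x flag when launching the Grunt shell; the script name below is hypothetical:

```
pig -x local myscript.pig        # local mode: runs against the local file system
pig -x mapreduce myscript.pig    # MapReduce mode (the default): runs on the Hadoop cluster
```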
2) Explain about COGROUP in Pig.
The COGROUP operator in Pig is used to work with multiple tuples. The COGROUP operator is applied on statements that contain or involve two or more relations. The COGROUP operator can be applied on up to 127 relations at a time. When using the COGROUP operator on two tables at once, Pig first groups both tables, and after that joins the two tables on the grouped columns.
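COGROUP pairs each key with one "bag" of matching tuples per relation. The following Python sketch reproduces those semantics for illustration; Pig itself executes this as a MapReduce job, and the relation contents here are made up:

```python
from collections import defaultdict

def cogroup(*relations, key=lambda t: t[0]):
    """Group several relations by key, mapping key -> (bag_1, ..., bag_n)."""
    groups = defaultdict(lambda: [[] for _ in relations])
    for i, rel in enumerate(relations):
        for tup in rel:
            groups[key(tup)][i].append(tup)
    return {k: tuple(bags) for k, bags in groups.items()}

owners = [("alice", "cat"), ("bob", "dog"), ("alice", "fish")]
cities = [("alice", "paris"), ("carol", "rome")]
result = cogroup(owners, cities)
print(result["alice"])  # ([('alice', 'cat'), ('alice', 'fish')], [('alice', 'paris')])
print(result["carol"])  # ([], [('carol', 'rome')])
```

Note how, unlike a plain join, a key that appears in only one relation still produces an output row with an empty bag for the other relation.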
We have further categorized Hadoop Pig Interview Questions for Freshers and Experienced-
Hadoop Interview Questions and Answers for Freshers - Q.No 1
Hadoop Interview Questions and Answers for Experienced - Q.No 2
Here are a few more frequently asked Pig Hadoop Interview Questions and Answers for Freshers and Experienced.

Hadoop Hive Interview Questions and Answers
1) Explain about the SMB Join in Hive.
In an SMB join in Hive, each mapper reads a bucket from the first table and the corresponding bucket from the second table, and then a merge sort join is performed. The Sort Merge Bucket (SMB) join in Hive is mainly used because it places no limit on file, partition or table size in the join. An SMB join is best used when the tables are large. In an SMB join the columns are bucketed and sorted using the join columns. All tables should have the same number of buckets in an SMB join.
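Within each bucket pair, the mapper effectively performs a classic sort-merge join. Here is a minimal Python sketch of that merge step over two already-sorted buckets; it is illustrative only, with made-up data, and Hive's actual implementation differs in detail:

```python
def sort_merge_join(left, right, key=lambda r: r[0]):
    """Join two lists already sorted on the join key, as in one SMB bucket pair."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # advance the side with the smaller key
        elif lk > rk:
            j += 1
        else:
            # keys match: emit the cross product of the matching runs
            i2 = i
            while i2 < len(left) and key(left[i2]) == lk:
                j2 = j
                while j2 < len(right) and key(right[j2]) == lk:
                    out.append(left[i2] + right[j2][1:])
                    j2 += 1
                i2 += 1
            i, j = i2, j2
    return out

orders = [(1, "book"), (2, "pen"), (2, "ink"), (4, "mug")]
users = [(1, "alice"), (2, "bob"), (3, "carol")]
print(sort_merge_join(orders, users))
# [(1, 'book', 'alice'), (2, 'pen', 'bob'), (2, 'ink', 'bob')]
```

Because both inputs arrive sorted, the join is a single linear pass with no in-memory hash table, which is why SMB joins scale to tables too large for map-side hash joins.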
2) How can you connect an application, if you run Hive as a server?
When running Hive as a server, the application can be connected in one of the 3 ways-
ODBC Driver - This supports the ODBC protocol.
JDBC Driver - This supports the JDBC protocol.
Thrift Client - This client can be used to make calls to all Hive commands using different programming languages like PHP, Python, Java, C++ and Ruby.
3) What does the overwrite keyword denote in a Hive load statement?
The overwrite keyword in a Hive load statement deletes the contents of the target table and replaces them with the files referred to by the file path, i.e. only the files referred to by the file path will be present in the table after using the overwrite keyword.
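For example (hypothetical path and table names), with OVERWRITE the table ends up containing only the newly loaded files, while omitting the keyword appends them:

```sql
-- replaces the current contents of target_table
LOAD DATA INPATH '/user/hive/staging/day1' OVERWRITE INTO TABLE target_table;
-- appends to whatever target_table already holds
LOAD DATA INPATH '/user/hive/staging/day2' INTO TABLE target_table;
```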
4) What is SerDe in Hive? How can you write your own custom SerDe?
SerDe is a Serializer/Deserializer. Hive uses SerDe to read and write data from tables. Generally, users prefer to write a Deserializer instead of a full SerDe, as they want to read their own data format rather than write to it. If the SerDe supports DDL, i.e. basically a SerDe with parameterized columns and different column types, users can implement a protocol-based DynamicSerDe rather than writing the SerDe from scratch.
We have further categorized Hadoop Hive Interview Questions for Freshers and Experienced-
Hadoop Hive Interview Questions and Answers for Freshers - Q.Nos 3
Hadoop Hive Interview Questions and Answers for Experienced - Q.Nos 1, 2, 4
Here are a few more frequently asked Hadoop Hive Interview Questions and Answers for Freshers and Experienced.

Hadoop YARN Interview Questions and Answers
1) What are the stable versions of Hadoop?
Release 2.7.1 (stable)
Release 2.4.1
Release 1.2.1 (stable)
2) What is Apache Hadoop YARN?
YARN is a powerful and efficient feature rolled out as a part of Hadoop 2.0. YARN is a large scale distributed system for running big data applications.
3) Is YARN a replacement of Hadoop MapReduce?
YARN is not a replacement of Hadoop; it is a more powerful and efficient technology that supports MapReduce and is also referred to as Hadoop 2.0 or MapReduce 2.
4) What are the additional benefits YARN brings into Hadoop?
Effective utilization of resources, as multiple applications can run in YARN, all sharing a common resource pool. In Hadoop MapReduce there are separate slots for Map and Reduce tasks, whereas in YARN there is no fixed slot. The same container can be used for Map and Reduce tasks, leading to better utilization.
YARN is backward compatible, so all the existing MapReduce jobs can run on it without modification.
Using YARN, one can even run applications that are not based on the MapReduce model.
5) How can native libraries be included in YARN jobs?
There are two ways to include native libraries in YARN jobs-
1) By setting -Djava.library.path on the command line, but in this case there are chances that the native libraries might not be loaded correctly and there is a possibility of errors.
2) The better option to include native libraries is to set LD_LIBRARY_PATH in the .bashrc file.
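For example, assuming the native libraries live under /usr/local/hadoop/lib/native (a hypothetical path; adjust to your installation), the second approach adds a line like this to ~/.bashrc:

```shell
# append the Hadoop native library directory to the loader search path
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/hadoop/lib/native"
```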
6) Explain the differences between Hadoop 1.x and Hadoop 2.x.
In Hadoop 1.x, MapReduce is responsible for both processing and cluster management, whereas in Hadoop 2.x processing is taken care of by other processing models and YARN is responsible for cluster management.
Hadoop 2.x scales better when compared to Hadoop 1.x, with close to 10000 nodes per cluster.
Hadoop 1.x has a single point of failure problem: whenever the NameNode fails, it has to be recovered manually. However, in Hadoop 2.x the Standby NameNode overcomes the SPOF problem, and whenever the NameNode fails it is configured for automatic recovery.
Hadoop 1.x works on the concept of slots, whereas Hadoop 2.x works on the concept of containers and can also run generic tasks.
7) What are the core changes in Hadoop 2.0?
Hadoop 2.x provides an upgrade to Hadoop 1.x in terms of resource management, scheduling and the manner in which execution occurs. In Hadoop 2.x the cluster resource management capabilities work in isolation from the MapReduce-specific programming logic. This helps Hadoop share resources dynamically between multiple parallel processing frameworks like Impala and the core MapReduce component. Hadoop 2.x allows workable and fine-grained resource configuration, leading to efficient and better cluster utilization so that applications can scale to process a larger number of jobs.
8) Differentiate between NFS, Hadoop NameNode and JournalNode.
HDFS is a write-once file system, so a user cannot update files once they exist: they can either read from or write to them. However, under certain scenarios in the enterprise environment, like file uploading, file downloading, file browsing or data streaming, it is not possible to achieve all of this using standard HDFS. This is where the distributed file system protocol Network File System (NFS) is used. NFS allows access to files on remote machines similar to how the local file system is accessed by applications.
The NameNode is the heart of the HDFS file system: it maintains the metadata and tracks where the file data is kept across the Hadoop cluster.
Standby NameNodes and Active NameNodes communicate with a group of lightweight nodes to keep their state synchronized. These are known as JournalNodes.
9) What are the modules that constitute the Apache Hadoop 2.0 framework?
Hadoop 2.0 contains four important modules, of which 3 are inherited from Hadoop 1.0 and a new module, YARN, is added to it.
1. Hadoop Common - This module consists of all the basic utilities and libraries required by the other modules.
2. HDFS - The Hadoop Distributed File System stores huge volumes of data on commodity machines across the cluster.
3. MapReduce - Java based programming model for data processing.
4. YARN - This is a new module introduced in Hadoop 2.0 for cluster resource management and job scheduling.
CLICK HERE to read more about the YARN module in Hadoop 2.x.
10) How is the distance between two nodes defined in Hadoop?
Measuring bandwidth is difficult in Hadoop, so the network is represented as a tree in Hadoop. The distance between two nodes in the tree plays a vital role in forming a Hadoop cluster and is defined by the network topology and the Java interface DNSToSwitchMapping. The distance is equal to the sum of the distances from each node to their closest common ancestor. The method getDistance(Node node1, Node node2) is used to calculate the distance between two nodes, with the assumption that the distance from a node to its parent node is always 1.
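This rule can be sketched directly: the distance is (depth of node1 minus depth of the common ancestor) plus the same for node2. The Python version below is illustrative, modeled on but not copied from Hadoop's NetworkTopology, and uses made-up rack names:

```python
def get_distance(node1: str, node2: str) -> int:
    """Distance = sum of each node's hops up to the closest common ancestor.

    Nodes are network locations like '/rack1/host1'; each path segment is
    one level of the topology tree, and each hop counts as 1.
    """
    p1 = node1.strip("/").split("/")
    p2 = node2.strip("/").split("/")
    # length of the common prefix = depth of the closest common ancestor
    common = 0
    while common < len(p1) and common < len(p2) and p1[common] == p2[common]:
        common += 1
    return (len(p1) - common) + (len(p2) - common)

print(get_distance("/rack1/host1", "/rack1/host1"))  # 0: same node
print(get_distance("/rack1/host1", "/rack1/host2"))  # 2: same rack
print(get_distance("/rack1/host1", "/rack2/host3"))  # 4: different racks
```

These values (0 for the same node, 2 within a rack, 4 across racks) are what HDFS uses to prefer the closest replica when scheduling reads.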
We have further categorized Hadoop YARN Interview Questions for Freshers and Experienced-
Hadoop Interview Questions and Answers for Freshers - Q.Nos 2, 3, 4, 6, 7, 9
Hadoop Interview Questions and Answers for Experienced - Q.Nos 1, 5, 8, 10


Hadoop Interview FAQs an Interviewee Should Ask an Interviewer
For many hadoop job seekers, the question from the interviewer, "Do you have any questions for me?", indicates the end of a Hadoop developer job interview. It is always enticing for a Hadoop job seeker to immediately say "No" to the question for the sake of keeping the first impression intact. However, to land a hadoop job, or any other job, it is always preferable to fight that urge and ask relevant questions to the interviewer.
Asking questions related to the Hadoop technology implementation shows your interest in the open hadoop job role, and also conveys your interest in working with the company. Just like any other interview, even hadoop interviews are a two-way street: it helps the interviewer decide whether you have the desired hadoop skills they are looking for in a hadoop developer, and helps an interviewee decide if that is the kind of big data infrastructure and hadoop technology implementation you want to devote your skills to for foreseeable future growth in the big data domain.
Candidates should not be afraid to ask questions of the interviewer. To ease this for hadoop job seekers, DeZyre has collated a few hadoop interview FAQs that every candidate should ask an interviewer during their next hadoop job interview-
1) What is the size of the biggest hadoop cluster that company X operates?
Asking this question helps a hadoop job seeker understand the hadoop maturity curve at a company. Based on the answer of the interviewer, a candidate can judge how much an organization invests in Hadoop and their enthusiasm to buy big data products from various vendors. The candidate can also get an idea of the hiring needs of the company based on their hadoop infrastructure.
2) For what kind of big data problems did the organization choose to use Hadoop?
Asking this question to the interviewer shows the candidate's keen interest in understanding the reason for hadoop implementation from a business perspective. This question gives the impression to the interviewer that the candidate is not merely interested in the hadoop developer job role, but is also interested in the growth of the company.
3) Based on the answer to question no 1, the candidate can ask the interviewer why the hadoop infrastructure is configured in that particular way, why the company chose to use the selected big data tools, and how workloads are constructed in the hadoop environment.
Asking this question to the interviewer gives the impression that you are not just interested in maintaining the big data system and developing products around it, but are also seriously thoughtful about how the infrastructure can be improved to help business growth and make cost savings.
4) What kind of data does the organization work with, or what are the HDFS file formats the company uses?
The question gives the candidate an idea of the kind of big data he or she will be handling if selected for the hadoop developer job role. Based on the data, it gives an idea of the kind of analysis they will be required to perform on the data.
Stay tuned to the blog for more updates on Hadoop Interview FAQ's!
We hope that these Hadoop Interview Questions and Answers have pre-charged you for your next Hadoop Interview. Get the ball rolling and share your hadoop interview experiences in the comments below. Please do! It's all part of our shared mission to ease Hadoop Interviews for all prospective Hadoopers. We invite you to get involved.
Click here to know more about our IBM Certified Hadoop Developer course.