
An Overview

Credits
Author: Michael Guenther
Editor: Aaron Loucks
Dancing Elephants: Michael V. Shuman

What developers and architects see

What capacity planning folks see

What network folks see

What operations folks see

Learning the Hard Way

Introductions

Aaron Loucks
- Senior Technical Operations Engineer, CCHA
- ~11 months active Hadoop admin experience

Michael Guenther
- Technical Operations Team Lead, CCHA
- ~16 months active Hadoop admin experience

Learning the Hard Way

Early adoption of Hadoop has some of its own issues:

- The knowledge base is growing, but still pretty thin.
- Manning (finally) released their book, so now we have three Hadoop books.
- HBase has even less documentation available and no books. (Lars George's book is due in July. Probably. Hopefully.)
- Cloudera didn't officially support HBase until CDH3.

Playing Catch Up
IS and Ops came to the game a bit later than development, so we had to play catch-up early on in the project. We had to write a lot of our own tools and implement our own processes (rack awareness, log cleanup, metadata backups, deploying configs, etc.). Additionally, we needed to learn a lot about Linux system details and network setup and configuration.

New Admin Blues

Tech Ops (Aaron and I) aren't part of the IS department.


- This might be different at your company; in some places, Ops is part of IS. The correct model depends on staffing and which group fulfills various enterprise roles.

Administering Hadoop/HBase created a problem for our traditional support model and its rules about non-SA activity on the machines. It took some time to get used to the new system and what was needed for us to run and maintain it, most of which changed with CDH3.

Enterprise Wide Admins?


Since we have no centralized team administering all clouds, configuration and setup vary across the enterprise, creating additional challenges. Staffing Hadoop administrators is difficult, especially since we aren't in the Bay Area.

Configuration File Management

Configuration File Management can be a challenge.


- We settled on a central folder on a common mount and an ssh script to push configs (sketch below).
- Cloudera recommended using Puppet or Chef.
- We haven't made that jump yet. When the cluster goes heterogeneous, we will investigate further.
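A minimal sketch of the kind of push script we mean; the paths, host list, and config location here are placeholders, not our actual layout:

    #!/bin/bash
    # Push Hadoop/HBase configs from the common mount to every node.
    # CONF_DIR and hosts.txt are hypothetical; substitute your own.
    CONF_DIR=/mnt/common/hadoop-conf
    for host in $(cat "${CONF_DIR}/hosts.txt"); do
      scp "${CONF_DIR}"/*.xml "${host}:/etc/hadoop/conf/" \
        || echo "push failed: ${host}" >&2
    done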

A Look At Our Cluster

Initial Cluster Setup


Prod started with 3 masters and 20 data nodes across 2 racks (10 and 10).
UAT started with 3 masters and 15 data nodes across 2 racks (5 and 10).

Current Hardware Breakdown

Name Node (HMaster), Secondary Name Node, and Job Tracker
- Dell R710s
- Dual Intel quad-cores
- 72GB of RAM
- SCSI drives in RAID configuration (~70GB)

Data Nodes (Task Tracker, Data Node, Region Server) - 30 nodes
- Dell R410s
- Dual Intel quad-cores
- 64GB of RAM
- 4x2TB 5400RPM SATA in JBOD configuration

Zookeeper Servers (standalone mode)
- Dell R610s
- Dual Intel quad-cores
- 32GB of RAM
- SCSI drives in RAID configuration (~70GB)

Cluster Network Information

Rack Details
- TOR switches: Cisco 4948s
- 1GbE links to TOR
- 42U rack, ~32U usable for servers

Network
- TORs are 1GbE to core (Cisco 6509s). Channel bonding is possible if needed.
- 10GbE is being investigated if needed.
- 192Gb backbone

Growing Our Cluster


Early on, we were unsure of how many servers were needed for launch. Capacity planning was a total unknown:

- Reserving data center space was very difficult.
- Budgeting for future growth was also difficult.

Ideal Growth Versus Reality


When we did add new servers, we ran into rack space issues. Our rack breakdown for UAT data nodes is 5, 10, and 15 servers. Uneven data node distribution isn't handled well by HBase and Hadoop, and re-racking was not an option. Our options: turn off rack awareness, go with the uneven rack arrangement, or lie to Hadoop? (One way to lie is sketched below.)
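One way to lie: a rack-awareness topology script that assigns hosts to evenly sized virtual racks instead of their physical ones. A minimal sketch, assuming a hand-maintained lookup file (the file name and format are hypothetical; Hadoop only requires that the script print one rack per hostname argument):

    #!/bin/bash
    # Referenced by topology.script.file.name in core-site.xml.
    # topology.map holds lines like: dn01.example.com /virtual-rack1
    MAP=/etc/hadoop/conf/topology.map
    for host in "$@"; do
      rack=$(awk -v h="$host" '$1 == h { print $2 }' "$MAP")
      echo "${rack:-/default-rack}"   # fall back to a default rack
    done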

Server Build Out


Initially, we received new machines from the Sys Admins and had to install Hadoop and HBase ourselves. We worked with the SAs to create a Cobbler image for new types of Hadoop servers. Now, new machines only need configuration files and are ready for use.

First Cluster Growth Issue


Since we had to spoof rack awareness, mis-replicated blocks started showing up. Run the balancer to fix it, right? Not quite: the Hadoop balancer doesn't fix mis-replicated blocks. You have to modify the replication factor on the folders with mis-replicated blocks (example below).
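A sketch of the fix, assuming a hypothetical /user/ourdata folder normally replicated 3x: bumping replication up and back down forces the Name Node to re-place the mis-replicated blocks.

    # Raise replication, let the Name Node re-replicate, then lower it.
    hadoop fs -setrep -R 4 /user/ourdata
    # ...wait for re-replication to finish (watch fsck), then:
    hadoop fs -setrep -R 3 /user/ourdata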

Be Paranoid.

Paranoia - It's Not So Bad


Be paranoid. Hadoop punishes the unwary (trust us).

- Two dfs.name.dir folders are a must.
- Back up your Name Node image and edits on a regular basis (hourly).
- Run an fsck with block detail ("hadoop fsck / -files -blocks") once a day.
- Run your dfsadmin report once a week. (Cron examples below.)
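Roughly what those checks look like as crontab entries (paths and times are examples; make sure hadoop is on cron's PATH):

    # Daily fsck with file/block detail; weekly dfsadmin report.
    0 6 * * *  hadoop fsck / -files -blocks > /var/log/hadoop/fsck.$(date +\%F).log
    0 7 * * 0  hadoop dfsadmin -report     > /var/log/hadoop/dfsadmin.$(date +\%F).log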

Paranoia - You'll Get Used To It

Check your various web pages once a day.


- Name Node, Job Tracker, and HMaster

Set up monitoring and alerting.
Set your HDFS trash interval to something greater than the 10 minute default (snippet after this list).
Lock down your cluster:

- Keep everyone off of your cluster.
- Provide a client server for user interaction. Fuse is a good addition to this server.
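The trash setting lives in core-site.xml; a sketch with a 24-hour window (1440 minutes is an example value, not a recommendation):

    <property>
      <name>fs.trash.interval</name>
      <value>1440</value>  <!-- minutes deleted files sit in .Trash -->
    </property>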

Backing Up Your Cluster


Again: multiple dfs.name.dirs.
Run wgets regularly against the Name Node image and edits URLs to create backups (example below).
Back up your config files prior to any major change (or even a minor one).
Save your job statistics to HDFS:
- mapred.job.tracker.persist.jobstatus.dir

Also back up:
- Data Node metadata
- Zookeeper data directory
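A sketch of the wget backup using the 0.20-era Name Node getimage servlet (hostname and backup directory are placeholders; run hourly from cron):

    # Pull the current fsimage and edits from the Name Node.
    NN=http://namenode.example.com:50070
    wget -q "${NN}/getimage?getimage=1" -O /backup/nn/fsimage.$(date +%F-%H)
    wget -q "${NN}/getimage?getedit=1"  -O /backup/nn/edits.$(date +%F-%H)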

Learning by Experience (Sometimes Painful Experience)

Issues and Epiphanies

Pinning your yum repository


- We had this for our Cloudera repo mirrorlist initially:

    mirrorlist=http://archive.cloudera.com/redhat/cdh/3/mirrors

- That's the latest-and-greatest CDH3 build repo (B2, B3, B4, etc.).
- We are on CDH3B3, so we needed to pin our repo mirrorlist to this:

    mirrorlist=http://archive.cloudera.com/redhat/cdh/3b3/mirrors

Issues and Epiphanies

FSCK returned CORRUPT!


- Initially, we thought this was much, much worse than it turned out to be when it happened.
- It's still bad, but only the files listed as corrupt are lost. It wasn't the swath of destruction we thought it would be.
- Cloudera might be able to work some magic, but you've almost certainly lost your file(s).

Issues and Epiphanies

Sudo permissions are key


- We avoid using root whenever possible.
- All config files and folders are owned by our generic account.
- Our generic account has some nice permissions, though:

    sudo -u hdfs/hbase/zookeeper/mapred *
    sudo /etc/init.d/hbase-regionserver *
    sudo /etc/init.d/hadoop *

  These cover 95% of our day-to-day activity on the cloud.
- root access might be extremely difficult to come by. It depends heavily on your business and IS policies.
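In sudoers terms, the grants above look roughly like this (the account name hadoopops is hypothetical; edit with visudo):

    # Run anything as the Hadoop service users...
    hadoopops ALL = (hdfs,hbase,zookeeper,mapred) ALL
    # ...and control the init scripts directly.
    hadoopops ALL = NOPASSWD: /etc/init.d/hadoop-*, /etc/init.d/hbase-*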

Issues and Epiphanies

Document EVERYTHING
- It's a bit tiresome at first, but issues can go months between recurrences.
- Write it down now and save yourself having to research it again.
- This is especially true when you are setting up your first cluster. There's a lot to learn, and it's really easy to forget.
- Pay special attention to the error message that goes along with the problem. HBase tends to have extremely vague exceptions and error logging.

Issues and Epiphanies

Fair Scheduler Woes


- While nice, the fair scheduler page has caused some serious problems.
- Users grow frustrated when their jobs aren't running, so they increase the priority.
- Now their job is running, but others are being starved.
- We ended up restricting page access to a very small subset of users.

Issues and Epiphanies

Do NOT let dfs.name.dir run out of space.


- This is extremely bad news if you only have one dfs.name.dir.
- We have two (snippet below):
  - One Name Node local mount directory
  - One SAN mount (also our common mount)

You absolutely need monitoring in place to keep this from happening.
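The two directories are just a comma-separated list in hdfs-site.xml; a sketch with placeholder paths (one local, one SAN):

    <property>
      <name>dfs.name.dir</name>
      <!-- local disk first, SAN mount second; the NN writes to both -->
      <value>/data/1/dfs/nn,/mnt/san/dfs/nn</value>
    </property>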

Issues and Epiphanies

Smaller Issues
- Missing pid files?
- Users receive a zip file exception when running a job.
- CDH3 install/upgrade requires a local hadoop user.
- The Job Tracker complains about a port already in use? Check your mapred-site.xml.

Issues and Epiphanies

Memory Settings - Hadoop


- Set your Secondary Name Node and Name Node heaps to be the same size.
- Set your starting JVM size equal to your max.
- Set your memory explicitly per process, not via HADOOP_HEAPSIZE.
- Set your map and reduce heap sizes as final in your mapred-site.xml. (Sketch below.)
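A sketch of both, with example sizes (not recommendations): per-daemon heaps in hadoop-env.sh, and the task heap marked final in mapred-site.xml so user jobs can't override it.

    # hadoop-env.sh: explicit, equal -Xms/-Xmx per daemon,
    # instead of one blanket HADOOP_HEAPSIZE.
    export HADOOP_NAMENODE_OPTS="-Xms8g -Xmx8g $HADOOP_NAMENODE_OPTS"
    export HADOOP_SECONDARYNAMENODE_OPTS="-Xms8g -Xmx8g $HADOOP_SECONDARYNAMENODE_OPTS"
    export HADOOP_JOBTRACKER_OPTS="-Xms2g -Xmx2g $HADOOP_JOBTRACKER_OPTS"

    <!-- mapred-site.xml: final so user jobs can't override it -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1g</value>
      <final>true</final>
    </property>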

HBase Issues and Epiphanies


Set your hbase user's ulimits high - 64k is good (limits.conf sketch below).
Sometimes HBase takes a really long time to start back up (two hours, one Saturday).
0.89 had a WAL file corruption problem.
Keep your quorum off of your data nodes (off that rack, really).
HBase is extremely sensitive to network events, maintenance, and connectivity issues.
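The ulimit bump goes in /etc/security/limits.conf; a sketch (the exact daemon user names depend on your install):

    # Raise max open files for the HBase/HDFS daemon users.
    hbase  -  nofile  65536
    hdfs   -  nofile  65536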

HBase Issues and Epiphanies

Memory Settings - HBase

- Region Servers need a lot more memory than your HMaster.
- Region Servers can, and will, run out of memory and crash.
- RowCounter is your friend for non-responsive region servers (example below).
- Zookeeper should be set to 1GB of JVM heap.
- Talk to Cloudera about special JVM settings for your HBase daemons.
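RowCounter is a single MapReduce job that reads the whole table end to end, which makes it a quick probe for a wedged region server ("mytable" is a placeholder):

    hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable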

Questions? We Might Have Answers.
