Who We Are
2011 Cloudera, Inc. All Rights Reserved.
Users of Cloudera
What is Apache Hadoop?
What Makes Hadoop Different?
Why the Need for Hadoop?
[Chart: gigabytes of data created (in billions); axis scale up to 10,000]
Hadoop Use Cases
Use Case | Application | Industry | Application | Use Case
 | Network Analytics | Telco | Mediation | DATA PROCESSING
Hadoop in the Enterprise
[Diagram: Hadoop in the enterprise. Connected systems: Management Tools, IDEs, BI / Analytics, Enterprise Reporting, Enterprise Data Warehouse, Web Application (customers). Data sources: Logs, Files, Web Data, Relational Databases.]
What is CDH?
Cloudera's Commitment to the Open Source Community
Component | Cloudera Committers | Cloudera Founder | 2011 Commits
Common | 6 | Yes | #1
HDFS | 6 | Yes | #2
MapReduce | 5 | Yes | #1
HBase | 2 | No | #2
Zookeeper | 1 | Yes | #2
Oozie | 1 | Yes | #1
Pig | 0 | No | #3
Hive | 1 | No | #2
Sqoop | 2 | Yes | #1
Flume | 3 | Yes | #1
Hue | 3 | Yes | #1
Snappy | 2 | No | #1
Bigtop | 8 | Yes | #1
Avro | 4 | Yes | #1
Whirr | 2 | Yes | #1
Components of CDH
[Diagram: components of CDH. Labels shown: Cloudera Enterprise; User Interface (HUE); Languages / Compilers (Apache Pig, Apache Hive); Fast Read/Write Access; Data Integration.]
Hadoop Distributed File System
[Diagram: a file split into numbered blocks 1-5, with copies of each block placed on different HDFS nodes]
Cost is $400-$500/TB
Components of Hadoop
Components of Hadoop
Networking
Map
[Diagram: input (key, values) -> Map Task -> intermediate (key, values) -> Shuffle Phase -> Reduce Task -> final (key, values)]
Reduce
After the map phase is over, all the intermediate values for a given output key are combined into a single list, which is passed to the reduce task for that key.
[Diagram: input (key, values) -> Map Task -> intermediate (key, values) -> Shuffle Phase -> Reduce Task -> final (key, values)]
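The map, shuffle, and reduce steps described here can be sketched as a word count in plain Python. This is a single-process illustration of the data flow only, not the Hadoop API; the function names (map_task, shuffle, reduce_task, run_job) are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_task(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: combine all intermediate values for a given key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sort by key so the output order is deterministic in this sketch.
    return dict(sorted(grouped.items()))


def reduce_task(key, values):
    """Reduce: collapse the list of values for one key into a final value."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over (key, value) records."""
    intermediate = [pair for key, value in records for pair in map_task(key, value)]
    grouped = shuffle(intermediate)
    return [reduce_task(key, values) for key, values in grouped.items()]


if __name__ == "__main__":
    # Input keys here are just line numbers; the map task ignores them.
    print(run_job([(0, "the cat"), (1, "The dog the")]))
    # [('cat', 1), ('dog', 1), ('the', 3)]
```

In a real cluster the map and reduce tasks run in parallel on different nodes and the shuffle moves data across the network; the sketch only shows how the key/value pairs change shape at each step.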
MapReduce Execution
Sqoop
SQL to Hadoop
Tool to import data from any JDBC-supported database into Hadoop, and to export it back
Transfers data between Hadoop and external databases or an EDW
High-performance connectors for some RDBMSs
Developed at Cloudera
Flume
Distributed, reliable, available service for efficiently moving large amounts of data as it is produced
Suited for gathering logs from multiple systems and inserting them into HDFS as they are generated
Design goals: Reliability, Scalability, Manageability, Extensibility
Developed at Cloudera
Flume: high-level architecture
Master sends configuration to all Agents
Configurable levels of reliability
Guarantee delivery in event of failure
Deployable, centrally administered
Optionally pre-process incoming data: perform transformations, suppressions, metadata enrichment
[Diagram: a MASTER, four Agents, and two Processors; processing steps labeled compress, batch, encrypt]
HBase
HBase
Hive
Example:
SELECT s.word, s.freq, k.freq FROM shakespeare s
JOIN <table> k ON (s.word = k.word) WHERE s.freq >= 5;
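To make the semantics of the query concrete, here is a rough Python equivalent of an inner join on word with a frequency filter. It is a sketch under assumptions: each table is modeled as a list of (word, freq) tuples, the name of the second table is not given on the slide so it appears here only as the hypothetical parameter k_rows, and the function name join_word_freqs is invented for this example. Hive itself compiles such a query into MapReduce jobs rather than running it in memory like this.

```python
def join_word_freqs(s_rows, k_rows, min_freq=5):
    """Inner-join two (word, freq) tables on word.

    Keeps only rows whose frequency in the first table is at least
    min_freq, and returns (word, s_freq, k_freq) tuples.
    """
    # Index the second table by word so each lookup is O(1).
    k_index = {}
    for word, freq in k_rows:
        k_index.setdefault(word, []).append(freq)

    result = []
    for word, s_freq in s_rows:
        if s_freq < min_freq:          # WHERE s.freq >= 5
            continue
        for k_freq in k_index.get(word, []):   # JOIN ... ON s.word = k.word
            result.append((word, s_freq, k_freq))
    return result


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    s = [("the", 25), ("thou", 5), ("rare", 2), ("only_s", 9)]
    k = [("the", 30), ("thou", 7), ("rare", 4)]
    print(join_word_freqs(s, k))
```

Words that appear in only one table, or whose frequency in the first table is below the threshold, do not appear in the result.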
Pig
Example:
emps = LOAD 'people.txt' AS (id, name, salary);
rich = FILTER emps BY salary > 100000;
srtd = ORDER rich BY salary DESC;
STORE srtd INTO 'rich_people.txt';
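For readers new to Pig, the same load, filter, order, and store pipeline can be sketched in plain Python. This is an illustration of what the script computes, not how Pig executes it (Pig compiles the script into MapReduce jobs). It assumes tab-delimited input, which is Pig's default field delimiter, and the function names load_people, rich_sorted, and store_people are hypothetical.

```python
import csv
import io


def load_people(text):
    """LOAD: parse tab-delimited text into (id, name, salary) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], row[1], int(row[2])) for row in reader if row]


def rich_sorted(emps, threshold=100000):
    """FILTER by salary > threshold, then ORDER BY salary DESC."""
    rich = [emp for emp in emps if emp[2] > threshold]
    return sorted(rich, key=lambda emp: emp[2], reverse=True)


def store_people(rows):
    """STORE: serialize rows back to tab-delimited text."""
    return "".join(f"{emp_id}\t{name}\t{salary}\n" for emp_id, name, salary in rows)


if __name__ == "__main__":
    # Hypothetical sample input standing in for people.txt.
    sample = "1\tAnn\t150000\n2\tBob\t90000\n3\tCy\t200000\n"
    print(store_people(rich_sorted(load_people(sample))), end="")
```

Each function corresponds to one statement of the Pig script, which is the main appeal of Pig Latin: the data flow is written as a short sequence of named transformations.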
Oozie
Oozie is a workflow/coordination service for managing data
processing jobs on Hadoop
Zookeeper
Pipes and Streaming
FUSE - DFS
Hadoop Security
Cloudera Enterprise
Cloudera Enterprise makes open source Hadoop enterprise-easy
Simplify and Accelerate Hadoop Deployment
Reduce Adoption Costs and Risks
Lower the Cost of Administration
Increase the Transparency and Control of Hadoop
Leverage the Experience of Our Experts
CLOUDERA ENTERPRISE COMPONENTS
Cloudera Manager: End-to-End Management Application for Apache Hadoop
Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs
EFFECTIVENESS: Ensuring You Get Value From Your Hadoop Deployment
EFFICIENCY: Enabling You to Affordably Run Hadoop in Production
Cloudera Manager
Cloudera Enterprise
Feature | Benefit
Flexible Support Windows | Choose from 8x5 or 24x7 options to meet SLA requirements
Configuration Checks | Verify that your Hadoop cluster is fine-tuned for your environment
Issue Resolution and Escalation Processes | Proven processes ensure that support cases get resolved with maximum efficiency
Cloudera University
Class | Description
Developer Training & Certification (4 Days) | Hands-on training and certification for developers who want to analyze their data but are new to Apache Hadoop
System Administrator Training & Certification (3 Days) | Hands-on training and certification for administrators who will be responsible for setting up, configuring, and monitoring an Apache Hadoop cluster
HBase Training (2 Days) | Covers the HBase architecture, data model, and Java API as well as some advanced topics and best practices
Analyzing Data with Hive and Pig (2 Days) | Hive and Pig training is designed for people who have a basic understanding of how Apache Hadoop works and want to use these languages to analyze their data
Essentials for Managers (1 Day) | Provides decision-makers the information they need to know about Apache Hadoop, answering questions such as "When is Hadoop appropriate?", "What are people using Hadoop for?", and "What do I need to know about choosing Hadoop?"
Cloudera Consulting Services
Put Our Expertise To Work For You.
Service | Description
Use Case Discovery | Assess the appropriateness and value of Hadoop for your organization
New Hadoop Deployment | Set up and configure high performance, production-ready Hadoop clusters
Proof of Concept | Verify the prototype functionality and project feasibility for a new Hadoop cluster
Production Pilot | Deploy your first production-level project using Hadoop
Process and Team Development | Define the requirements and processes for creating a new Hadoop team
Hadoop Deployment Certification | Perform periodic health checks to certify and tune up existing Hadoop clusters
Journey of the Cloudera Customer
Cloudera in Production
[Diagram: Cloudera in production. Cloudera Services: Consulting Services, Cloudera University. Cloudera Enterprise: Cloudera Management Suite, Cloudera Support. Platform: Cloudera's Distribution Including Apache Hadoop (CDH) & SCM Express. Connected systems: Management Tools, IDEs, BI / Analytics, Enterprise Reporting, Web Application, Enterprise Data Warehouse, Operational Rules Engines. Data sources: Logs, Files, Web Data, Relational Databases.]
Get Hadoop
Cloudera helps you profit from all your data.
facebook.com/cloudera
Cloudera Manager
Manages the
Incorporates comprehensive
Has built-in
Cloudera Manager
Key and
Installs the complete Hadoop stack in minutes. The simple, wizard-based interface guides you through the steps.
Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface.
Set server roles, configure services and manage security across the cluster.
Scans Hadoop logs for irregularities and warns you before they impact the cluster.
Cloudera Manager
Key and
Establishes the time context globally for almost all views
Two Editions: FREE EDITION ENTERPRISE EDITION**
Automated Deployment
Host-Level Monitoring
Configuration Management
Audit Trails
Start/Stop/Restart Services
Service Monitoring
Activity Monitoring
Operational Reporting
Support Integration
View Service Health and Performance
Get Host-Level Snapshots
Monitor and Diagnose Cluster Workloads
Gather, View and Search Hadoop Logs
Track Events From Across the Cluster
Run Reports on System Performance & Usage
New in Cloudera Manager 3.7
Cloudera Support
Feature | Benefit
Flexible Support Windows | Choose from 8x5 or 24x7 options to meet SLA requirements
Configuration Checks | Verify that your Hadoop cluster is fine-tuned for your environment
Issue Resolution and Escalation Processes | Proven processes ensure that support cases get resolved with maximum efficiency
Proactive Notification of New Developments and Events | Stay up to speed with what's going on in the Apache Hadoop community
Cloudera Enterprise