Course Contents
Hadoop Introduction
Day1
MapReduce
Distributing Data with HDFS Day2
Understanding Hadoop I/O
Advanced MapReduce
Writing Map-Reduce Applications Day3
Map-Reduce Internals
Managing Hadoop Day4
Map-Reduce Ecosystem
Hadoop Ecosystem (Tools)
Map Reduce Design Patterns Day5
Hadoop-2
Analytics
Hadoop Introduction
MapReduce
Hadoop Streaming
Ruby
Mobile: +91 7719882295/ 9730463630
Email: sales@anikatechnologies.com
Website:www.anikatechnologies.com
Streaming in Hadoop
Interfaces
Hadoop Filesystems
The Design of HDFS
Limitations
Data Flow
Python
MapFile
Serialization
Codecs
Using Compression in MapReduce
Compression and Input Splits
Data Integrity
Compression
SequenceFile
ChecksumFileSystem
LocalFileSystem
Data Integrity in HDFS
Advanced MapReduce
Chaining MapReduce jobs
Reduce-side joining
Replicated joins using DistributedCache
Semijoin: reduce-side join with map-side filtering
Map-Reduce Internals
Failures
Task Execution
Failures in YARN
Failures in Classic MapReduce
Job Scheduling
Managing Hadoop
Setting permissions
Enabling trash
Adding DataNodes
Managing NameNode and Secondary NameNode
Designing network layout and rack awareness
Checking systems health
Managing quotas
Setting up parameter values for practical use
Removing DataNodes
Recovering from a failed NameNode
Map-Reduce Features
Counters
Sorting
Side Data Distribution
Map-Reduce Library
Joins
Map-Reduce Ecosystem
Hive
HiveQL in details
Example queries
Hive Sum-up
HBase
Introduction
Clients
Concepts
HBase vs. RDBMS
Pig
Installing Pig
Running Pig
Execution optimization
Expressions and functions
Relational operators
Data types and schemas
Coprocessors
Scan Operations
Column Value & Key Pair
Column Families
Index & Query
Counters
CRUD Operations
Result Scanner
Batch and Caching
MapReduce and HBase
Filters
Creating Tables via the Shell and Programmatically
Importing into HBase
Sqoop
HBase DB Design
Handling Index
Designing Keys
Transaction
Integration for search
Schema Design
Flume
Metapatterns
Join Patterns
Summarization Patterns
The Effects of YARN
Data Organization Patterns
Filtering Patterns
Input and Output Patterns
Final Thoughts
Hadoop-2
Apache Tez
Apache YARN
Agility
Global ResourceManager
Per-node slave NodeManager
Scalability
Support for workloads other than MapReduce
Compatibility with MapReduce
Per-application Container running on a NodeManager
Improved cluster utilization
Per-application ApplicationMaster
HDFS-2
Analytics
Clustering
K-means clustering
Beyond k-means: an overview of clustering techniques
Inspecting clustering output
Analyzing clustering output
Fuzzy k-means clustering
Evaluating and improving clustering quality
Improving clustering quality
Model-based clustering
Representing data
Classification
Training a classifier
Mahout classifier
Choosing an algorithm to train the classifier
Classifying the 20 newsgroups data with naive Bayes
The classifier evaluation API
Process for deployment in huge systems
Thrift-based classification server
Building a training pipeline for large systems
When classifiers go bad
Classifier evaluation in Mahout
Determining scale and speed requirements
Deploying a classifier
Recommendations
Introducing recommenders
Making recommendations