Rahul Singh
Roll No: 1503314918, MCA
Raj Kumar Goel Institute of Technology
Contents
Introduction to Hadoop
Hadoop Architecture
HDFS Architecture
MapReduce
What is Hadoop?
Hadoop is an Apache open-source framework, written in Java, that allows
distributed processing of large datasets across clusters of computers using
simple programming models. A Hadoop application runs in an environment
that provides distributed storage and computation across clusters of
computers.
Hadoop Architecture
At its core, Hadoop has two major layers:
(a) the processing/computation layer (MapReduce), and
(b) the storage layer (the Hadoop Distributed File System, HDFS).
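To make the MapReduce layer concrete, here is a minimal in-memory sketch of the model in Python, counting words. This is a toy simulation of the map, shuffle, and reduce phases, not the real Hadoop Java API; all function names are illustrative.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit one (word, 1) pair per word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for one word.
    return key, sum(values)

def word_count(lines):
    pairs = [p for line in lines for p in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["hadoop stores data", "hadoop processes data"]))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop the map and reduce functions run on different machines and the shuffle moves data over the network; the data-flow pattern, however, is exactly this.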
Hadoop core components
Hadoop Cluster
A Hadoop cluster is a special type of computational cluster designed
for storing and analyzing vast amounts of unstructured data in a
distributed computing environment. These clusters run on low-cost
commodity computers.
A Hadoop cluster has three components:
Client
Master
Slave
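A minimal sketch of how the three roles interact, as a toy Python model (class and method names are illustrative, not the real Hadoop daemon APIs): the client submits work to the master, the master splits the job into tasks and assigns them to slaves, and each slave executes its share.

```python
class Slave:
    def run_task(self, task):
        # A slave node executes one task, typically on its local data.
        return f"done:{task}"

class Master:
    def __init__(self, slaves):
        self.slaves = slaves

    def run_job(self, tasks):
        # The master assigns tasks across slaves (round-robin here)
        # and collects the results.
        return [self.slaves[i % len(self.slaves)].run_task(t)
                for i, t in enumerate(tasks)]

class Client:
    def submit(self, master, tasks):
        # The client only talks to the master, never to slaves directly.
        return master.run_job(tasks)

master = Master([Slave(), Slave(), Slave()])
results = Client().submit(master, ["t1", "t2", "t3", "t4"])
print(results)  # ['done:t1', 'done:t2', 'done:t3', 'done:t4']
```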
HDFS Architecture
Hadoop: Typical Workflow in HDFS
Let's walk through a typical HDFS workflow, taking Sample.txt as the
example input file.
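The write path for Sample.txt can be sketched as a toy simulation (pure Python, not the real HDFS client): the file is cut into fixed-size blocks, and the NameNode records which DataNodes hold each block's replicas. The block size, DataNode names, and round-robin placement below are illustrative; real HDFS defaults are 128 MB blocks and 3 replicas, placed rack-aware.

```python
import itertools

BLOCK_SIZE = 8          # bytes; tiny on purpose, for demonstration
REPLICATION = 3         # replicas per block (HDFS default)
DATANODES = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical node names

def split_into_blocks(data, size=BLOCK_SIZE):
    # Cut the file contents into fixed-size blocks.
    return [data[i:i + size] for i in range(0, len(data), size)]

def place_blocks(blocks, datanodes=DATANODES, replication=REPLICATION):
    # NameNode-style metadata: block index -> DataNodes holding a replica.
    placement = {}
    ring = itertools.cycle(datanodes)
    for i, _ in enumerate(blocks):
        placement[i] = [next(ring) for _ in range(replication)]
    return placement

data = b"Sample.txt contents go here"
blocks = split_into_blocks(data)
metadata = place_blocks(blocks)
print(len(blocks), metadata[0])  # 4 ['dn1', 'dn2', 'dn3']
```

The key point the toy preserves: the NameNode stores only this small placement map (metadata), while the block contents themselves live on the DataNodes.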