www.semtech-solutions.co.nz
info@semtech-solutions.co.nz
HCatalog: What is it?
- A set of interfaces to the Hive metastore
- Shared schema and data types for Hadoop tools
- REST interface for external data access
- Assists interoperability between Pig, MapReduce and Hive

HCatalog Interfaces
Interface via:
- Pig: HCatLoader + HCatStorer interface
- MapReduce: HCatInputFormat + HCatOutputFormat interface
- Hive: no interface necessary; direct access to metadata

[Diagram: Pig, MapReduce, Hive and Streaming on one side; ORC file, RC file, text file, sequence file and custom format on the other.]
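To make the REST interface more concrete, the following is a minimal Python sketch that only builds the URL an external client might call to describe a table. It assumes the REST service is WebHCat (formerly Templeton) listening on its default port 50111 under the `/templeton/v1` path; the host name `hcat.example.com`, the user `joe`, and the helper name `webhcat_table_url` are hypothetical and should be checked against the actual deployment.

```python
from urllib.parse import quote, urlencode


def webhcat_table_url(host, database, table, user, port=50111):
    """Build the URL of the WebHCat 'describe table' resource.

    Assumption: the server exposes WebHCat at /templeton/v1 on port 50111
    (the usual default). No network request is made here.
    """
    path = "/templeton/v1/ddl/database/{}/table/{}".format(
        quote(database, safe=""), quote(table, safe="")
    )
    # WebHCat identifies the caller through the user.name query parameter.
    query = urlencode({"user.name": user})
    return "http://{}:{}{}?{}".format(host, port, path, query)


if __name__ == "__main__":
    # Hypothetical host and user, for illustration only.
    print(webhcat_table_url("hcat.example.com", "default", "rawevents", "joe"))
```

An HTTP GET on the resulting URL would return the table description as JSON, which is how a tool outside the Hadoop cluster could read the shared schema.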
HCatalog Architecture
HCatalog Example
A data flow example from hive.apache.org
First, Joe in data acquisition uses distcp to get data onto the grid, then registers the new partition with HCatalog.

    hadoop distcp file:///file.dat hdfs://data/rawevents/20100819/data
    hcat -e "alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data'"

Second, Sally in data processing uses Pig to cleanse and prepare the data. Without HCatalog, Sally must be manually informed by Joe when data is available, or poll HDFS.

    A = load '/data/rawevents/20100819/data' as (alpha:int, beta:chararray, ...);
    B = filter A by bot_finder(zeta) == 0;
    ...
    store Z into 'data/processedevents/20100819/data';

With HCatalog, a JMS message is sent when the data is available, and the Pig job can then be started.

    A = load 'rawevents' using HCatLoader();
    B = filter A by date == '20100819' and bot_finder(zeta) == 0;
    ...
    store Z into 'processedevents' using HCatStorer('date=20100819');

Note that the Pig job refers to the data by table name (rawevents) rather than by location.

Now the data can be accessed via HiveQL.

    select advertiser_id, count(clicks) from processedevents where date = '20100819' group by advertiser_id;
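The point that jobs refer to data by table name rather than by location can be illustrated with a toy model. The Python sketch below is not HCatalog code: `ToyCatalog` and its methods are hypothetical names for an in-memory stand-in that records which location backs each partition, so that a consumer only needs the table name and partition value.

```python
class ToyCatalog:
    """In-memory stand-in for a metastore (illustration only, not HCatalog)."""

    def __init__(self):
        # table name -> {partition value -> storage location}
        self._partitions = {}

    def add_partition(self, table, partition, location):
        """Record where one partition of a table is stored."""
        self._partitions.setdefault(table, {})[partition] = location

    def locations(self, table, partition=None):
        """Return the locations for a table, optionally for one partition."""
        parts = self._partitions.get(table, {})
        if partition is None:
            return sorted(parts.values())
        return [parts[partition]] if partition in parts else []


if __name__ == "__main__":
    catalog = ToyCatalog()
    # The producer registers the partition once, as Joe does with "alter table".
    catalog.add_partition(
        "rawevents", "20100819", "hdfs://data/rawevents/20100819/data"
    )
    # The consumer asks by name and partition, never by path.
    print(catalog.locations("rawevents", "20100819"))
```

If the producer later moves the files, only the registered location changes; the consumer's lookup by name stays the same, which is the decoupling the example describes.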
Contact Us
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
We offer IT project consultancy.
We are happy to hear about your problems.
You can just pay for those hours that you need to solve your problems.