
Impala

What is it?
How does it work?
Performance
Formats
Architecture

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz

Impala What is it?

Ad hoc, real-time query for Hadoop
Open source
Developed by Cloudera
Based on Google's 2010 Dremel paper
Direct data access via the Impala engine
A future Hadoop Parquet update will
    add columnar binary storage to Hadoop
    improve Impala performance
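The columnar storage idea can be illustrated with a toy sketch. This is not Parquet and not Impala code; the records, field names and the `to_columns` helper are hypothetical and exist only to show why reading a single field is cheaper when values are stored by column.

```python
# Illustrative only: a toy row layout versus a toy columnar layout.
# The records and field names below are made up for this sketch.
rows = [
    {"id": 1, "city": "Auckland", "sales": 120.0},
    {"id": 2, "city": "Wellington", "sales": 80.5},
    {"id": 3, "city": "Auckland", "sales": 42.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited just to read one field.
total_from_rows = sum(record["sales"] for record in rows)

# Columnar layout: only the "sales" column is touched.
total_from_columns = sum(columns["sales"])
```

Both totals are the same; the difference is how much data has to be scanned to produce them.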

Impala How does it work?

Direct data access
Query planning / coordination on data nodes
Node-based query engine
Low latency
Performance improvement
Queries data on HDFS or HBase
Uses the same Hive QL syntax (SQL-like)
Has the Hue GUI
Allows table joins and aggregation
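A join with aggregation in SQL-like syntax might look like the query below. This is a sketch under stated assumptions: the `customers` and `orders` tables are hypothetical, and Python's built-in `sqlite3` module is used purely as a stand-in so the example runs without a cluster. It does not connect to Impala or Hive.

```python
import sqlite3

# A join plus aggregation written in plain SQL; the table and column
# names are hypothetical and chosen only for illustration.
QUERY = """
SELECT c.city, COUNT(*) AS orders, SUM(o.amount) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.city
ORDER BY c.city
"""


def run_demo():
    """Run QUERY against a tiny in-memory SQLite database (stand-in only)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE customers (id INTEGER, city TEXT);
            CREATE TABLE orders (customer_id INTEGER, amount REAL);
            INSERT INTO customers VALUES (1, 'Auckland'), (2, 'Wellington');
            INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
            """
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = run_demo()
```

The query joins the two tables on the customer key and then groups by city, which is the join-and-aggregate pattern the slide refers to.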

Impala Performance
Impala delivers performance gains

IO bound queries (hardware limitations)    Min 3 times
Complex multiple MapReduce stages          Min 7 times
Cached queries                             Min 20 times
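To put the minimum gains in concrete terms, the sketch below converts a baseline runtime into the longest runtime implied by each factor. It assumes the three minimum factors (3, 7 and 20 times) pair with the three query types in the order listed; the baseline of 600 seconds and the function name are hypothetical.

```python
# Minimum speedup factors from the slide, assuming they pair with the
# query types in the order listed (an interpretation of the layout).
MIN_SPEEDUP = {
    "io_bound": 3,
    "multi_stage_mapreduce": 7,
    "cached": 20,
}


def max_expected_runtime(baseline_seconds, query_type):
    """Longest runtime implied by the minimum speedup for a query type."""
    return baseline_seconds / MIN_SPEEDUP[query_type]


# Hypothetical baseline: a query that takes 10 minutes today.
baseline = 600.0
estimates = {
    kind: max_expected_runtime(baseline, kind) for kind in MIN_SPEEDUP
}
```

Because the factors are minimums, these values are upper bounds on the estimated runtime rather than predictions.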

Impala Formats
Supported formats

Text & Sequence Files, which can be compressed as
    Snappy
    GZIP
    BZIP

Future support for
    Avro
    RCFile
    LZO text file
    Parquet
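As a rough illustration of compressed text data, the sketch below compresses a small block of delimited text with GZIP and BZIP2 using Python's standard library. The sample line is made up, and Snappy is left out because it needs a third-party package. This only shows the codecs themselves, not how Impala reads such files.

```python
import bz2
import gzip

# Hypothetical delimited text data, repeated so compression has an effect.
text = ("2013-01-01,Auckland,120.0\n" * 1000).encode("utf-8")

# Compress the same bytes with the two stdlib codecs.
gz = gzip.compress(text)
bz = bz2.compress(text)

# Compare the byte counts of the raw and compressed forms.
sizes = {"raw": len(text), "gzip": len(gz), "bzip2": len(bz)}
```

Both codecs are lossless, so decompressing either result returns the original bytes.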

Impala Architecture

Impala Requirements
What does Impala need to run?

CentOS 6.2 or RHEL (Red Hat Enterprise Linux)
CDH 4.1 (Cloudera Hadoop Distribution)
Cloudera Manager (advised)

Contact Us

Feel free to contact us at


www.semtech-solutions.co.nz info@semtech-solutions.co.nz

We offer IT project consultancy
We are happy to hear about your problems
You can pay for just the hours you need to solve your problems
