on Traveloka
Andi N. Dirgantara
Lead Data Engineer
Speaker Profile
How we use our data
● Business Intelligence
● Analytics
● Personalization
● Fraud Detection
● Ads optimization
● Cross selling
● AB Test
● etc.
Problems
Client
● Web
● Android
● etc.
Backend
Database
Big Data Platform ?
Data Processing
● Analytics
● Machine Learning
● etc.
Solutions already exist, but ...
source: mattturck.com/bigdata2017
We need a Data Lake
But what is it?
Data Lake by Definition
● A data lake is a storage repository that holds a vast amount of raw data in its
native format until it is needed. - http://searchaws.techtarget.com
● A data lake is a storage repository that holds a vast amount of raw data in its
native format, including structured, semi-structured, and unstructured data.
The data structure and requirements are not defined until the data is needed.
- Tamara Dull, (SAS), https://www.kdnuggets.com
● It stores data in its native (raw) format
● The schema is applied at query time
● Sometimes it is also just a “marketing label” that gives people a simple name for
Hadoop-compatible technology, much as “big data” is used for
distributed storage and query engines
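The idea of applying the schema at query time can be illustrated with a minimal Python sketch. It is not taken from the talk: the sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names chosen for illustration. The raw lines are kept exactly as they arrived, and types are only imposed when the data is read.

```python
import json

# Raw events stored as-is, in their native format (hypothetical sample data).
RAW_EVENTS = [
    '{"user_id": "u1", "event": "search", "price": "120.5"}',
    '{"user_id": "u2", "event": "booking", "price": 300, "currency": "IDR"}',
    '{"user_id": "u3", "event": "search"}',
]

# The reader, not the writer, declares the schema: column name -> type caster.
SCHEMA = {"user_id": str, "event": str, "price": float}


def read_with_schema(raw_lines, schema):
    """Parse each raw line and project it onto the schema at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for column, cast in schema.items():
            value = record.get(column)
            # Missing fields become None; present fields are cast to the schema type.
            row[column] = cast(value) if value is not None else None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_EVENTS, SCHEMA)
for row in rows:
    print(row)
```

Because the stored lines are never rewritten, a different reader could apply a different schema (for example one that also includes `currency`) to the same raw data.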
Data Lake implementation on Data Team Side
[Architecture diagram; surviving labels: output, Presto]
Hive + Presto Pros and Cons
Pros
● More flexible to manage (self-managed)
○ Able to define nodes, replication factor, cluster, etc.
○ Able to specify node specs.
● Good integration with the rest of the Hadoop ecosystem
○ Spark
○ Kafka
○ Impala
● More mature
● Open source
Cons
● Harder to maintain (also because it is self-managed)
BigQuery
Pros
● Easier to maintain (managed by GCP)
● Good integration with other GCP managed tools
○ Dataflow
○ PubSub
○ Cloud Storage
● Enterprise ready, support is 24/7
Cons
● Less mature compared to the Hadoop ecosystem
● API is still limited (no Scala API support)
● Unable to store data on S3; it needs to be on Cloud Storage
● Closed source
Conclusions
References and Other Presentations
Thank you for your time.
We are hiring...
visit https://www.traveloka.com/en/careers