About Me
Civil (Water) Engineer, 2010 to 2015
Consultant (UK): Utilities, Asset Management, Constrained Optimization
Industrial PhD (UK): Infrastructure Design Optimization, Machine Learning + Water Engineering
Discovered H2O in 2014
Data Scientist, 2015 to Present
From 2015: Virgin Media (UK), Domino Data Lab (Silicon Valley, US)
2016 to Present: H2O.ai (Silicon Valley, US)
Agenda
About H2O.ai
Company
Machine Learning Platform
Tutorial
H2O Python Module
Download & Install
Step-by-Step Examples:
Basic Data Import / Manipulation
Regression & Classification (Basics)
Regression & Classification (Advanced)
Using H2O in the Cloud
Agenda
About H2O.ai
Company (background information)
Machine Learning Platform
Tutorial
H2O Python Module
Download & Install
Step-by-Step Examples:
Basic Data Import / Manipulation (for beginners)
Regression & Classification (Basics) (for beginners)
Short Break
Regression & Classification (Advanced) (as if I am working on Kaggle competitions)
Using H2O in the Cloud
About H2O.ai
Company Overview
Founded: 2011; venture-backed, debuted in 2012
Products: H2O Open Source In-Memory AI Prediction Engine; Sparkling Water; Steam
Mission: Operationalize Data Science, and provide a platform for users to build beautiful data products
Team: 70 employees; Distributed Systems Engineers doing Machine Learning; World-class visualization designers
Headquarters: Mountain View, CA
Our Team
Kuba
Joe
Scientific Advisory Council
Joe (2015)
http://www.h2o.ai/gartner-magic-quadrant/
Check out our website: h2o.ai
H2O Machine Learning Platform
High Level Architecture
Data sources: HDFS, S3, NFS, SQL
H2O Compute Engine (Distributed In-Memory, Loss-less Compression):
Load Data
Exploratory & Descriptive Analysis
Feature Engineering & Selection
Supervised & Unsupervised Modeling
Model Evaluation & Selection
Predict
Data & Model Storage
Production Scoring Environment
Your Imagination
High Level Architecture: Import Data from Multiple Sources (HDFS, S3, NFS, SQL)
High Level Architecture: Fast, Scalable & Distributed Compute Engine Written in Java
Algorithms Overview
Supervised Learning Unsupervised Learning
H2O Deep Learning in Action
High Level Architecture: Multiple Interfaces
H2O + Python
H2O + R
H2O Flow (Web) Interface
High Level Architecture: Export Standalone Models for Production
docs.h2o.ai
H2O + Python Tutorial
Learning Objectives
Start and connect to a local H2O cluster from Python.
Import data from Python data frames, local files or web.
Perform basic data transformation and exploration.
Train regression and classification models using various H2O machine learning algorithms.
Evaluate models and make predictions.
Improve performance by tuning and stacking.
Connect to H2O cluster in the cloud.
Install H2O
h2o.ai -> Download -> Install in Python
Start and Connect to a Local H2O Cluster
py_01_data_in_h2o.ipynb
Local H2O Cluster
Import H2O module
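A minimal sketch of this step, assuming the `h2o` Python package and a Java runtime are installed. The helper names `start_local_cluster` and `load_frame` and the memory setting are illustrative and are not taken from the notebook.

```python
def start_local_cluster(max_mem_size="2G"):
    """Start (or connect to) an H2O cluster on this machine."""
    import h2o  # imported lazily: needs the h2o package and Java

    # h2o.init() connects to a running local cluster or launches a new one.
    h2o.init(nthreads=-1, max_mem_size=max_mem_size)
    return h2o


def load_frame(h2o_module, source):
    """Load data into an H2OFrame from a path/URL or an in-memory object."""
    if isinstance(source, str):
        # A local file path or a web URL, for example a CSV file.
        return h2o_module.import_file(source)
    # Otherwise assume an in-memory object such as a pandas DataFrame.
    return h2o_module.H2OFrame(source)
```

Typical use would be `h2o = start_local_cluster()` followed by `frame = load_frame(h2o, "my_data.csv")`, where the file name is a placeholder.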
Basic Data Transformation & Exploration
py_02_data_manipulation.ipynb
(see notebooks)
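The notebook itself is not reproduced here. As a rough sketch of what basic exploration and preparation can look like on an H2OFrame, assuming `frame` was already loaded into H2O: the helper name `explore_and_split`, the 80/20 ratio and the seed are illustrative choices.

```python
def explore_and_split(frame, target, train_ratio=0.8, seed=1234,
                      categorical=False):
    """Summarise an H2OFrame, prepare the target, and split train/test."""
    # Print summary statistics for every column.
    frame.describe()

    if categorical:
        # Classification needs the target stored as a factor (enum) column.
        frame[target] = frame[target].asfactor()

    # Random split into a training part and a held-out test part.
    train, test = frame.split_frame(ratios=[train_ratio], seed=seed)

    # Use every column except the target as a predictor.
    features = [col for col in frame.columns if col != target]
    return train, test, features
```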
Regression Models (Basics)
py_03a_regression_basics.ipynb
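A hedged sketch of training and scoring a basic regression model, assuming `train` and `test` are H2OFrames and a cluster is running. The function name `train_regressor` and the `estimator_cls` argument are illustrative; the default of a gradient boosting machine is one choice among H2O's algorithms, not necessarily the one used in the notebook.

```python
def train_regressor(train, test, features, target,
                    estimator_cls=None, **params):
    """Train one model and return it with its MSE on the test frame."""
    if estimator_cls is None:
        # Default to H2O's gradient boosting machine (needs h2o installed).
        from h2o.estimators import H2OGradientBoostingEstimator
        estimator_cls = H2OGradientBoostingEstimator

    model = estimator_cls(**params)
    model.train(x=features, y=target, training_frame=train)

    # Evaluate on data the model has not seen during training.
    performance = model.model_performance(test)
    return model, performance.mse()
```

Predictions for new rows would then come from `model.predict(test)`.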
Regression Performance MSE
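For reference, mean squared error (MSE) is the average of the squared differences between actual and predicted values; lower is better. A small plain-Python sketch (the function name is illustrative, and H2O computes this metric itself when it evaluates a model):

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)
```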
Classification Models (Basics)
py_04_classification_basics.ipynb
Classification Performance Confusion Matrix
Confusion Matrix
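A confusion matrix counts how often each actual class is predicted as each class. The sketch below covers the binary case in plain Python; the function name and the dictionary layout are illustrative choices, not H2O's output format.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of all predictions that were correct."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total
```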
Regression Models (Tuning)
py_03b_regression_grid_search.ipynb
Improving Model Performance (Step-by-Step)
Model Settings MSE (CV) MSE (Test)
Cross-Validation
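In k-fold cross-validation every row is held out exactly once while the model is trained on the remaining rows, and the k hold-out scores are combined into the CV metric. The plain-Python sketch below shows one simple assignment scheme (row index modulo k). The function name is illustrative; in H2O the equivalent is requested through the `nfolds` parameter of an estimator rather than coded by hand.

```python
def kfold_indices(n_rows, n_folds):
    """Yield (training_rows, holdout_rows) index lists for each fold."""
    for fold in range(n_folds):
        # Row i belongs to the hold-out set of fold (i mod n_folds).
        holdout = [i for i in range(n_rows) if i % n_folds == fold]
        training = [i for i in range(n_rows) if i % n_folds != fold]
        yield training, holdout
```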
Early Stopping
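Early stopping ends training once the validation error stops improving, which saves time and limits overfitting. The sketch below is a simplified patience-based rule in plain Python, with an illustrative function name. H2O estimators expose related parameters (`stopping_rounds`, `stopping_metric`, `stopping_tolerance`), but H2O's exact stopping rule differs from this simplified version.

```python
def early_stopping_round(validation_errors, patience=3, tolerance=0.0):
    """Return (stop_round, best_round) for a sequence of validation errors.

    Training stops at the first round where the error has failed to improve
    on the best value by more than `tolerance` for `patience` rounds in a row.
    """
    best_error = float("inf")
    best_round = -1
    for round_index, error in enumerate(validation_errors):
        if error < best_error - tolerance:
            best_error = error
            best_round = round_index
        elif round_index - best_round >= patience:
            return round_index, best_round
    # Never triggered: training ran through every round.
    return len(validation_errors) - 1, best_round
```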
Grid Search
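Grid search trains one model per combination of hyper-parameter values and ranks the results by a metric. The plain-Python sketch below shows the idea with a caller-supplied `evaluate` function, which is a stand-in for "train a model and return its MSE". The names are illustrative; H2O provides its own `H2OGridSearch` class for this.

```python
from itertools import product


def grid_search(hyper_params, evaluate):
    """Score every combination of settings and sort by lowest score first."""
    names = sorted(hyper_params)
    results = []
    for values in product(*(hyper_params[name] for name in names)):
        settings = dict(zip(names, values))
        results.append((evaluate(settings), settings))
    # Lowest score (for example MSE) comes first.
    results.sort(key=lambda item: item[0])
    return results
```

The first element of the returned list holds the best score and the settings that produced it.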
Regression Models (Ensembles)
py_03c_regression_ensembles.ipynb
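A hedged sketch of stacking in H2O, assuming a running cluster and an H2OFrame `train`. Base models are cross-validated with identical folds and keep their hold-out predictions, which the stacked ensemble then uses to learn how to combine them. The function name, the `ensemble_cls` argument and the default settings are illustrative, not taken from the notebook.

```python
def build_stacked_ensemble(train, features, target, base_estimators,
                           nfolds=5, seed=1234, ensemble_cls=None):
    """Train cross-validated base models, then stack them."""
    if ensemble_cls is None:
        # Needs the h2o package; imported lazily.
        from h2o.estimators import H2OStackedEnsembleEstimator
        ensemble_cls = H2OStackedEnsembleEstimator

    base_models = []
    for estimator_cls in base_estimators:
        model = estimator_cls(
            nfolds=nfolds,
            fold_assignment="Modulo",  # same folds for every base model
            keep_cross_validation_predictions=True,  # needed for stacking
            seed=seed,
        )
        model.train(x=features, y=target, training_frame=train)
        base_models.append(model)

    ensemble = ensemble_cls(base_models=base_models)
    ensemble.train(x=features, y=target, training_frame=train)
    return ensemble
```

Here `base_estimators` would be a list of H2O estimator classes, for example a gradient boosting machine and a random forest.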
https://github.com/h2oai/h2o-meetups/blob/master/2017_02_23_Metis_SF_Sacked_Ensembles_Deep_Water/stacked_ensembles_in_h2o_feb2017.pdf
Lowest MSE = Best Performance
Classification Models (Ensembles)
py_04_classification_ensembles.ipynb
Highest AUC = Best Performance
H2O in the Cloud
py_05_h2o_in_the_cloud.ipynb
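A minimal sketch of connecting to an H2O cluster that is already running on a remote machine, assuming the `h2o` package is installed locally and the cluster is reachable over the network. The helper names are illustrative, and the host is a placeholder to be replaced with the address of your own cloud instance; 54321 is H2O's default port.

```python
def cluster_url(host, port=54321, https=False):
    """Build the URL of a remote H2O cluster."""
    scheme = "https" if https else "http"
    return f"{scheme}://{host}:{port}"


def connect_to_cluster(host, port=54321, https=False):
    """Connect to an existing remote cluster instead of starting a local one."""
    import h2o  # imported lazily: needs the h2o package

    return h2o.connect(url=cluster_url(host, port, https))
```

Once connected, the same Python calls used against a local cluster apply, with the data and models living on the remote machine.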
Recap
Learning Objectives
Start and connect to a local H2O cluster from Python.
Import data from Python data frames, local files or web.
Perform basic data transformation and exploration.
Train regression and classification models using various H2O machine learning algorithms.
Evaluate models and make predictions.
Improve performance by tuning and stacking.
Connect to H2O cluster in the cloud.
H2O Tutorial
Friday 4:00 pm
bit.ly/joe_h2o_tutorials
Contact
joe@h2o.ai
@matlabulous
github.com/woobe
Find us at PyData Conference
Live Demos
Please search/ask questions on
Stack Overflow
Use the tag `h2o` (not H2 zero)