
Hadoop Security with HDP/PHD

Page 1

© Hortonworks Inc. 2011–2014. All Rights Reserved

Disclaimer
This document may contain product features and technology directions that are under
development or may be under development in the future.

Technical feasibility, market demand, user feedback, and the Apache Software Foundation
community development process can all affect timing and final delivery.

This document's description of these features and technology directions does not represent a
contractual commitment from Hortonworks to deliver these features in any generally available
product.

Product features and technology directions are subject to change and must not be included in
contracts, purchase orders, or sales agreements of any kind.

Page 2


Agenda
Hadoop Security
Kerberos
Authorization and Auditing with Ranger
Gateway Security with Knox
Encryption

Page 3


Security today in Hadoop with HDP/PHD

HDP/PHD Enterprise Services: Security – Centralized Security Administration

- Authentication – Who am I / prove it? → Kerberos; API security with Apache Knox
- Authorization – What can I do? → Fine-grained access control with Apache Ranger
- Audit – What did I do? → Centralized audit reporting with Apache Ranger
- Data Protection – Can data be encrypted at rest and over the wire? → Wire encryption in Hadoop; native and partner encryption

Page 4

Security needs are changing
YARN unlocks the data lake:
- Multi-tenant: multiple applications for data access
- Different kinds of data
- Changing and complex compliance environment

Administration – central management and consistent security
Authentication – authenticate users and systems
Authorization – provision access to data
Audit – maintain a record of data access
Data Protection – protect data at rest and in motion

Fall 2013: largely siloed deployments with single-workload clusters
2014: 65% of clusters host multiple workloads

Page 5
Typical Flow – Hive Access through Beeline Client

(A) Beeline Client → HiveServer2 → (B) HDFS

Page 6

Typical Flow – Authenticate through Kerberos

1. The client requests a TGT from the KDC, receives it, and decrypts it with the password hash.
2. The client sends the TGT to the KDC and receives a Service Ticket for Hive.
3. The Beeline Client uses the Hive Service Ticket to submit the query to HiveServer2.
4. Hive gets a NameNode (NN) service ticket from the KDC.
5. Hive creates the MapReduce job using the NN Service Ticket, which reads and writes HDFS.

Page 7
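From the client's side, the last hop of this flow is visible in the Beeline JDBC URL, which names the Hive service principal the service ticket was issued for. A minimal sketch (host name and realm below are hypothetical placeholders):

```shell
# Hypothetical host and realm; substitute your own.
HIVE_HOST="hiveserver2.example.com"
REALM="EXAMPLE.COM"

# Kerberized HiveServer2 JDBC URL: the 'principal' part names the Hive
# service principal the KDC issues the service ticket for.
JDBC_URL="jdbc:hive2://${HIVE_HOST}:10000/default;principal=hive/${HIVE_HOST}@${REALM}"
echo "${JDBC_URL}"

# With a valid TGT in the cache (from kinit), Beeline would connect with:
#   beeline -u "${JDBC_URL}"
```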

Typical Flow – Add Authorization through Ranger (XA Secure)

The same Kerberos flow, with Ranger added at HiveServer2:
1. The client gets a service ticket for Hive from the KDC.
2. The Beeline Client uses the Hive ST to submit the query to HiveServer2, where Ranger authorizes the request.
3. Hive gets a NameNode (NN) service ticket from the KDC.
4. Hive creates the MapReduce job using the NN ST against HDFS.

Page 8
Typical Flow – Firewall, Route through Knox Gateway

1. The Beeline Client sends the original request with user id/password to Apache Knox.
2. Knox gets a service ticket for Hive from the KDC and runs as a proxy user, using the Hive ST to submit the query to HiveServer2 (with Ranger authorization).
3. Hive gets a NameNode (NN) service ticket and creates the MapReduce job using the NN ST against HDFS.
4. The client gets the query result back through Knox.

Page 9

Typical Flow – Add Wire and File Encryption

The same flow, with encryption on each hop:
- SSL between the Beeline Client and Knox (original request with user id/password)
- SSL between Knox and HiveServer2 (Knox runs as a proxy user, using the Hive ST to submit the query; Ranger authorization applies)
- SASL between Hive and HDFS as Hive creates the MapReduce job using the NN service ticket
- SSL on the query result returned to the client

Page 10

Security Features
PHD/HDP Security

Authentication
- Kerberos support
- Perimeter security for services and the REST API

Authorization
- Fine-grained access control
- Role-based access control
- Column-level permission support
- Covers HDFS, HBase, Hive, Storm and Knox
- Hive permissions: Create, Drop, Index, Lock, User

Auditing
- Extensive auditing
- Resource access auditing
- Policy auditing

Page 11

Security Features
HDP/PHD Security w/ Ranger

Data Protection
- Wire encryption
- Volume encryption
- File/column encryption (TDE): HDFS TDE & partners

Reporting
- Global view of policies and audit data

Manage
- User/group mapping
- Global policy manager, Web UI
- Delegated administration
- Partner integration

Page 12

Security Integrations:
- Ranger plugins: centralize authorization/audit of 3rd-party software in the Ranger UI
- Via a custom Log4j appender, audit events can be streamed to INFA infrastructure
- Knox: route partner APIs through Knox after validating compatibility; provides SSO capability to end users

Page 13

Authentication w/ Kerberos

Page 14

Kerberos in the field
- Kerberos is no longer too complex; adoption is growing.
- Ambari helps automate and manage Kerberos integration with the cluster.
- Use Active Directory, or a combined MIT Kerberos/Active Directory setup.
  - Active Directory is seen most commonly in the field.
  - Many start with a separate MIT KDC and later grow into the AD KDC.
- Knox should be considered for API/perimeter security.
  - Removes the need for Kerberos for end users.
  - Enables integration with different authentication standards.
  - Single location to manage security for REST APIs & HTTP-based services.
  - Tip: place it in the DMZ.

Page 15

Authorization and Auditing


Apache Ranger

Page 22

Authorization and Audit

Authorization – fine-grained access control with flexibility in defining policies:
- HDFS: folder, file
- Hive: database, table, column
- HBase: table, column family, column
- Storm, Knox and more

Audit – extensive user access auditing in HDFS, Hive and HBase; controls access into the system:
- IP address
- Resource type / resource
- Timestamp
- Access granted or denied

Page 23

Central Security Administration

Apache Ranger
- Delivers a single pane of glass for the security administrator
- Centralizes administration of security policy
- Ensures consistent coverage across the entire Hadoop stack

Page 24

Setup Authorization Policies

File-level access control with flexible policy definition; control permissions centrally.

Page 25

Monitor through Auditing

Page 26

Apache Ranger Flow

Page 27

Authorization and Auditing w/ Ranger

Enterprise users administer policies through the Ranger Administration Portal, which fronts the Ranger Policy Server (backed by an RDBMS). The Ranger Audit Server collects audit events into the RDBMS and HDFS. Ranger plugins enforce policy locally inside each component: HDFS, HBase, HiveServer2, and – as HDP 2.2 additions – Storm and Knox (the Knox plugin is marked TBD on the slide). A Legacy Tools & Data Governance Integration API is planned for 2015.

Page 28

Installation Steps
- Install PHD 3.0
- Install Apache Ranger (https://tinyurl.com/mlgs3jy)
  - Install Policy Manager
  - Install User Sync
  - Install Ranger Plugins
- Start Policy Manager:
  service ranger-admin start
- Verify at http://<host>:6080/ (default login: admin/admin)

Page 29
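The Policy Manager is configured before setup through its install.properties. A sketch of the kind of entries involved (property names as found in Ranger releases of this era; treat the exact keys as an assumption and all values as hypothetical placeholders):

```properties
# Database used by the Policy Manager (values are placeholders).
db_host=dbhost.example.com
db_name=ranger
db_user=rangeradmin
db_password=changeme
db_root_user=root
db_root_password=changeme

# External URL clients and plugins use to reach the Policy Manager.
policymgr_external_url=http://ranger-host.example.com:6080

# NONE, LDAP, or ACTIVE_DIRECTORY
authentication_method=NONE
```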

Ranger Plugins
Available for: HDFS, Hive, Knox, Storm, HBase

Steps to enable a plugin:
1. Start the Policy Manager
2. Create the plugin repository in the Policy Manager
3. Install the plugin:
   - Edit the install.properties
   - Execute ./enable-<plugin>.sh
4. Restart the plugged-in service (e.g. HDFS, Hive, etc.)

Page 30
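Step 3 mostly amounts to pointing the plugin at the Policy Manager and naming the repository created in step 2. A sketch of that edit (the keys POLICY_MGR_URL and REPOSITORY_NAME appear in Ranger plugin install.properties of this era; the file path, host and repository name here are hypothetical, and the edit is done on a scratch copy so it can run anywhere):

```shell
# Scratch copy standing in for the plugin's install.properties,
# e.g. /usr/lib/ranger/hdfs-plugin/install.properties.
cat > /tmp/install.properties <<'EOF'
POLICY_MGR_URL=
REPOSITORY_NAME=
EOF

# Point the plugin at the Policy Manager and name the repository
# created in the Policy Manager UI in step 2.
sed -i \
    -e 's|^POLICY_MGR_URL=.*|POLICY_MGR_URL=http://ranger-host:6080|' \
    -e 's|^REPOSITORY_NAME=.*|REPOSITORY_NAME=hadoopdev|' \
    /tmp/install.properties

cat /tmp/install.properties
# In a real install, the next commands would be:
#   ./enable-hdfs-plugin.sh
# followed by a restart of the HDFS service.
```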

Ranger Console

Tabs:
- Repository Manager
- Policy Manager
- Users/Groups
- Analytics
- Audit

Page 31

Repository Manager
- Add New Repository
- Edit Repository
- Delete Repository

Page 32

Demo

Page 33

REST API Security through Knox


Securely share Hadoop Cluster

Page 34

Share the Data Lake with everyone – securely
- Simplifies access: extends Hadoop's REST/HTTP services by encapsulating Kerberos within the cluster.
- Enhances security: exposes Hadoop's REST/HTTP services without revealing network details, providing SSL out of the box.
- Centralized control: enforces REST API security centrally, routing requests to multiple Hadoop clusters.
- Enterprise integration: supports LDAP, Active Directory, SSO, SAML and other authentication systems.

Page 35

Apache Knox
Knox can be used with both unsecured and Kerberos-secured Hadoop clusters. In an enterprise
solution that employs Kerberos-secured clusters, the Apache Knox Gateway provides an enterprise security
solution that:
- Integrates well with enterprise identity management solutions
- Protects the details of the Hadoop cluster deployment (hosts and ports are hidden from end users)
- Reduces the number of services with which a client needs to interact

Page 36

Extend Hadoop API reach with Knox

Business users and applications (App A … App N) in the application tier reach the Hadoop cluster over REST/HTTP and JDBC/ODBC through a load balancer and Knox. Data operators drive ingest and ETL tools (Falcon, Oozie, Sqoop, Flume); Hadoop admins and operators use RPC calls directly and reach the cluster over SSH through a bastion node.

Page 37

Typical Flow – Add Wire and File Encryption

The same Knox flow, with each hop protected: SSL between the Beeline Client and Knox (original request with user id/password), SSL between Knox and HiveServer2 (Knox runs as a proxy user, submitting the query with the Hive ST under Ranger authorization), and SASL between Hive and HDFS as Hive creates the MapReduce job with the NN service ticket from the KDC. The query result returns to the client over SSL.

Page 38

Why Knox?

Enhanced security
- Protects network details
- SSL for non-SSL services
- WebApp vulnerability filter

Simplified access
- Kerberos encapsulation
- Extends API reach
- Single access point
- Multi-cluster support
- Single SSL certificate

Centralized control
- Central REST API auditing
- Service-level authorization
- Alternative to SSH edge node

Enterprise integration
- LDAP integration
- Active Directory integration
- SSO integration
- Apache Shiro extensibility
- Custom extensibility

Page 39

Hadoop REST API with Knox

Service  | Direct URL                           | Knox URL
WebHDFS  | http://namenode-host:50070/webhdfs   | https://knox-host:8443/webhdfs
WebHCat  | http://webhcat-host:50111/templeton  | https://knox-host:8443/templeton
Oozie    | http://oozie-host:11000/oozie        | https://knox-host:8443/oozie
HBase    | http://hbase-host:60080              | https://knox-host:8443/hbase
Hive     | http://hive-host:10001/cliservice    | https://knox-host:8443/hive
YARN     | http://yarn-host:yarn-port/ws        | https://knox-host:8443/resourcemanager

Direct: masters could be on many different hosts. Knox: one host, one port; SSL config at one host; consistent paths.

Page 40
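The pattern behind most rows of the table is mechanical: each service keeps its path but moves to the gateway's single host and port. A throwaway sketch of that rewrite (host names are the table's placeholders; note HBase and YARN are exceptions, since the gateway renames their paths):

```shell
KNOX="https://knox-host:8443"

# Rewrite a direct service URL to its Knox form: drop scheme://host:port,
# keep the service path, prepend the single gateway endpoint.
# (Works for the path-preserving rows; HBase has no path and YARN's /ws
# becomes /resourcemanager, so those are mapped gateway-side instead.)
to_knox() {
    local path="${1#*://}"      # strip scheme -> host:port/path
    path="/${path#*/}"          # strip host:port, keep /path
    echo "${KNOX}${path}"
}

to_knox "http://namenode-host:50070/webhdfs"
to_knox "http://webhcat-host:50111/templeton"
```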

Hadoop REST API Security: Drill-Down

REST clients and the Enterprise Identity Provider (LDAP/AD) sit outside the firewall. In the DMZ, a load balancer (LB) fronts the Knox Gateway instances (GW), which talk LDAP to the identity provider. Behind a second firewall, Knox routes HTTP to the masters of Hadoop Cluster 1 and Hadoop Cluster 2 (NN, RM, Oozie, WebHCat, HS2, HBase) and on to the slaves (DN, NM). An edge node with the Hadoop CLIs still talks RPC directly to the cluster and is reached over SSH.

Page 41

Knox features in PHD
- Use Ambari for install/start/stop/configuration
- Knox support for HDFS HA
- Support for the YARN REST API
- Support for SSL to Hadoop cluster services (WebHDFS, HBase, Hive & Oozie)
- Integration with Ranger for Knox service-level authorization
- Knox Management REST API

Page 42

Installation
- Installed via Ambari (can also be done manually)
- Start the embedded LDAP
- There are good examples in the Apache docs, with Groovy scripts:
  https://knox.apache.org/books/knox-0-4-0/knox-0-4-0.html

Page 43
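With the embedded demo LDAP, the usual smoke test from the Apache docs is a curl of WebHDFS through the gateway. A sketch (the 'sandbox' topology name, the gateway path, and the guest/guest-password demo credentials follow the Knox 0.4 getting-started material; adjust everything for a real deployment):

```shell
# From the Knox install directory, the demo LDAP and gateway start with:
#   bin/ldap.sh start
#   bin/gateway.sh start

# WebHDFS LISTSTATUS through the gateway's 'sandbox' topology.
TOPOLOGY="sandbox"   # name of the topology file, demo default
URL="https://localhost:8443/gateway/${TOPOLOGY}/webhdfs/v1/?op=LISTSTATUS"
echo "${URL}"

# The demo LDAP ships with a guest user, so the request would be:
#   curl -iku guest:guest-password "${URL}"
```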

Data Protection
Wire and data at rest encryption

Page 44

Data Protection
HDP allows you to apply data protection policy at different layers across the Hadoop stack:

Layer              | What?                            | How?
Storage and access | Encrypt data while it is at rest | Partners; HDFS TDE (tech preview); HBase encryption; OS-level encryption
Transmission       | Encrypt data as it moves         | Supported from HDP 2.1

Page 45
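Transmission-layer encryption in this era of HDP is switched on per channel. Two of the core knobs look like this – a sketch, not a complete configuration (shuffle and web UI SSL have their own settings):

```xml
<!-- core-site.xml: SASL quality of protection for Hadoop RPC.
     "privacy" = authentication + integrity + encryption. -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: encrypt the DataNode block-transfer protocol. -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
```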

HDFS Transparent Data Encryption (TDE) in 2.2
- Data encryption at a higher level than the OS, while remaining native and transparent to Hadoop
- End-to-end: data is encrypted and decrypted only by the clients
  - Encryption/decryption uses the usual HDFS functions from the client
  - No need to change user application code
  - No need to store data encryption keys on HDFS itself
  - HDFS itself never sees unencrypted data
- Data is effectively encrypted at rest, and since it is decrypted on the client side, it is also encrypted on the wire while being transmitted
- HDFS file encryption/decryption is transparent to its clients
  - Users can read/write files to/from an encryption zone as long as they have permission to access it
- Depends on installing a Key Management Server (KMS)

Page 49


HDFS Transparent Data Encryption (TDE) – Steps

1. Install and run KMS on top of HDP 2.2
2. Change HDFS params via Ambari
3. Create an encryption key:
   hadoop key create key1 -size 256
   hadoop key list -metadata
4. Create an encryption zone using the key:
   hdfs dfs -mkdir /zone1
   hdfs crypto -createZone -keyName key1 -path /zone1
   hdfs crypto -listZones

http://hortonworks.com/kb/hdfs-transparent-data-encryption/

Page 54

Thank You

Page 55

