

InfoQ

PaddlePaddle Angel

TalkingData Fregata

TensorFlow

TensorFlow

Deeplearning4j

Twitter

InfoQ

editors@cn.infoq.com

InfoQ Tina

TensorFlow Kubernetes

PaddlePaddle

Fregata

Twitter

TensorFlow Kubernetes

Artificial Neural Network

1 0

ImageNet

ImageNet 1500

2012 25%

2012 ImageNet

16%

2016 3.5% ImageNet

5.1%

TensorFlow
TensorFlow 11

AlphaGo DeepMind

TensorFlow TensorFlow

CPU, GPU TensorFlow

TensorFlow RankBrain

CMU TensorFlow

ppt TensorFlow

TensorFlow Python C++ API Python API

TensorFlow Python

import TensorFlow TensorFlow

Tensor

session

TensorFlow

TensorFlow tf.

Variable

output

TensorFlow TFLearn

TensorFlow-Slim 10 MNIST ppt

TensorFlow TensorFlow

TensorFlow

TensorFlow

2015 Inception-v3

ImageNet 95% 2500

50

Inception-v3 78% 5

95%

TensorFlow on Kubernetes

Google Brain TensorFlow Borg

TensorFlow Kubernetes TensorFlow

Kubernetes


TensorFlow

TensorFlow

In-graph replication

TensorFlow Between-graph replication

In-graph replication

Between-graph replication

TensorFlow

Kubernetes TensorFlow Hadoop

Hadoop

Yarn, HDFS, MapReduce TensorFlow

Hadoop Mapreduce

TensorFlow Yarn HDFS

Kubernetes Kubernetes

TensorFlow

TensorFlow

2013


Spark

PaddlePaddle

PaddlePaddle (IDL)

PaddlePaddle

IDL INF( ) SYS( )

10 Github

AI

PaddlePaddle

PaddlePaddle

PaddlePaddle

30 PaddlePaddle

1.

PaddlePaddle


0 3

10 40

PaddlePaddle

SVM

x1-x5 w1-w5 y

y

x w

DNA


y y z y z

y x


AlphaGo

AI AI

2. PaddlePaddle
PaddlePaddle

GPU

layer

Paddle 2013

Pserver Trainer

Pserver

Trainer

Pserver

Pserver

PServer Trainer

PServer Trainer

PServer

Trainer Pserver

Trainer mini-batch Parameter Server

SGD
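The Trainer and PServer roles named above can be illustrated with a minimal single-process sketch. Everything in it is an assumption made for illustration: the `PServer` class, the contiguous sharding of the gradient into one block per server, and plain SGD with a fixed learning rate. It is not PaddlePaddle's actual API or network protocol.

```scala
// Hypothetical single-process sketch; not PaddlePaddle's API.
object PServerSketch {
  // One parameter block held by one (hypothetical) parameter server.
  final class PServer(size: Int) {
    val params: Array[Double] = Array.fill(size)(0.0)

    // Plain SGD update on this block: w <- w - lr * g.
    def applyGradient(grad: Array[Double], lr: Double): Unit = {
      require(grad.length == params.length, "gradient/block size mismatch")
      for (i <- params.indices) params(i) -= lr * grad(i)
    }
  }

  // Split a full gradient into contiguous blocks, one per server.
  def shard(grad: Array[Double], numServers: Int): Seq[Array[Double]] = {
    require(grad.nonEmpty && numServers > 0)
    val blockSize = math.ceil(grad.length.toDouble / numServers).toInt
    grad.grouped(blockSize).toSeq
  }

  // A trainer pushes the gradient of one mini-batch; each server updates its block.
  def pushGradient(servers: Seq[PServer], grad: Array[Double], lr: Double): Unit =
    shard(grad, servers.length).zip(servers).foreach { case (g, s) =>
      s.applyGradient(g, lr)
    }
}
```

In this sketch each server only ever touches its own block, which is the property that lets the parameter storage be spread over several machines.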

PServer

PServer PServer

Paddle

PServer

PServer


0 0 SGD

PServer

Parameter block PServer

PaddlePaddle

Trainer

200G

200G PServer Trainer

server

SGD 0

L2

L2 PServer

Trainer PServer

2013 PServer

P2P P2P

NodeA A

B C

NodeA A0 B0

NodeB B1 C1

theta1, theta2, theta3

Pserver

SGD SGD

SGD

PaddlePaddle GPU

Caffe GPU

GPU SPMD(Single Program/Multiple Data)

GPU

GPU GPU SGD

GPU CPU

GPU PaddlePaddle

GPU

P2P CPU

PaddlePaddle

sequenceorder

a vector of features a set of features

PaddlePaddle

sequence PaddlePaddle

GatherLayer

sequence

Paddle Memory RNN

Memory

Memory Memory Memory

Paddle RNN RNN

Sequence batch batch

GPU (SPMD)

Sequence

TensorFlow MXNet Padding

Sequence Padding 0

Padding Sequence Padding 0

PaddlePaddle


PaddlePaddle Padding
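The contrast between zero-padding variable-length sequences and batching them without padding can be sketched as follows. The flat buffer plus start offsets shown in `packed` is an assumed illustration of one padding-free layout; it is not code taken from PaddlePaddle, and the object and function names are hypothetical.

```scala
object SequenceBatching {
  // Padding approach: every sequence is extended with zeros to the longest length.
  def padded(seqs: Seq[Seq[Int]]): Seq[Seq[Int]] = {
    val maxLen = if (seqs.isEmpty) 0 else seqs.map(_.length).max
    seqs.map(s => s ++ Seq.fill(maxLen - s.length)(0))
  }

  // Padding-free approach: concatenate all tokens and record where each sequence starts.
  // offsets has seqs.length + 1 entries; sequence i occupies [offsets(i), offsets(i + 1)).
  def packed(seqs: Seq[Seq[Int]]): (Seq[Int], Seq[Int]) = {
    val offsets = seqs.scanLeft(0)(_ + _.length)
    (seqs.flatten, offsets)
  }
}
```

With three sequences of lengths 3, 1 and 2, the padded form stores 9 values while the packed form stores 6 values plus 4 offsets, so the saving grows with the spread of sequence lengths.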

3. PaddlePaddle
OP

OP TensorFlow

Layer Caffe

PaddlePaddle OP layer PaddlePaddle

PaddlePaddle Layer OP (

) layer LSTM

C++ LSTM Layer

OP LSTM

Layer C++ PaddlePaddle OP

Layer

PaddlePaddle LSTM

PaddlePaddle MPI

Spark k8s + Docker PaddlePaddle

PaddlePaddle

PaddlePaddle

RDMA PaddlePaddle

GPU

PaddlePaddle

PaddlePaddle

PaddlePaddle

1.
github

256

28×28

PaddlePaddle Python

PaddlePaddle

pixel label PaddlePaddle

2.
demo PaddlePaddle

embedding_layer LSTM

demo

APP

Code

Code

APP

PaddlePaddle

PaddlePaddle demo


PaddlePaddle demo PaddlePaddle

demo

Github IM

PaddlePaddle demo

PaddlePaddle 9

PaddlePaddle

Github PaddlePaddle

Github Github


Github

Bug Issue

paddle-dev

1 PaddlePaddle

PaddlePaddle RNN

RNN RNN

Sequence

2 PaddlePaddle

PaddlePaddle

ARM

3 PaddlePaddle


4 FPGA

FPGA

FPGA

PaddlePaddle

15


AI

60

20 60

TB

20 GB

= +

VC VC 1960 1990

Vapnik Chervonenkis VC

VC

VC

VC

VC

VC

2 64


ID

ID

GDBT (General Distributed Brain Technology) GDBT C++

MPI, Yarn, Mesos

GDBT

No Free Lunch

ETL

GDBT

GDBT

GDBT

Hadoop, Spark ETL

ETL


ETL

overhead

ETL

ETL

ETL

ETL

ETL

GDBT

GDBT

GPU

FPGA

GDBT


GDBT

overhead GDBT

ETL

GDBT

(MPI Broadcast, AllReduce, Gather, Scatter)

Parameter Server

GDBT

GDBT

GDBT Key-Value

GDBT

GDBT

DAG

GDBT

GDBT

GDBT

GDBT

pattern

GDBT

GDBT

GDBT

Tensorflow


GDBT GDBT C++ 14

GDBT

GDBT

Hadoop/Spark JVM JVM

C++ JVM

CPU GPU

Spark Project Tungsten

GC

C++

C++ JVM-Based

Spark Parameter Server

PS Spark

C++


API


1.

DAG SQL/

PySpark/

DAG

AWS

IDC

SSD

SSD

API Alluxio

Tachyon (Alluxio)

S3, Ceph SSD Alluxio

GDBT Alluxio

SSD

DataManager

DataManager DAG

URI

2.

Schema

SQL

DAG


GDBT GDBT

3.

DAG

DAG

API

QPS

Cannon KV T

GDBT


ACM

AI

Q&A

Q1

focus

Google Wide & Deep Learning

TensorFlow

IO

Q2

GDBT

Q3 SVRG

batch/stochastic L-BFGS, FOBOS, RDA, FTRL, SVRG, Frank-Wolfe

Q4 bayes

bayesian

Q5

100%

Q6

ASP, BSP, SSP

parameter server

Q7: TensorFlow

TensorFlow GDBT GDBT TensorFlow

TensorFlow

benchmark TensorFlow

Tensorslow Tensorflow

Tensorflow

GDBT

Q8 GDBT

GDBT

Q9 GDBT

GPU IO

Q10 GDBT

GDBT CPU GPU

FPGA GPU NVIDIA

Tesla P4 FPGA FPGA

Q11 Spark

Prophet

case by case

case by case


AI

Prophet

Prophet GDBT

GBDT General Distributed Brilliant

Technology TM Lamma

Prophet

Service

GDBT Spark GDBT

Spark MLLib

MLLib

Prophet Spark Prophet Spark

Prophet AI

GDBT Spark

AI for everyone https://www.zhihu.com/question/48743915#

Spark Spark

Q12

APP

10

Q13

API SaaS GDBT

NextParadigm

Q14

BAT

09

Q15

gradient boosting factorization machine

NN-based NN feature combination NN

FPGA GPU

FPGA


CPU

CPU

FPGA GPU

GPU

FPGA

FPGA GPU

FPGA GPU

FPGA GPU FPGA

FPGA

/ GPU

FPGA

FPGA

FPGA, GPU

GPU GPU

FPGA


trinity FPGA

ELF



Angel

Fregata

[1]


IO IO

90%

Map Reduce

Hadoop Spark Map Reduce

SSD

Map Reduce

Parameter Server[2]

Parameter Server

Parameter Server

Map Reduce Parameter Server

Parameter Server

[3]


TalkingData

TalkingData Fregata [4]

. Fregata

Fregata TalkingData Spark

Spark 1.6.x, Spark 2.0 Fregata Logistic

Regression, Softmax, Random Decision Trees

Logistic Regression, Softmax

Greedy Step Averaging[5]

SGD

GSA Logistic Regression Softmax

GSA Spark Logistic Regression

Softmax

IO

Logistic Regression

IO

Random Decision Trees[6][7]

Hash

Trick

Fregata

. GSA

GSA , Fregata

(SGD) SGD

SGD

SGD

Adagrad, Adadelta, Adam

GSA

SGD

GSA Logistic Softmax libsvm 16

SGD, Adadelta, SCSG (SVRG)

GSA

GSA

GSA

softmax

. GSA Spark

GSA Spark

Spark

Spark BSP


Ensemble Learning

Parameter Server

Spark MLLib

Spark

Map Reduce

Rosenblatt[8]

Map

Reduce Fregata GSA

Map

Reduce
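The Map and Reduce steps mentioned above, combined with the averaging result cited as [8], suggest a train-locally-then-average pattern. The sketch below shows only the averaging (Reduce) step, under the assumption that each partition has already produced a dense weight vector of the same length. The object and function names are hypothetical and this is not Fregata's implementation.

```scala
object ModelAveraging {
  // Reduce step: element-wise mean of the per-partition weight vectors.
  def average(models: Seq[Array[Double]]): Array[Double] = {
    require(models.nonEmpty, "need at least one model")
    val dim = models.head.length
    require(models.forall(_.length == dim), "all models must have the same dimension")
    val sum = models.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    sum.map(_ / models.length)
  }
}
```

On Spark, one plausible arrangement would be to produce the local vectors inside `mapPartitions` and combine them with `reduce`, so that only one weight vector per partition crosses the network.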

. Fregata MLLib

Fregata Spark Spark MLLib

Logistic

Regression Softmax

AUC

Fregata MLLib

AUC

Lookalike Fregata

Lookalike

Logistic Regression

Lookalike

class imbalance (4

2 ) Lookalike Fregata LR

MLLib LR

4 Fregata LR AUC

0.93 MLLib LR

AUC AUC 0.55

MLLib Fregata 6

epsilon[9] 40 2000 , Fregata LR

MLLib LR 5

Fregata LR

MLLib LR 5 Fregata LR

MLLib LR Fregata LR 5

6 Fregata LR MLLib LR 6 AUC

Fregata LR MLLib LR

Fregata LR AUC

MNIST Softmax 7

Fregata Softmax AUC

MLLib Softmax 40 Fregata Softmax

MLLib

Softmax Fregata Softmax 50

Fregata Spark Fregata

10 100 1000 10000 4

4 Fregata Logistic

Regression 511412394

5 48 Executor

500 Executor

2G 800

Executor 8G Fregata Spark

. Fregata

Fregata

Fregata 3 Fregata

Maven pom.xml

<dependency>
    <groupId>com.talkingdata.fregata</groupId>
    <artifactId>core</artifactId>
    <version>0.0.1</version>
</dependency>
<dependency>
    <groupId>com.talkingdata.fregata</groupId>
    <artifactId>spark</artifactId>
    <version>0.0.1</version>
</dependency>
SBT build.sbt

// maven
// resolvers += Resolver.mavenLocal
libraryDependencies += "com.talkingdata.fregata" % "core" % "0.0.1"
libraryDependencies += "com.talkingdata.fregata" % "spark" % "0.0.1"
maven

git clone https://github.com/TalkingData/Fregata.git
cd Fregata
mvn clean package install
Logistic Regression Fregata

1.

import fregata.spark.data.LibSvmReader
import fregata.spark.metrics.classification.{AreaUnderRoc, Accuracy}
import fregata.spark.model.classification.LogisticRegression
import org.apache.spark.{SparkConf, SparkContext}
2. Fregata LibSvmReader

LibSvm [10]

val (_, trainData) = LibSvmReader.read(sc, trainPath, numFeatures.toInt)
val (_, testData) = LibSvmReader.read(sc, testPath, numFeatures.toInt)
3. Logistic Regression

val model = LogisticRegression.run(trainData)
4.

val pd = model.classPredict(testData)
5. Fregata

val auc = AreaUnderRoc.of(pd.map {
  case ((x, l), (p, c)) => p -> l
})
Fregata breeze.linalg.Vector[Double]

LibSvm Fregata LibSvmReader.read()

breeze.linalg.Vector[Double]

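// indices: Array[Int], 0-based positions of the non-zero features
// values: Array[Double], the values stored at those indices
// length: Int, total number of features per instance
// label: Double, the label of the instance
sc.textFile(input).map { line =>
  val indices = ...
  val values = ...
  val label = ...
  ...
  (new SparseVector(indices, values, length).asInstanceOf[Vector[Double]],
    asNum(label))
}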
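The elided parsing step in the snippet above depends on the input format. As one illustration, a line in LibSVM text format (`label index:value index:value ...`, with 1-based indices) could be turned into the `indices`, `values` and `label` pieces roughly as follows. This is a standalone sketch using only the Scala standard library; the `LibSvmLine` object is hypothetical and is not part of Fregata.

```scala
object LibSvmLine {
  // Parse "label idx:val idx:val ..." into (indices, values, label).
  // LibSVM indices are 1-based, so they are shifted to 0-based here.
  def parse(line: String): (Array[Int], Array[Double], Double) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { t =>
      val Array(i, v) = t.split(":")
      (i.toInt - 1, v.toDouble)
    }
    (pairs.map(_._1), pairs.map(_._2), label)
  }
}
```

The three returned pieces line up with the `SparseVector(indices, values, length)` constructor arguments and the label used in the snippet above; `length` still has to be supplied by the caller.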

. Fregata

Fregata

Fregata 3

Fregata Spark

Fregata Spark Spark


Fregata

Fregata

Spark

Fregata

LR

LR

IO

Fregata TalkingData

418km/ 1.5

2.3 Fregata

Fregata ,

1. Cheng T. Chu, Sang K. Kim, Yi A. Lin, Yuanyuan Yu, Gary R. Bradski, Andrew Y. Ng, Kunle Olukotun. Map-Reduce for Machine Learning on Multicore. NIPS, 2006.

2. https://www.zhihu.com/question/48282030

3. https://github.com/TalkingData/Fregata

4. http://arxiv.org/abs/1611.03608

5. http://www.ibm.com/developerworks/cn/analytics/library/ba-1603-random-decisiontree-algorithm-1/index.html

6. http://www.ibm.com/developerworks/cn/analytics/library/ba-1603-random-decisiontree-algorithm-2/index.html

7. Rosenblatt J D, Nadler B. On the optimality of averaging in distributed statistical learning[J]. Information and Inference, 2016: iaw013.

8. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#epsilon

9. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

TalkingData 12

12 9 IBM

CRL, KDD2015, DSS2016

Dice

Twitter

Twitter

1. Data Availability
slide

2. Computation Power
GPU, TPU

computation power

computation power

Overfit

3. Development in Algorithms
push the

boundary of machine learning

Scale

10 100 1000


10

Twitter Twitter

80% DAU 90%

Twitter

ads ranking

ads targeting

timeline ranking, feed ranking

anti-spam

recommendation

moments ranking

trends

Twitter Twitter

10 trillion weights 10

million training example 1 features

Feature Space

TB

Twitter Realtime Twitter

realtime Twitter is all about realtime, like news events, videos, trends Twitter

NBA

traffic

Twitter

Twitter

1. Ads Ranking
Twitter 1 1

Twitter CPC

Cost Per Click Model

Twitter

2. Timeline Ranking, Feed Ranking


Tweet

3. Recommendation

4. Anti-Spam
Abuse Detection Twitter

NSFW Detection


Twitter client team

prototype

pipeline

pipeline

( Twitter )

feature

transform( ) model( )

onboarding

client team

team

client team

own maintain training pipeline serving runtime on call

client team

1.

2. workflow management

3. Online Serving

high QPS, low latency A/B testing 1%

traffic

launch 100%

Enable feature sharing across teams and make machine-learning platform

iteration very easy.

feature identifier to feature

value mapping 4 dense types

Binary

Continuous

Categorical

Text

2 sparse feature types

SparseBinary

SparseContinuous

feature identifier 64 feature id

feature id feature name hash

string

CPU feature id feature id

to feature name mapping feature id

metadata feature

id to feature name mapping
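The idea of a 64-bit feature id derived by hashing the feature name, with an id-to-name mapping kept as metadata, can be sketched as below. The choice of 64-bit FNV-1a is an assumption made only to have a concrete hash; the text does not say which hash function Twitter uses, and the `FeatureIds` object is hypothetical.

```scala
object FeatureIds {
  // 64-bit FNV-1a hash, chosen here only as an example hash function.
  def featureId(name: String): Long = {
    var h = 0xcbf29ce484222325L            // FNV-1a 64-bit offset basis
    name.getBytes("UTF-8").foreach { b =>
      h ^= (b & 0xff).toLong
      h *= 0x100000001b3L                   // FNV-1a 64-bit prime
    }
    h
  }

  // Metadata kept alongside the data so ids can be mapped back to names.
  def idToName(names: Seq[String]): Map[Long, String] =
    names.map(n => featureId(n) -> n).toMap
}
```

Comparing and storing a `Long` is cheaper than doing the same with a string, and the metadata map is only consulted when a readable name is needed, for example when inspecting a model.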

production scribe

join sampling transform persistent

storage

DataAPI

API

machine-learning task

20 30

Scala API fluent interface

feature id to feature name mapping

metadata keep consistency

Scala

FeatureSource

tweetTopic input

path filter/sample by 10% randomly

discretizer transform tweetTopic

join join key tweet id LeftJoin

hdfs database

API FeatureSource read

Trainer

trainer trainer

offline training pipeline large scale logistic

regression learner

1. Vowpal Wabbit
John Langford C++ trainer

2. Lolly
Twitter JVM online learning trainer

Twitter stack JVM Java, Scala

learner Twitter Stack

discretizer Boosting tree, GBDT

AdaBoost, Random forest, MDL discretizer

Deep Learning Torch Deep Learning

libraries

PredictionEngine

Twitter

PredictionEngine Large scale online

SGD learning Feeds Ranking

PredictionEngine

offline training PredictionEngine

application layer online serving PredictionEngine

online service layer layer RPC

Transform transform

feature vector

CrossCross

Cross

advertiser id id features

Cross effectively

personalized feature

Logistic Regression

Architecture

transform layer cross layer nonlinearity

logistic regression

transform layer cross layer

PredictionEngine

1.
model collocation model collocation

tweet

tweet

pair

5 5 RPC call

physical container call 5

5 prediction

Batch request API

pair Batch API

amortise cost for user feature

2. CPU Cost

feature identifier id feature name

Transform sharing PredictionEngine

transform model collocation

transform

transform model Transform

feature cross done on the fly feature cross

cross

cross feature

cross cross

on the fly cross
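Computing feature crosses on the fly, rather than materializing them in storage, can be pictured with the following sketch. It assumes features are already represented as 64-bit ids; the mixing constant and the `crossId` scheme are arbitrary illustrative choices, not the method used by PredictionEngine.

```scala
object FeatureCross {
  // Combine two feature ids into a crossed id; the mixing constant is arbitrary.
  def crossId(a: Long, b: Long): Long = (a * 0x9E3779B97F4A7C15L) ^ b

  // Cross every feature in `left` with every feature in `right` at request time.
  def cross(left: Seq[Long], right: Seq[Long]): Seq[Long] =
    for (a <- left; b <- right) yield crossId(a, b)
}
```

Because the crossed ids are derived at request time, nothing beyond the original features has to be stored, at the price of extra CPU work per prediction.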

3. Training/Serving throughput
trainer update

update

GB

Gigabyte training throughput

sharding 10

update

10 10 queue buffer 10

worker

training prediction

instance

training 1 instances

1 instances training

training training service

queue fanout

prediction service instance

client call prediction

service client client

prediction service

prediction service prediction service

2000

1 CPU level

Level failover

traffic failover

CPU utilization 40%

4. Realtime feedback
feedback

positive training example

negative training example

negative training example

5. Fault tolerance
instances instances

snapshot instance

model snapshot load

anomaly traffic detection

pipeline queue queue positive training example

negative training example positive queue

negative training example

Twitter

on call on call on call

page 5

anomaly traffic detection traffic

page on

call on call

Tooling

Auto

Hyper-parameter Tuning

learning-rate

Random Search

hyper-parameter

parameter setting parameter

performance
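Random search over a hyper-parameter such as the learning rate can be sketched as below. The log-uniform sampling range, the fixed seed and the caller-supplied `score` function are all assumptions made for illustration; the actual tuning tool described here is not shown in the text.

```scala
import scala.util.Random

object RandomSearch {
  // Sample `trials` learning rates log-uniformly in [lo, hi] and keep the best one.
  // `score` is a caller-supplied evaluation function (higher is better).
  def bestLearningRate(lo: Double, hi: Double, trials: Int, seed: Long)
                      (score: Double => Double): (Double, Double) = {
    require(lo > 0 && hi >= lo && trials > 0)
    val rng = new Random(seed)
    val candidates = Seq.fill(trials) {
      math.exp(math.log(lo) + rng.nextDouble() * (math.log(hi) - math.log(lo)))
    }
    candidates.map(lr => (lr, score(lr))).maxBy(_._2)
  }
}
```

In practice `score` would train a model with the candidate setting and return a validation metric, which is why each trial is expensive and the number of trials is usually small.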


tooling

workflow management offline

Insight Interpretation
tool

Feature selection tool forward/backward greedy


search

Work in Progress

1.

R, Matlab, Scikit-Learn

PredictionEngine Transform Cross

Logistic Regression

torch-based large scale

torch

2. feeds

Google

3. visualization interactive

exploration

Q&A

Twitter

Twitter 90%

Twitter

Twitter 30%

Twitter business 30%

1% 2%

feature

business

30% CTO

share


Twitter

open source

Twitter

ads ranking

Twitter Timeline Ranking

1 11.11

1 Ranking Model

Ranking Model 4 (Universal

Ranking Model) (Region-based Ranking Model)

(Category-based Ranking Model)

(User-based Ranking Model)

training data -> learning algorithm -> ranking model

user query -> top-k retrieval -> ranking model -> results page

1 1


Context

Query

LTR

A/B Test 1

RankLib

RankNet, LambdaMART Random Forests

LambdaMART( )

(Supervised Learning)

LambdaMART Lambda MART

MART (Multiple Additive Regression Tree) GBDT (Gradient Boosting Decision Tree)

+ Lambda MART

LambdaMART

LambdaMART


2

<q, p, r-score> pair q

query, p q product, r-score p

query , query

1- 2- 3- 4- 5-

query

CTR

query-product

query

query_1 3 a, b c

b b a

query

position bias

2 6

Click

2,3,4,5,6,7,8

4 NDCG 20%-30% NDCG@60 on

validation data NDCG ( 1

LTR Feature

2 1

2
1 Click


Query

Query Query

profile-based

HBase

Item Feature Repository

LambdaMART

<target> <qid> query

<feature> <value> <info>

<target> qid:<qid> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
3 10

# Query
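A training row in the `<target> qid:<qid> <feature>:<value> ... # <info>` layout shown above could be produced by a small helper such as the following. The helper, its parameter names and the sample values are hypothetical; only the line layout comes from the text.

```scala
object RankLibFormat {
  // Render one training row as "<target> qid:<qid> <feature>:<value> ... # <info>".
  // Features are written in ascending feature-id order.
  def line(target: Int, qid: Int, features: Seq[(Int, Double)], info: String): String = {
    val feats = features.sortBy(_._1).map { case (id, v) => s"$id:$v" }.mkString(" ")
    s"$target qid:$qid $feats # $info"
  }
}
```

All rows that share the same `qid` belong to one query, so the learner compares documents only within that group.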

LambdaMART

pair

deltaNDCG lambda

lambda L
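Since both the lambda computation and the evaluation metric (NDCG@60) rest on NDCG, a minimal sketch of NDCG@k may help. It assumes the common formulation with gain 2^rel - 1 and discount log2(rank + 1); the text does not spell out which variant is used, so treat the formula as an assumption.

```scala
object Ndcg {
  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  // DCG@k with gain 2^rel - 1 and discount log2(rank + 1), ranks starting at 1.
  def dcgAtK(rels: Seq[Int], k: Int): Double =
    rels.take(k).zipWithIndex.map { case (rel, i) =>
      (math.pow(2.0, rel) - 1.0) / log2(i + 2.0)
    }.sum

  // NDCG@k: DCG of the given order divided by DCG of the ideal (sorted) order.
  def ndcgAtK(rels: Seq[Int], k: Int): Double = {
    val ideal = dcgAtK(rels.sortBy(r => -r), k)
    if (ideal == 0.0) 0.0 else dcgAtK(rels, k) / ideal
  }
}
```

Here `rels` lists the relevance labels of one query's results in ranked order; swapping two results changes the value, and that change is the deltaNDCG that weights a pair in LambdaMART.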

3 LambdaMART

regression lambda

shrinkage regularization

RankLib

java -jar ~/bin/RankLib.jar -train ~/train_all.csv -gmax 4 -tvs 0.8 -norm zscore -ranker 6 -metric2t NDCG@60 -tree 1000 -leaf 10 -shrinkage 0.1 -save ~/models/learned_lambdamart_model.mod

-train

4 LambdaMART

5 LambdaMART

-ranker 6 LambdaMART

-gmax 4 5

{0,1,2,3,4}

-tvs train

validation 0.8:0.2

-norm zscore

-metric2t ERR@10

-tree LambdaMART 1000

-leaf LambdaMART 10

-shrinkage LambdaMART 0.1

-save

-save 5


java -jar ~/bin/RankLib.jar -load ~/models/learned_lambdamart_model.mod -test ~/test_samples.csv -metric2T NDCG@60 -score ~/rerank_scores.txt
-load -test

-metric2T -score

query

LTR

A B A B 40

NDCG

4 AverageOverlapScore

RBOScore LTR LikeNDCG1 LikeNDCG2

LTR

PC + iOS + Android

PC PC

LambdaMART

1.

2. 1 tvs

3. 2 leaf

4. 3 tree shrinkage

4 6 PO1 1-8 PO2

9-16 8 LTR

3 A

4 B

6 4
2 4

LambdaMART Training Data Validation Data

Click

RankLib

MySQL

LTR

A/B Test

AA LTR

LTR 2

LTR

CTR 2016 11.11 8.7%

11.11 4.5% ( LTR_1

LTR_2 )

eBay Google


1180 7 1500

920

1200 1207


Box-Cox

ARIMA

ARIMA


xgboost

if-then

[0,10]

9 7


xgboost GBDT

2 1

1 2


SVM

margin

k ( k=10)

2 10

2 9

10


RMSE

20

72

72

72 ( )


6 1 ARIMA

45 xgboost

RMSE


web PC

5 1

RMSE RMSE


InfoQ

2016

C 1607
InfoQ
editors@cn.infoq.com
www.infoq.com.cn

