
Kernel Methods and Support Vector

Machines

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingeniería, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx
Outline

AI Applications
Kernel Methods
Support Vector Machines
AI projects

Applications of Artificial Intelligence:


Apple, Google, Amazon, IBM,
Bioinformatics, Startups, ACM Tech News

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingenieria, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx

Apple

https://techcrunch.com/2016/08/05/apple-acquires-turi-a-machine-learning-company/

Amazon

http://www.engadget.com/2016/03/23/amazon-secret-conference-of-the-future/

Google

http://techcrunch.com/2016/03/23/google-launches-new-machine-learning-platform/

Google

http://blog.kubernetes.io/2016/03/scaling-neural-network-image-classification-using-Kubernetes-with-TensorFlow-Serving.html

IBM

[1] Cognitive Computing, IBM brochure, 2016



IBM

[1] Cognitive Computing, IBM brochure, 2016



Bioinformatics

http://futurism.com/ai-saves-womans-life-by-identifying-her-disease-when-other-methods-humans-failed/

Startups

https://www.cbinsights.com/blog/top-acquirers-ai-startups-ma-timeline/

ACM Tech News

Paper Date
Can Artificial Intelligence Predict Earthquakes? Feb17
(Machine learning and pattern recognition)
Japan announces AI supercomputer Feb17
(GPU, DNN, AI apps)
China's first deep learning lab ... Feb17
(DNN, national lab)
China's Artificial-Intelligence Boom Feb17
(AI labs: Baidu -> Google, Didi -> Uber, Tencent -> WeChat)
AI method allows diagnosing Alzheimer's or Parkinson's Feb17
(DNN)
HPC Technique Propels Deep Learning at Scale Feb17
(DNN, HPC: OpenMPI -> SVAIL)

ACM Tech News

Paper Date
Thinking Deeply to Make Better Speech Mar17
(DNN, Speech as humans, WaveNet-DeepMind)
AI systems for air traffic controllers Mar17
(Automatic speech recognition)
AI system lets you control a robot with your mind Mar17
(EEG signals, Machine learning)
How to Upgrade Judges with Machine Learning Mar17
(Machine learning)
Kernel Methods

Kernel Methods

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingenieria, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx
Kernel Methods

Kernel

A kernel¹ is a two-variable function K(t′, t).

The basic idea is a transformation to the feature space:

K(t′, t) = ⟨Φ(t′), Φ(t)⟩    (1)

where K : L × L → ℝ and t′, t ∈ L; ⟨·,·⟩ denotes the dot product.
The map Φ : L → H goes into the so-called reproducing kernel Hilbert space (RKHS) H.
Eq. (1) is known as the kernel function² ³.

¹ In computer operating systems the concept of kernel is distinct, since there it means the core of the system.
² C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.
³ J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
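
To make Eq. (1) concrete, the following minimal MATLAB sketch (not part of the original slides, with hypothetical inputs) uses a degree-2 polynomial kernel on 2-D inputs, a case where the feature map Φ can be written out explicitly; the two printed values coincide, illustrating K(t′, t) = ⟨Φ(t′), Φ(t)⟩.

% Minimal sketch (hypothetical inputs): a degree-2 polynomial kernel as a
% feature-space dot product.
u = [1 2];  v = [3 -1];                            % two arbitrary 2-D inputs
Phi = @(x) [x(1)^2, sqrt(2)*x(1)*x(2), x(2)^2];    % explicit feature map Phi: L -> H
K_direct  = (u*v')^2;                              % kernel evaluated directly in input space
K_feature = Phi(u)*Phi(v)';                        % dot product <Phi(u), Phi(v)> in feature space
disp([K_direct K_feature])                         % the two values coincide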
Kernel Methods

Kernel trick

This transformation allows us to deal with nonlinearity through the linear space H (the feature space).
It is also known as the kernel trick.

Figure from: Cuevas-Tello, J.C., Hernández-Ramírez, Daniel, García-Sepúlveda, Christian A. (2013), Support Vector Machine algorithms in the search of KIR gene associations with disease, Computers in Biology and Medicine 43 (2013) 2053-2062
Kernel Methods

Gram matrix

Types of kernels include polynomial, Gaussian and sigmoid kernels.
Gaussian kernels⁴: K(t′, t) = exp(−|t − t′|² / ω²)
Because few parameters are involved (centres t′ and width ω), they are widely used in both theory and practice.
Given a training set T = {t_1, ..., t_n} and a kernel function K(·,·), there is a Gram matrix

K_ij = K(t_j, t_i), for i, j = 1, ..., n.    (2)

K_ij is the core ingredient in the theory of kernel methods.
It is the main data structure in their implementation.

⁴ Y. Bengio, O. Delalleau, and N. Le Roux. The curse of dimensionality for local kernel machines. Technical Report 1258, Département d'Informatique et Recherche Opérationnelle, Université de Montréal, May 2005.
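
As a small illustration (an assumed example, not from the original slides), the Gram matrix of Eq. (2) can be built directly for a Gaussian kernel; being the Gram matrix of a valid kernel, it is symmetric and positive semi-definite.

% Minimal MATLAB sketch: Gram matrix for a Gaussian kernel on a small training set.
T     = [10 15 25 32 38 43];        % training set (the same centres used later)
omega = 3;                          % kernel width
n     = numel(T);
K = zeros(n);                       % K_ij = K(t_j, t_i)
for i = 1:n
    for j = 1:n
        K(i,j) = exp(-abs(T(i)-T(j))^2/omega^2);
    end
end
disp(norm(K - K','fro'))            % 0: the Gram matrix is symmetric
disp(min(eig(K)))                   % >= 0 up to round-off: positive semi-definite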
Kernel Methods

Representer theorem

The larger the training set T, the larger the Gram matrix K_ij.
Kernels are therefore also considered memory-based methods⁵.
Nevertheless, one is able to create complicated kernels from simple building blocks.
Moreover, there are other kernel constructions such as graph kernels, string kernels and P-kernels, among many others⁶.
For regression, the kernel-based model is written as (this is known as the representer theorem⁷)

f(t) = Σ_{j=1}^{n} α_j K(t_j, t)    (3)

⁵ S. Haykin. Neural Networks: a Comprehensive Foundation. Prentice Hall, 1999.
⁶ J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
⁷ A. J. Smola and B. Schölkopf. On a kernel-based method for pattern recognition, regression, approximation and operator inversion. Algorithmica, 22:211-231, 1998. Technical Report 1064, GMD FIRST, April 1997.
Kernel Methods

Linear combination of kernels

f(t) = Σ_{j=1}^{n} α_j K(t_j, t)
This is a linear combination of kernels (basis functions), where the α_j ∈ ℝ are the kernel weights.
For example⁸, let T = {10, 15, 25, 32, 38, 43} be a training set of size six and K(t_j, t) = exp(−|t − t_j|² / ω²) a Gaussian kernel with ω = 3.
The time t goes from 0 to 50 with a resolution of 0.1, and α = [0.5, 1, 0.5, 1, 1.5, 1].
This gives a set of Gaussian kernels K(t_j, ·) scaled by α_j.
T holds the centres of the Gaussian functions and α contains their heights.

⁸ Cuevas-Tello, J.: Estimating time delays between irregularly sampled time series. Ph.D. thesis, School of Computer Science, University of Birmingham (2007). http://etheses.bham.ac.uk/88/
Kernel Methods

The set of Gaussian kernels (basis functions)

[Figure: a set of 6 Gaussian kernels K(t_j, t) of width ω = 3 and their weighted sum f(t), plotted against time from 0 to 50.]

Now it is clear that if one wants to fit an arbitrary curve g(t), one may do it through f(t) in (3).
Kernel Methods

The set of Gaussian kernels (basis functions)

We fixed the centres and the weights α.
However, the most common choice is to locate the centres at the observations t_i.
Kernels are quite flexible, though, and they can be located anywhere.
The weights α, or heights, play an important role, and this is where the concept of learning comes in.
Kernel Methods

Self-organized selection of centers

"As with ordinary radial-basis functions, a smaller number of normalized radial-basis functions can be used, with their centers treated as free parameters to be chosen according to some heuristic (Moody and Darken, 1999) or determined in a principled manner (Poggio and Girosi, 1990a)."

(Haykin 1999, pp. 298).

"For the self-organized learning process we need a clustering algorithm that partitions the given set of data points into subgroups... such an algorithm is the k-means clustering algorithm..."

(Haykin 1999, pp. 299).

S. Haykin. Neural Networks: a Comprehensive Foundation. Prentice Hall, 1999.
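
As a brief illustration of the quotes above (a sketch under the assumption that MATLAB's Statistics and Machine Learning Toolbox is available for kmeans; not part of the original slides), a smaller set of kernel centres can be chosen by clustering the observations:

% Minimal sketch: selecting k kernel centres with k-means instead of
% positioning one kernel on every observation.
t = sort(50*rand(200,1));           % hypothetical 1-D observation times
k = 6;                              % desired number of centres (free choice)
[idx, centres] = kmeans(t, k);      % centres(1:k) play the role of the c_j
disp(sort(centres)')                % these centres replace T when building the Gram matrix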


Kernel Methods

The set of Gaussian kernels (basis functions)

MATLAB Example
% Plot a set of Gaussian kernels
% Aug06 jcct
clear all;
close all;
sigma = 3;                       % width of Gaussians
t_min = 0;                       % min time
t_max = 50;                      % max time
inc = 0.1;                       % resolution of time
ct = [10 15 25 32 38 43];        % centres of Gaussians
alpha = [0.5 1 0.5 1 1.5 1];     % weights (heights) of the Gaussians
n = size(ct,2);                  % number of kernels
m = size(t_min:inc:t_max,2);     % number of time samples
figure;
hold on;
g = zeros(1,m);                  % weighted sum of the set of Gaussian functions
f = zeros(1,m);                  % one scaled Gaussian kernel at a time
for j = 1:n
    idx = 1;
    for i = t_min:inc:t_max
        f(idx) = alpha(j)*exp(-abs(i-ct(j))^2/sigma^2);   % Gaussian centred at ct(j), width sigma
        idx = idx + 1;
    end
    plot(t_min:inc:t_max,f,'k');
    g = g + f;
end
plot(t_min:inc:t_max,g+2,'k');   % the sum f(t), offset by 2 for display
box;
xlabel('Time');
ylabel('K(t_j,t) ; f(t)');
title(['A set of ',num2str(n),' Gaussians of width \omega=',num2str(sigma)])
text(2,2.5,'f(t)');
Kernel Methods

Kernel function in MATLAB

% INPUT:
% x -> the time, according to the real data
% n -> points per Gaussian function, i.e., number of points in the real data
% c -> centres of the kernels
% m -> number of kernels
% d -> width (sigma) of each kernel, one entry per kernel
%
% OUTPUT
% K_c -> matrix (m x n) (kernels x points)
% JCCT Mar17

function [K_c] = K1(x,n,c,m,d)

% K(c,x) = e^(-|x-c|^2/sigma^2); c is the kernel's centre, and x is the time
K_c = zeros(m,n);                 % preallocate the kernel matrix
for j = 1:m                       % each kernel
    for i = 1:n                   % all points, in real scale x(i)
        K_c(j,i) = exp(-abs(x(i)-c(j))^2/d(j)^2);   % d(j) is sigma for the kernel centred at c(j)
    end
end
Kernel Methods

Set of Gaussian with Kernel function in MATLAB

% Plot a set of Gaussian kernels with the kernel function K1
% Mar17 jcct
clear all;
close all;

sigma = 3;                      % width of Gaussians
t_min = 0;                      % min time
t_max = 50;                     % max time
inc = 0.1;                      % resolution of time
t = t_min:inc:t_max;            % time
n = size(t,2);                  % number of samples/points

ct = [10 15 25 32 38 43];       % centres of Gaussians
alpha = [0.5 1 0.5 1 1.5 1];    % weights of Gaussians
m = size(ct,2);                 % number of kernels

Gram_matrix = K1(t,n,ct,m,ones(1,m).*sigma);   % (m x n), one width per kernel

figure;
plot(Gram_matrix');             % each kernel as a curve over the samples
xlabel('Samples');
ylabel('Magnitude');

figure;
hold on;
f = zeros(1,n);                 % weighted sum of the set of Gaussians
for i = 1:m
    f = f + alpha(i) .* Gram_matrix(i,:);
    plot(t,alpha(i) .* Gram_matrix(i,:),'k');
end
plot(t,f,'b');
xlabel('Time');
ylabel('Magnitude');
Kernel Methods

Online learning vs batch learning

On-line learning processes the training data one example at a time, as it is received⁹.
In real-time applications this is a very important issue.
In the kernel formulation above, we process all the training data at once, which is batch learning.
Again, given a training set T = {t_1, ..., t_n} and a kernel function K(·,·), the Gram matrix is straightforward.
But α is not, and it needs to be learned from the data.
There are a number of different directions, but they can be grouped into two: eigen-decompositions and convex optimization¹⁰.

⁹ Y. Engel and S. Mannor. The Kernel Recursive Least-Squares Algorithm. IEEE Transactions on Signal Processing, 52(8):2275-2285, Aug 2004.
¹⁰ J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
Kernel Methods

Online learning vs batch learning

The latter leads to Support Vector Machines¹¹ (SVM).
For regression, eigen-decompositions are used.
The concept of kernel machines comes from learning machines¹² and SVM¹³.

¹¹ T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
¹² G. Brown. Diversity in Neural Network Ensembles. PhD thesis, School of Computer Science, University of Birmingham, UK, 2004.
¹³ Y. Bengio, O. Delalleau, and N. Le Roux. The curse of dimensionality for local kernel machines. Technical Report 1258, Département d'Informatique et Recherche Opérationnelle, Université de Montréal, May 2005.
Kernel Methods

Time series and regression

A model of the observed data is represented as a time series x(t_i) = h(t_i) + ε(t_i), where t_i, i = 1, 2, ..., n are discrete observation times.
The observation errors ε(t_i) can be modelled as zero-mean Normal distributions N(0, σ(t_i)).
The kernel-based model of the observed data is

h(t_i) = Σ_{j=1}^{N} α_j K(c_j, t_i)    (4)

This function is a linear superposition of N kernels K(·,·) centred at c_j, and the model has N free parameters α_j that need to be determined by (learned from) the data, where j = 1, 2, ..., N.
With Gaussian kernels, the kernel width ω > 0 determines the degree of smoothness.
We position kernels on all observations, i.e., N = n.
Kernel Methods

Learning weights by eigen-decompositions

Eq. (4) can be rewritten as

K α = x,    (5)

where α = (α_1, α_2, ..., α_N)^T, and

    ( K(c_1, t_1)/σ(t_1)  ...  K(c_N, t_1)/σ(t_1) )          ( x(t_1)/σ(t_1) )
K = (        ...          ...         ...         ) ,   x =  (      ...      )    (6)
    ( K(c_1, t_n)/σ(t_n)  ...  K(c_N, t_n)/σ(t_n) )          ( x(t_n)/σ(t_n) )

Hence,

α = K⁺ x.    (7)
Kernel Methods

Matrix inverse or Pseudoinverse

α = K⁻¹ x or α = K⁺ x
We regularise the inversion in (7) through the singular value decomposition (SVD).
K = U W V^T, and K⁺ = V [diag(1/w_i)] U^T is the pseudoinverse¹⁴ ¹⁵ (or Moore-Penrose inverse).
SVD has some interesting properties, such as W being a diagonal matrix with positive or zero elements (known as singular values), where w_i ∈ W.
U and V are column-orthogonal, so U^T U = V^T V = 1, and V is also square and row-orthonormal, i.e., V V^T = 1.

¹⁴ G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, second edition, 1989.
¹⁵ W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C++: The Art of Scientific Computing. Cambridge University Press, 2nd edition, 2002.
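
A minimal MATLAB sketch of the regularised inversion just described (assuming a square Gram matrix K and data vector x as built in the regression example later in these slides; an alternative to calling pinv(K) directly):

% Moore-Penrose pseudoinverse via SVD, zeroing small singular values.
[U, W, V] = svd(K);                          % K = U*W*V'
w = diag(W);
tol = max(size(K)) * eps(max(w));            % threshold in the spirit of pinv's default
w_inv = zeros(size(w));
w_inv(w > tol) = 1 ./ w(w > tol);            % invert only the significant singular values
K_plus = V * diag(w_inv) * U';               % K+ = V [diag(1/w_i)] U'
alpha  = K_plus * x;                         % Eq. (7): alpha = K+ x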
Kernel Methods

Fitting

Given the observed data, the likelihood¹⁶ of our model reads

P(Data | Model) = ∏_{i=1}^{n} p(x(t_i) | {α_j}),    (8)

p(x(t_i) | {α_j}) = (1 / √(2π σ²(t_i))) exp{ −(x(t_i) − h(t_i))² / (2σ²(t_i)) }

The negative log-likelihood (without constant terms) simplifies to

Q = Σ_{i=1}^{n} (x(t_i) − h(t_i))² / σ²(t_i)    (9)

This is the goodness of fit between the observed data and the model.

¹⁶ T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
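
A one-line MATLAB sketch of Eq. (9) (assumed variable names x and h as in the regression example below, with hypothetical per-point errors):

sigma_i = 0.01*ones(size(x));             % hypothetical observation errors sigma(t_i)
Q = sum((x - h).^2 ./ sigma_i.^2);        % goodness of fit: negative log-likelihood up to constants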
Kernel Methods

Example time series in astronomy - Artificial data DS-5

[Figure: artificial data set DS-5, light curves A and B (magnitude vs. time, 0-50); left panel: DS51G0N0.dat with error bars of 0.106%, right panel: DS51G51N0.dat with error bars of 0.466%.]
Kernel Methods

Load time series data in MATLAB

% Time series example
% by JCCT Mar17
% Data: DS-5

clear all;
close all;

d1 = load('ArtificialData/DS-5-1-GAP-0-1-N-0_v2.dat');
figure;
plot(d1(:,1),d1(:,2),'.-k');
xlabel('time');
ylabel('mag');
title('DS-5-1-GAP-0-1-N-0');

d2 = load('ArtificialData/DS-5-1-GAP-1-1-N-1_v2.dat');
figure;
plot(d2(:,1),d2(:,2),'.-k');
xlabel('time');
ylabel('mag');
title('DS-5-1-GAP-1-1-N-1');
Kernel Methods

Example time series

[Figure: example time series (magnitude vs. time, 0-50); a) DS-5-1-G-0-1-N-0, b) DS-5-1-G-1-1-N-1.]
Kernel Methods

Perform regression in MATLAB

% Perform regression on d2
sigma = 3;                       % width of Gaussians
t = d2(:,1);                     % time
n = size(t,1);                   % number of samples/points

ct = t;                          % centres of Gaussians at the observations
m = size(ct,1);                  % number of kernels

Gram_matrix = K1(t,n,ct,m,ones(1,m).*sigma);

x = d2(:,2);                     % observed data (mag)

alpha = pinv(Gram_matrix)*x;     % learning the weights, Eq. (7)

h = Gram_matrix'*alpha;          % kernel-based model, h(i) = sum_j alpha(j) K(c_j, t_i)

error = mean((h - x).^2);        % MSE

figure;
hold on;
plot(d1(:,1),d1(:,2),'*-g');
plot(d2(:,1),d2(:,2),'.-k');
plot(t,h,'.-b');
legend('DS-5-1-GAP-0-1-N-0','DS-5-1-GAP-1-1-N-1','Kernel-based model');
xlabel('time');
ylabel('mag');
title(['Observed data, kernel-based model MSE = ',num2str(error)]);
box on;
Kernel Methods

Figures regression

[Figure: observed data DS-5-1-GAP-0-1-N-0 and DS-5-1-GAP-1-1-N-1 with the kernel-based model, MSE = 1.6874e-005; left panel full time range, right panel zoom on t = 14-32.]
Kernel Methods

Figures regression

[Figure: zoom (t = 14-32) of the observed data and the kernel-based model, MSE = 1.6874e-005.]
Kernel Methods

Perform reconstruction in MATLAB

% Reconstruction at all points
t1 = d1(:,1);                    % time
n1 = size(t1,1);                 % number of samples/points
Gram_matrix1 = K1(t1,n1,ct,m,ones(1,m).*sigma);   % (m x n1): kernels at ct, evaluated at t1
h1 = Gram_matrix1'*alpha;        % kernel-based model evaluated at all points
error1 = mean((h1 - d1(:,2)).^2);   % MSE

figure;
hold on;
plot(d1(:,1),d1(:,2),'*-g');
plot(d2(:,1),d2(:,2),'.-k');
plot(t1,h1,'.-b');
for i = 1:n1
    if sum(t1(i)==t)==0          % mark the reconstructed (unobserved) points
        plot(t1(i),h1(i),'ob');
    end
end
legend('DS-5-1-GAP-0-1-N-0','DS-5-1-GAP-1-1-N-1','Kernel-based model');
xlabel('time');
ylabel('mag');
title(['Observed data, kernel-based model MSE = ',num2str(error1)]);
box on;
Kernel Methods

Figures reconstruction

[Figure: observed data and the kernel-based model reconstructed at all time points, MSE = 2.2963e-005; left panel full time range, right panel zoom on t = 14-32.]
Kernel Methods

Figures reconstruction

[Figure: zoom (t = 14-32) of the reconstruction at all time points, MSE = 2.2963e-005.]
SVM

Support Vector Machines

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingenieria, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx
SVM

Kernel methods versus SVM

An SVM classifier is defined as

f(x) = sgn( Σ_{i ∈ SVs} α_i K(x_i, x) )    (1)
SVM

Definition

Support Vector Machines (SVM) were introduced by Cortes and Vapnik¹ as support-vector networks.
The term SVM was widely popularised by Cristianini and Shawe-Taylor².
SVMs have been extensively described by research groups in statistical learning³ and kernel machines.
SVMs were proposed for classification, but they are also used for regression.
SVMs excel over other classification algorithms thanks to their capacity to perform multivariate classification in a non-linear manner.

¹ C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.
² Cristianini, N. and Shawe-Taylor, J. (2000). Support Vector Machines and other kernel-based learning methods. Cambridge University Press.
³ T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
SVM

Linear classifier

The classification problem can be restricted to consideration of the two-class problem without loss of generality.
One goal is to separate the two classes by a function which is induced from available examples.
Another goal is to produce a classifier that will work well on unseen examples (generalization).

Gunn, S. (1998). Support vector machines for classification and regression. Technical report, University of
Southampton. http://www.isis.ecs.soton.ac.uk/resources/svminfo/
SVM

The margin

Here there are many possible linear classifiers that can separate the data.
But there is only one that maximizes the margin (i.e., maximizes the distance between it and the nearest data point of each class).
This linear classifier is termed the optimal separating hyperplane: ⟨w, x_i⟩ + b = 0
SVM

The margin

Consider the problem of separating the set of training vectors belonging to two separate classes:
D = {(x^1, y^1), ..., (x^l, y^l)}, where x ∈ ℝ^n, y ∈ {−1, +1}.
The set of vectors is said to be optimally separated by the hyperplane if it is separated without error and the distance between the closest vector and the hyperplane is maximal.
SVM

The margin

Hence the hyperplane that optimally separates the data is the one that minimizes Φ(w) = ½ ||w||².

Φ(w, b, α) = ½ ||w||² − Σ_{i=1}^{l} α_i ( y_i [⟨w, x_i⟩ + b] − 1 )

where the α_i are the Lagrange multipliers.
The Lagrangian has to be minimized with respect to w, b and maximized with respect to α ≥ 0.
The solution to the problem is given by

α* = arg min_α  ½ Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j ⟨x_i, x_j⟩ − Σ_{k=1}^{l} α_k

with the constraints α_i ≥ 0, i = 1, ..., l and Σ_{j=1}^{l} α_j y_j = 0.
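
As a bridge between the Lagrangian and the dual problem above, here is a standard derivation sketch (not from the original slides), written in LaTeX:

% Setting the derivatives of the Lagrangian Phi(w,b,alpha) above to zero and
% substituting w back yields the dual; maximizing W(alpha) over alpha_i >= 0
% is the arg min quoted above with the sign flipped.
\begin{align*}
\frac{\partial \Phi}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i,
\qquad
\frac{\partial \Phi}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0,\\
W(\alpha) &= \sum_{k=1}^{l} \alpha_k
 - \tfrac{1}{2} \sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j\, y_i y_j \langle x_i, x_j\rangle .
\end{align*}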
SVM

Generalization in high-dimensional feature space

For linearly non-separable data, the optimization problem becomes:

α* = arg min_α  ½ Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j K(x_i, x_j) − Σ_{k=1}^{l} α_k    (2)

K(x_i, x_j) is the kernel function performing the non-linear mapping into feature space.
SVM

Non-linear classifier

The constraints are unchanged: α_i ≥ 0, i = 1, ..., l and Σ_{j=1}^{l} α_j y_j = 0.
The classifier implementing the optimal separating hyperplane in the feature space is given by

f(x) = sgn( Σ_{i ∈ SVs} α_i y_i K(x_i, x) + b )    (3)

where

sgn : ℝ → {−1, 0, 1},   x ↦ y = sgn(x)    (4)

The Support Vectors (SVs) are the training points with non-zero Lagrange multipliers α_i.
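
A minimal MATLAB sketch of Eq. (3) (assumed variable names: support vectors sv, one per row, their labels y_sv and multipliers alpha_sv as column vectors, a bias b, and a Gaussian kernel; not part of the original slides):

omega  = 1;                                        % kernel width (assumption)
Kgauss = @(u,v) exp(-sum((u-v).^2)/omega^2);       % K(x_i, x)
svm_out = @(x) sign( sum( alpha_sv .* y_sv .* ...
          arrayfun(@(i) Kgauss(sv(i,:), x), (1:size(sv,1))') ) + b );
% svm_out(x_new) returns the predicted class (+1 or -1) of a row vector x_new.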
SVM

SVM classifier

If the kernel contains the bias term, the classifier is simply

f(x) = sgn( Σ_{i ∈ SVs} α_i K(x_i, x) )    (5)

which is a linear combination of kernels, as in kernel methods, where the sign function (sgn) gives the class.
SVM

C parameter

The uncertain part of Cortes's approach is that the coefficient C has to be determined.
This parameter introduces additional capacity control within the classifier.
It can be directly related to a regularization parameter (Girosi, 1997; Smola and Schölkopf, 1998; Blanz et al., 1996).
C must be chosen to reflect the knowledge of the noise in the data.
Finally, it controls the trade-off between misclassification and the size of the SVM margin.
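
A minimal sketch of the role of C using MATLAB's built-in SVM (an assumption: this requires the Statistics and Machine Learning Toolbox; the original slides instead use Gunn's toolbox, shown next):

% Gaussian-kernel SVM where BoxConstraint plays the role of C,
% trading off misclassification against margin size.
X = [0 0; 0 1; 1 0; 1 1];                 % toy inputs (the XOR points used in the next slide)
y = [1; -1; -1; 1];                       % +/-1 class labels
C = 5;                                    % misclassification/margin trade-off
model  = fitcsvm(X, y, 'KernelFunction','gaussian', 'KernelScale',0.3, 'BoxConstraint',C);
labels = predict(model, X);               % should reproduce y on this toy set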
SVM

XOR with SVM in MATLAB

% XOR classification problem with SVM

X = [0 0; 0 1; 1 0; 1 1];    % INPUT
Y = [1 ; 0 ; 0 ; 1 ];        % OUTPUT (note: SVM formulations usually expect -1/+1 labels)

ker = 'rbf';                 % RBF as basis functions (Gaussian)
sigma = 0.3;                 % kernel width, sigma
C = 5;                       % miss-classification/margin parameter

[nsv, alpha, b0] = svc(X,Y,ker,C,sigma);    % create/train the support vector machine

svcplot(X,Y,ker,alpha,b0,sigma);            % plot results
X_test = X;                                 % input: test/predict output
Y_svm = svcoutput(X,Y,X_test,ker,alpha,b0,sigma,1)   % predicted output
error = mean(power(Y-Y_svm,2))              % classification error reported as MSE

Gunn, S. (1998). Support vector machines for classification and regression. Technical report, University of

Southampton. http://www.isis.ecs.soton.ac.uk/resources/svminfo/
SVM

XOR with SVM in MATLAB

[Figure: decision regions for the XOR problem produced by svcplot(X,Y,ker,alpha,b0,sigma), with x1 and x2 on the axes.]

Artificial Intelligence Projects

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingenieria, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx

Outline

Astronomy and Astrophysics


Computer Vision
Speech Recognition
Bioinformatics
Fault diagnosis

Astronomy and Astrophysics

In collaboration with
Dr. Peter Tino, School of Computer Science, University of
Birmingham, UK
Dr. Ilya Mandel, School of Physics and Astronomy,
University of Birmingham, UK

Sultanah AL Otaibi; Peter Tino; Juan C. Cuevas-Tello; Ilya Mandel; Somak Raychaudhury (2016) Kernel regression
estimates of time delays between gravitationally lensed fluxes, Monthly Notices of the Royal Astronomical Society,
doi: 10.1093/mnras/stw510 ISSN: 0035-8711 http://arxiv.org/abs/1508.03439

Artificial Intelligence Algorithms

Kernel Methods and Support Vector Machines


Artificial Neural Networks for Regression
Genetic Algorithms for Parameter Optimization

Gravitational lensing and time delay problem

Gonzalez-Grimaldo, R.A., Cuevas-Tello, J.C., (2008) Analysis of Time Series with Neural Networks.
Mexican International Conference on Artificial Intelligence, IEEE Computer Society Proceedings,
pp. 131-137

Kernel Method and Fitness Landscape: a non-linear optimization problem

h_A(t_i) = Σ_{j=1}^{N} α_j K(c_j, t_i) is the "underlying" light curve that underpins image A, whereas h_B(t_i) = Σ_{j=1}^{N} α_j K(c_j + Δ, t_i) is a time-delayed (by Δ) version of h_A(t_i) underpinning image B.

Cuevas-Tello, J.: Estimating time delays between irregularly sampled time series. Ph.D. thesis,
School of Computer Science, University of Birmingham (2007).

Population and Simple GA

A.J. Chipperfield, P.J. Fleming, H. Pohlheim, C.M. Fonseca, Genetic Algorithm Toolbox for use with MATLAB, first ed., Automatic Control and Systems Engineering, University of Sheffield, 1996

Cuevas-Tello, J., Tino, P., Raychaudhury, S., Yao, X., Harva, M.: Uncovering delayed patterns in noisy and irregularly sampled time series: an astronomy application. Pattern Recognition 3(43), 1165-1179 (2010)

Population and Simple GA

Time Delay Challenge



Computer Vision

In collaboration with
Dr. Cesar A. Puente Montejano, UASLP
Dr. J. Ignacio Nunez Varela, UASLP

Artificial Intelligence Algorithms

Artificial Neural Networks


Backpropagation FFNN
Deep Neural Networks (DNN)
Probabilistic Neural Networks (PNN)
Support Vector Machines (SVM)
Statistical Learning
Bayesian Methods

Example: Figure recognition, supervised learning on labeled data

[Diagram: a feed-forward network for figure recognition. Inputs: the pixels of a 16x16 image (x1, x2, ...); hidden layer: 32 neurons; outputs: one unit per figure class (Fig1, ..., Fig30), with one-hot targets and a decision threshold of 0.8.]

DNNs & MNIST database

DNNs for handwritten digits recognition


[Diagram: a stacked-RBM deep network. The visible units take the 28x28-pixel digit image; RBM1 has 800 hidden units; RBM2 has 800 hidden units; the top weights W_L connect to 10 label units. Training set: 60,000 examples; test set: 10,000 examples. Tanaka et al. (2014)]

Smart Vision Group UASLP



Speech Recognition

In collaboration with
Dr. Manuel Valenzuela, ITESM Mty
Dr. Juan Nolazco, ITESM Mty

Artificial Intelligence Algorithms

Backpropagation FFNN
Probabilistic Neural Networks (PNN)
Deep Neural Networks (DNN)
Gaussian Mixture Models (GMM)

Speech Recognition

Biometric Recognition Systems

Fingerprints, voice and face are specific to an individual.
Contrary to passwords and PINs, they cannot be forgotten and are not easily stolen.
Human speech is the least obtrusive biometric measure.
It is simple to acquire thanks to the pervasive use of speech in society.

Speech Processing

[Diagram: taxonomy of speech processing]
Speech Processing
  - Analysis
  - Coding or Synthesis
  - Recognition
      - Speech Recognition
      - Language Identification
      - Speaker Recognition
          - Detection
          - Verification
          - Identification

Speech Processing

Speaker Recognition

Speaker Verification
Verify a person's claimed identity from their voice.
The identity claim may include entering an employee number, presenting a smart card, among others.
Speaker Identification
Decide whether the speaker is a specific person, belongs to a group of persons, or is unknown.
There is no a priori identity claim.
(Campbell 1997, Cieri et al. 2014)

Speech Processing

MFCCs

A time-domain sampled acoustic waveform (audio)


[Figure: time-domain waveform "SRE08 Model ID 10058 (tcsns) Channel A", amplitude vs. time (min), 0 to 5 minutes.]

Speech Processing

MFCCs

A time-domain sampled acoustic waveform (audio) is converted into features (MFCCs¹).

[Figure: the waveform "SRE08 Model ID 10058 (tcsns) Channel A" (amplitude vs. time in minutes) and the sequence of MFCC feature vectors (C1, ...) extracted from it.]

¹ Mel-Frequency spaced Cepstral Coefficients (MFCCs)



Speech Processing

Classification

The MFCCs of the target user form a matrix P of size [3000 x 48], which is fed to an ANN together with a target-class matrix Tc of size [3000 x 5]:

Data: P = (P1  P2  ...  P5), each block of size [3000 x 48], with Tc = (class = 1, class = 2, ..., class = 2), each block of size [3000 x 5].    (1)

Bioinformatics

In collaboration with
Dr. Christian A. Garcia Sepulveda, Fac. Medicina, UASLP
MSc. J. Salomon Altamirano, PhD Student, UASLP
MSc. D. Alejandro Glz. Bandala, PhD Student, UASLP

Artificial Intelligence Algorithms

Decision Trees (ID3, J48)


Data mining
Apriori algorithm
Support Vector Machines (SVM)
Artificial Neural Networks

DNA - KIR genes - innate immune system



Bioinformatics: classification

Cuevas-Tello, J.C., Hernandez-Ramirez, Daniel, Garcia-Sepulveda, Christian A. (2013) Support Vector Machine algorithms in the search of KIR gene associations with disease, Computers in Biology and Medicine 43 (2013) 2053-2062

Bioinformatics: data mining

J. Gilberto Rodriguez-Escobedo, Christian A. Garcia-Sepulveda, and Juan C. Cuevas-Tello (2015) KIR Genes and
Patterns Given by the A Priori Algorithm: Immunity for Haematological Malignancies, Computational and
Mathematical Methods in Medicine, vol. 2015, Article ID 141363, 11 pages, 2015. doi:10.1155/2015/141363

Computational forecasting of infectious disease dynamics



Fault diagnosis
In collaboration with
Dr. Ciro A. Nunez Gtz, Electrical Engineering, UASLP
Dr. Nancy Visairo Cruz, Electrical Engineering, UASLP
M.Sc. Eugenio Camargo Trigueros, PhD student, Electrical
Engineering, UASLP
Eng. Juan Jose Acosta E. (2016)
Manuel Alejandro Gomez Vazquez, B.Tech student,
Informatics Engineering
Cristian Garcia Huerta, B.Tech student, Computer
Engineering

Artificial Intelligence Algorithms

Backpropagation FFNN
General Regression Neural Networks (GRNN)

Fault diagnosis

Questions?
