
Kernel Methods and Support Vector

Machines

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingeniería, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx
Outline

AI Applications
Kernel Methods
Support Vector Machines
AI projects

Applications of Artificial Intelligence:


Apple, Google, Amazon, IBM,
Bioinformatics, Startups, ACM Tech News

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingenieria, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx

Apple

https://techcrunch.com/2016/08/05/apple-acquires-turi-a-machine-learning-company/

Amazon

http://www.engadget.com/2016/03/23/amazon-secret-conference-of-the-future/

Google

http://techcrunch.com/2016/03/23/google-launches-new-machine-learning-platform/

Google

http://blog.kubernetes.io/2016/03/scaling-neural-network-image-classification-using-Kubernetes-with-TensorFlow-Serving.html

IBM

[1] Cognitive Computing, IBM brochure, 2016



IBM

[1] Cognitive Computing, IBM brochure, 2016



Bioinformatics

http://futurism.com/ai-saves-womans-life-by-identifying-her-disease-when-other-methods-humans-failed/

Startups

https://www.cbinsights.com/blog/top-acquirers-ai-startups-ma-timeline/

ACM Tech News

Paper Date
Can Artificial Intelligence Predict Earthquakes? Feb17
(Machine learning and pattern recognition)
Japan announces AI supercomputer Feb17
(GPU, DNN, AI apps)
China's first deep learning lab ... Feb17
(DNN, national lab)
China's Artificial-Intelligence Boom Feb17
(AI labs: Baidu -> Google, Didi -> Uber, Tencent -> WeChat)
AI method allows diagnosing Alzheimer's or Parkinson's Feb17
(DNN)
HPC Technique Propels Deep Learning at Scale Feb17
(DNN, HPC: OpenMPI -> SVAIL)

ACM Tech News

Paper Date
Thinking Deeply to Make Better Speech Mar17
(DNN, Speech as humans, WaveNet-DeepMind)
AI systems for air traffic controllers Mar17
(Automatic speech recognition)
AI system lets you control a robot with your mind Mar17
(EEG signals, Machine learning)
How to Upgrade Judges with Machine Learning Mar17
(Machine learning)
Kernel Methods

Kernel Methods

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingenieria, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx
Kernel Methods

Kernel

A kernel¹ is a two-variable function K(t′, t).

The basic idea is a transformation to the feature space:

K(t′, t) = ⟨Φ(t′), Φ(t)⟩    (1)

where K : L × L → ℝ and t′, t ∈ L; ⟨·,·⟩ denotes the dot product.
The map Φ : L → H goes into the so-called reproducing kernel Hilbert space (RKHS) H.
Eq. (1) is known as the kernel function² ³.

¹ In computer operating systems the concept of kernel is distinct, since there it means the core of the system.
² C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.
³ J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
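
To make Eq. (1) concrete, the following minimal MATLAB sketch (not part of the original slides, with hypothetical inputs) uses a degree-2 polynomial kernel on 2-D inputs, a case where the feature map Φ can be written out explicitly; the two printed values coincide, illustrating K(t′, t) = ⟨Φ(t′), Φ(t)⟩.

% Minimal sketch (hypothetical inputs): a degree-2 polynomial kernel as a
% feature-space dot product.
u = [1 2];  v = [3 -1];                            % two arbitrary 2-D inputs
Phi = @(x) [x(1)^2, sqrt(2)*x(1)*x(2), x(2)^2];    % explicit feature map Phi: L -> H
K_direct  = (u*v')^2;                              % kernel evaluated directly in input space
K_feature = Phi(u)*Phi(v)';                        % dot product <Phi(u), Phi(v)> in feature space
disp([K_direct K_feature])                         % the two values coincide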
Kernel Methods

Kernel trick

This transformation allows us to deal with nonlinearity through the linear space H (the feature space).
It is also known as the kernel trick.

Figure from: Cuevas-Tello, J.C., Hernández-Ramírez, Daniel, García-Sepúlveda, Christian A. (2013), Support Vector Machine algorithms in the search of KIR gene associations with disease, Computers in Biology and Medicine 43 (2013) 2053-2062
Kernel Methods

Gram matrix

Types of kernels include polynomial, Gaussian and sigmoid kernels.
Gaussian kernels⁴: K(t′, t) = exp(−|t − t′|² / ω²)
Because few parameters are involved (centres t′ and width ω), they are widely used in both theory and practice.
Given a training set T = {t_1, ..., t_n} and a kernel function K(·,·), there is a Gram matrix

K_ij = K(t_j, t_i), for i, j = 1, ..., n.    (2)

K_ij is the core ingredient in the theory of kernel methods.
It is the main data structure in their implementation.

⁴ Y. Bengio, O. Delalleau, and N. Le Roux. The curse of dimensionality for local kernel machines. Technical Report 1258, Département d'Informatique et Recherche Opérationnelle, Université de Montréal, May 2005.
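
As a small illustration (an assumed example, not from the original slides), the Gram matrix of Eq. (2) can be built directly for a Gaussian kernel; being the Gram matrix of a valid kernel, it is symmetric and positive semi-definite.

% Minimal MATLAB sketch: Gram matrix for a Gaussian kernel on a small training set.
T     = [10 15 25 32 38 43];        % training set (the same centres used later)
omega = 3;                          % kernel width
n     = numel(T);
K = zeros(n);                       % K_ij = K(t_j, t_i)
for i = 1:n
    for j = 1:n
        K(i,j) = exp(-abs(T(i)-T(j))^2/omega^2);
    end
end
disp(norm(K - K','fro'))            % 0: the Gram matrix is symmetric
disp(min(eig(K)))                   % >= 0 up to round-off: positive semi-definite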
Kernel Methods

Representer theorem

The larger the training set T, the larger the Gram matrix K_ij.
Kernels are therefore also considered memory-based methods⁵.
Nevertheless, one is able to create complicated kernels from simple building blocks.
Moreover, there are other kernel constructions such as graph kernels, string kernels and P-kernels, among many others⁶.
For regression, the kernel-based model is written as (this is known as the representer theorem⁷)

f(t) = Σ_{j=1}^{n} α_j K(t_j, t)    (3)

⁵ S. Haykin. Neural Networks: a Comprehensive Foundation. Prentice Hall, 1999.
⁶ J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
⁷ A. J. Smola and B. Schölkopf. On a kernel-based method for pattern recognition, regression, approximation and operator inversion. Algorithmica, 22:211-231, 1998. Technical Report 1064, GMD FIRST, April 1997.
Kernel Methods

Linear combination of kernels

f(t) = Σ_{j=1}^{n} α_j K(t_j, t)
This is a linear combination of kernels (basis functions), where the α_j ∈ ℝ are the kernel weights.
For example⁸, let T = {10, 15, 25, 32, 38, 43} be a training set of size six and K(t_j, t) = exp(−|t − t_j|² / ω²) a Gaussian kernel with ω = 3.
The time t goes from 0 to 50 with a resolution of 0.1, and α = [0.5, 1, 0.5, 1, 1.5, 1].
This gives a set of Gaussian kernels K(t_j, ·) scaled by α_j.
T holds the centres of the Gaussian functions and α contains their heights.

⁸ Cuevas-Tello, J.: Estimating time delays between irregularly sampled time series. Ph.D. thesis, School of Computer Science, University of Birmingham (2007). http://etheses.bham.ac.uk/88/
Kernel Methods

The set of Gaussian kernels (basis functions)

[Figure: a set of 6 Gaussian kernels K(t_j, t) of width ω = 3 and their weighted sum f(t), plotted against time from 0 to 50.]

Now it is clear that if one wants to fit an arbitrary curve g(t), one may do it through f(t) in (3).
Kernel Methods

The set of Gaussian kernels (basis functions)

We fixed the centres and the weights α.
However, the most common choice is to locate the centres at the observations t_i.
Kernels are quite flexible, though, and they can be located anywhere.
The weights α, or heights, play an important role, and this is where the concept of learning comes in.
Kernel Methods

Self-organized selection of centers

"As with ordinary radial-basis functions, a smaller number of normalized radial-basis functions can be used, with their centers treated as free parameters to be chosen according to some heuristic (Moody and Darken, 1999) or determined in a principled manner (Poggio and Girosi, 1990a)."

(Haykin 1999, pp. 298).

"For the self-organized learning process we need a clustering algorithm that partitions the given set of data points into subgroups... such an algorithm is the k-means clustering algorithm..."

(Haykin 1999, pp. 299).

S. Haykin. Neural Networks: a Comprehensive Foundation. Prentice Hall, 1999.
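
As a brief illustration of the quotes above (a sketch under the assumption that MATLAB's Statistics and Machine Learning Toolbox is available for kmeans; not part of the original slides), a smaller set of kernel centres can be chosen by clustering the observations:

% Minimal sketch: selecting k kernel centres with k-means instead of
% positioning one kernel on every observation.
t = sort(50*rand(200,1));           % hypothetical 1-D observation times
k = 6;                              % desired number of centres (free choice)
[idx, centres] = kmeans(t, k);      % centres(1:k) play the role of the c_j
disp(sort(centres)')                % these centres replace T when building the Gram matrix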


Kernel Methods

The set of Gaussian kernels (basis functions)

MATLAB Example
% Plot a set of Gaussian kernels
% Aug06 jcct
clear all;
close all;
sigma = 3;                       % width of Gaussians
t_min = 0;                       % min time
t_max = 50;                      % max time
inc = 0.1;                       % resolution of time
ct = [10 15 25 32 38 43];        % centres of Gaussians
alpha = [0.5 1 0.5 1 1.5 1];     % weights (heights) of the Gaussians
n = size(ct,2);                  % number of kernels
m = size(t_min:inc:t_max,2);     % number of time samples
figure;
hold on;
g = zeros(1,m);                  % weighted sum of the set of Gaussian functions
f = zeros(1,m);                  % one scaled Gaussian kernel at a time
for j = 1:n
    idx = 1;
    for i = t_min:inc:t_max
        f(idx) = alpha(j)*exp(-abs(i-ct(j))^2/sigma^2);   % Gaussian centred at ct(j), width sigma
        idx = idx + 1;
    end
    plot(t_min:inc:t_max,f,'k');
    g = g + f;
end
plot(t_min:inc:t_max,g+2,'k');   % the sum f(t), offset by 2 for display
box;
xlabel('Time');
ylabel('K(t_j,t) ; f(t)');
title(['A set of ',num2str(n),' Gaussians of width \omega=',num2str(sigma)])
text(2,2.5,'f(t)');
Kernel Methods

Kernel function in MATLAB

% INPUT:
% x -> the time, according to the real data
% n -> points per Gaussian function, i.e., number of points in the real data
% c -> centres of the kernels
% m -> number of kernels
% d -> width (sigma) of each kernel, one entry per kernel
%
% OUTPUT
% K_c -> matrix (m x n) (kernels x points)
% JCCT Mar17

function [K_c] = K1(x,n,c,m,d)

% K(c,x) = e^(-|x-c|^2/sigma^2); c is the kernel's centre, and x is the time
K_c = zeros(m,n);                 % preallocate the kernel matrix
for j = 1:m                       % each kernel
    for i = 1:n                   % all points, in real scale x(i)
        K_c(j,i) = exp(-abs(x(i)-c(j))^2/d(j)^2);   % d(j) is sigma for the kernel centred at c(j)
    end
end
Kernel Methods

Set of Gaussian with Kernel function in MATLAB

% Plot a set of Gaussian kernels with the kernel function K1
% Mar17 jcct
clear all;
close all;

sigma = 3;                      % width of Gaussians
t_min = 0;                      % min time
t_max = 50;                     % max time
inc = 0.1;                      % resolution of time
t = t_min:inc:t_max;            % time
n = size(t,2);                  % number of samples/points

ct = [10 15 25 32 38 43];       % centres of Gaussians
alpha = [0.5 1 0.5 1 1.5 1];    % weights of Gaussians
m = size(ct,2);                 % number of kernels

Gram_matrix = K1(t,n,ct,m,ones(1,m).*sigma);   % (m x n), one width per kernel

figure;
plot(Gram_matrix');             % each kernel as a curve over the samples
xlabel('Samples');
ylabel('Magnitude');

figure;
hold on;
f = zeros(1,n);                 % weighted sum of the set of Gaussians
for i = 1:m
    f = f + alpha(i) .* Gram_matrix(i,:);
    plot(t,alpha(i) .* Gram_matrix(i,:),'k');
end
plot(t,f,'b');
xlabel('Time');
ylabel('Magnitude');
Kernel Methods

Online learning vs batch learning

On-line learning processes the training data one example at a time, as it is received⁹.
In real-time applications this is a very important issue.
In the kernel formulation above, we process all the training data at once, which is batch learning.
Again, given a training set T = {t_1, ..., t_n} and a kernel function K(·,·), the Gram matrix is straightforward.
But α is not, and it needs to be learned from the data.
There are a number of different directions, but they can be grouped into two: eigen-decompositions and convex optimization¹⁰.

⁹ Y. Engel and S. Mannor. The Kernel Recursive Least-Squares Algorithm. IEEE Transactions on Signal Processing, 52(8):2275-2285, Aug 2004.
¹⁰ J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
Kernel Methods

Online learning vs batch learning

The latter leads to Support Vector Machines¹¹ (SVM).
For regression, eigen-decompositions are used.
The concept of kernel machines comes from learning machines¹² and SVM¹³.

¹¹ T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
¹² G. Brown. Diversity in Neural Network Ensembles. PhD thesis, School of Computer Science, University of Birmingham, UK, 2004.
¹³ Y. Bengio, O. Delalleau, and N. Le Roux. The curse of dimensionality for local kernel machines. Technical Report 1258, Département d'Informatique et Recherche Opérationnelle, Université de Montréal, May 2005.
Kernel Methods

Time series and regression

A model of the observed data is represented as a time series x(t_i) = h(t_i) + ε(t_i), where t_i, i = 1, 2, ..., n are discrete observation times.
The observation errors ε(t_i) can be modelled as zero-mean Normal distributions N(0, σ(t_i)).
The kernel-based model of the observed data is

h(t_i) = Σ_{j=1}^{N} α_j K(c_j, t_i)    (4)

This function is a linear superposition of N kernels K(·,·) centred at c_j, and the model has N free parameters α_j that need to be determined by (learned from) the data, where j = 1, 2, ..., N.
With Gaussian kernels, the kernel width ω > 0 determines the degree of smoothness.
We position kernels on all observations, i.e., N = n.
Kernel Methods

Learning weights by eigen-decompositions

Eq. (4) can be rewritten as

K α = x,    (5)

where α = (α_1, α_2, ..., α_N)^T, and

    ( K(c_1, t_1)/σ(t_1)  ...  K(c_N, t_1)/σ(t_1) )          ( x(t_1)/σ(t_1) )
K = (        ...          ...         ...         ) ,   x =  (      ...      )    (6)
    ( K(c_1, t_n)/σ(t_n)  ...  K(c_N, t_n)/σ(t_n) )          ( x(t_n)/σ(t_n) )

Hence,

α = K⁺ x.    (7)
Kernel Methods

Matrix inverse or Pseudoinverse

α = K⁻¹ x or α = K⁺ x
We regularise the inversion in (7) through the singular value decomposition (SVD).
K = U W V^T, and K⁺ = V [diag(1/w_i)] U^T is the pseudoinverse¹⁴ ¹⁵ (or Moore-Penrose inverse).
SVD has some interesting properties, such as W being a diagonal matrix with positive or zero elements (known as singular values), where w_i ∈ W.
U and V are column-orthogonal, so U^T U = V^T V = 1, and V is also square and row-orthonormal, i.e., V V^T = 1.

¹⁴ G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, second edition, 1989.
¹⁵ W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C++: The Art of Scientific Computing. Cambridge University Press, 2nd edition, 2002.
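
A minimal MATLAB sketch of the regularised inversion just described (assuming a square Gram matrix K and data vector x as built in the regression example later in these slides; an alternative to calling pinv(K) directly):

% Moore-Penrose pseudoinverse via SVD, zeroing small singular values.
[U, W, V] = svd(K);                          % K = U*W*V'
w = diag(W);
tol = max(size(K)) * eps(max(w));            % threshold in the spirit of pinv's default
w_inv = zeros(size(w));
w_inv(w > tol) = 1 ./ w(w > tol);            % invert only the significant singular values
K_plus = V * diag(w_inv) * U';               % K+ = V [diag(1/w_i)] U'
alpha  = K_plus * x;                         % Eq. (7): alpha = K+ x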
Kernel Methods

Fitting

Given the observed data, the likelihood¹⁶ of our model reads

P(Data | Model) = ∏_{i=1}^{n} p(x(t_i) | {α_j}),    (8)

p(x(t_i) | {α_j}) = (1 / √(2π σ²(t_i))) exp{ −(x(t_i) − h(t_i))² / (2σ²(t_i)) }

The negative log-likelihood (without constant terms) simplifies to

Q = Σ_{i=1}^{n} (x(t_i) − h(t_i))² / σ²(t_i)    (9)

This is the goodness of fit between the observed data and the model.

¹⁶ T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
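
A one-line MATLAB sketch of Eq. (9) (assumed variable names x and h as in the regression example below, with hypothetical per-point errors):

sigma_i = 0.01*ones(size(x));             % hypothetical observation errors sigma(t_i)
Q = sum((x - h).^2 ./ sigma_i.^2);        % goodness of fit: negative log-likelihood up to constants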
Kernel Methods

Example time series in astronomy - Artificial data DS-5

[Figure: artificial data set DS-5, light curves A and B (magnitude vs. time, 0-50); left panel: DS51G0N0.dat with error bars of 0.106%, right panel: DS51G51N0.dat with error bars of 0.466%.]
Kernel Methods

Load time series data in MATLAB

% Time series example
% by JCCT Mar17
% Data: DS-5

clear all;
close all;

d1 = load('ArtificialData/DS-5-1-GAP-0-1-N-0_v2.dat');
figure;
plot(d1(:,1),d1(:,2),'.-k');
xlabel('time');
ylabel('mag');
title('DS-5-1-GAP-0-1-N-0');

d2 = load('ArtificialData/DS-5-1-GAP-1-1-N-1_v2.dat');
figure;
plot(d2(:,1),d2(:,2),'.-k');
xlabel('time');
ylabel('mag');
title('DS-5-1-GAP-1-1-N-1');
Kernel Methods

Example time series

[Figure: example time series (magnitude vs. time, 0-50); a) DS-5-1-G-0-1-N-0, b) DS-5-1-G-1-1-N-1.]
Kernel Methods

Perform regression in MATLAB

% Perform regression on d2
sigma = 3;                       % width of Gaussians
t = d2(:,1);                     % time
n = size(t,1);                   % number of samples/points

ct = t;                          % centres of Gaussians at the observations
m = size(ct,1);                  % number of kernels

Gram_matrix = K1(t,n,ct,m,ones(1,m).*sigma);

x = d2(:,2);                     % observed data (mag)

alpha = pinv(Gram_matrix)*x;     % learning the weights, Eq. (7)

h = Gram_matrix'*alpha;          % kernel-based model, h(i) = sum_j alpha(j) K(c_j, t_i)

error = mean((h - x).^2);        % MSE

figure;
hold on;
plot(d1(:,1),d1(:,2),'*-g');
plot(d2(:,1),d2(:,2),'.-k');
plot(t,h,'.-b');
legend('DS-5-1-GAP-0-1-N-0','DS-5-1-GAP-1-1-N-1','Kernel-based model');
xlabel('time');
ylabel('mag');
title(['Observed data, kernel-based model MSE = ',num2str(error)]);
box on;
Kernel Methods

Figures regression

[Figure: observed data DS-5-1-GAP-0-1-N-0 and DS-5-1-GAP-1-1-N-1 with the kernel-based model, MSE = 1.6874e-005; left panel full time range, right panel zoom on t = 14-32.]
Kernel Methods

Figures regression

[Figure: zoom (t = 14-32) of the observed data and the kernel-based model, MSE = 1.6874e-005.]
Kernel Methods

Perform reconstruction in MATLAB

% Reconstruction at all points
t1 = d1(:,1);                    % time
n1 = size(t1,1);                 % number of samples/points
Gram_matrix1 = K1(t1,n1,ct,m,ones(1,m).*sigma);   % (m x n1): kernels at ct, evaluated at t1
h1 = Gram_matrix1'*alpha;        % kernel-based model evaluated at all points
error1 = mean((h1 - d1(:,2)).^2);   % MSE

figure;
hold on;
plot(d1(:,1),d1(:,2),'*-g');
plot(d2(:,1),d2(:,2),'.-k');
plot(t1,h1,'.-b');
for i = 1:n1
    if sum(t1(i)==t)==0          % mark the reconstructed (unobserved) points
        plot(t1(i),h1(i),'ob');
    end
end
legend('DS-5-1-GAP-0-1-N-0','DS-5-1-GAP-1-1-N-1','Kernel-based model');
xlabel('time');
ylabel('mag');
title(['Observed data, kernel-based model MSE = ',num2str(error1)]);
box on;
Kernel Methods

Figures reconstruction

[Figure: observed data and the kernel-based model reconstructed at all time points, MSE = 2.2963e-005; left panel full time range, right panel zoom on t = 14-32.]
Kernel Methods

Figures reconstruction

[Figure: zoom (t = 14-32) of the reconstruction at all time points, MSE = 2.2963e-005.]
SVM

Support Vector Machines

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingenieria, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx
SVM

Kernel methods versus SVM

An SVM classifier is defined as

f(x) = sgn( Σ_{i ∈ SVs} α_i K(x_i, x) )    (1)
SVM

Definition

Support Vector Machines (SVM) were introduced by Cortes and Vapnik¹ as support-vector networks.
The term SVM was widely popularised by Cristianini and Shawe-Taylor².
SVMs have been extensively described by research groups in statistical learning³ and kernel machines.
SVMs were proposed for classification, but they are also used for regression.
SVMs excel over other classification algorithms thanks to their capacity to perform multivariate classification in a non-linear manner.

¹ C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.
² Cristianini, N. and Shawe-Taylor, J. (2000). Support Vector Machines and other kernel-based learning methods. Cambridge University Press.
³ T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
SVM

Linear classifier

The classification problem can be restricted to consideration of the two-class problem without loss of generality.
One goal is to separate the two classes by a function which is induced from available examples.
Another goal is to produce a classifier that will work well on unseen examples (generalization).

Gunn, S. (1998). Support vector machines for classification and regression. Technical report, University of
Southampton. http://www.isis.ecs.soton.ac.uk/resources/svminfo/
SVM

The margin

Here there are many possible linear classifiers that can separate the data.
But there is only one that maximizes the margin (i.e., maximizes the distance between it and the nearest data point of each class).
This linear classifier is termed the optimal separating hyperplane: ⟨w, x_i⟩ + b = 0
SVM

The margin

Consider the problem of separating the set of training vectors belonging to two separate classes:
D = {(x^1, y^1), ..., (x^l, y^l)}, where x ∈ ℝ^n, y ∈ {−1, +1}.
The set of vectors is said to be optimally separated by the hyperplane if it is separated without error and the distance between the closest vector and the hyperplane is maximal.
SVM

The margin

Hence the hyperplane that optimally separates the data is the one that minimizes Φ(w) = ½ ||w||².

Φ(w, b, α) = ½ ||w||² − Σ_{i=1}^{l} α_i ( y_i [⟨w, x_i⟩ + b] − 1 )

where the α_i are the Lagrange multipliers.
The Lagrangian has to be minimized with respect to w, b and maximized with respect to α ≥ 0.
The solution to the problem is given by

α* = arg min_α  ½ Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j ⟨x_i, x_j⟩ − Σ_{k=1}^{l} α_k

with the constraints α_i ≥ 0, i = 1, ..., l and Σ_{j=1}^{l} α_j y_j = 0.
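
As a bridge between the Lagrangian and the dual problem above, here is a standard derivation sketch (not from the original slides), written in LaTeX:

% Setting the derivatives of the Lagrangian Phi(w,b,alpha) above to zero and
% substituting w back yields the dual; maximizing W(alpha) over alpha_i >= 0
% is the arg min quoted above with the sign flipped.
\begin{align*}
\frac{\partial \Phi}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i,
\qquad
\frac{\partial \Phi}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0,\\
W(\alpha) &= \sum_{k=1}^{l} \alpha_k
 - \tfrac{1}{2} \sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j\, y_i y_j \langle x_i, x_j\rangle .
\end{align*}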
SVM

Generalization in high-dimensional feature space

For linearly non-separable data, the optimization problem becomes:

α* = arg min_α  ½ Σ_{i=1}^{l} Σ_{j=1}^{l} α_i α_j y_i y_j K(x_i, x_j) − Σ_{k=1}^{l} α_k    (2)

K(x_i, x_j) is the kernel function performing the non-linear mapping into feature space.
SVM

Non-linear classifier

The constraints are unchanged: α_i ≥ 0, i = 1, ..., l and Σ_{j=1}^{l} α_j y_j = 0.
The classifier implementing the optimal separating hyperplane in the feature space is given by

f(x) = sgn( Σ_{i ∈ SVs} α_i y_i K(x_i, x) + b )    (3)

where

sgn : ℝ → {−1, 0, 1},   x ↦ y = sgn(x)    (4)

The Support Vectors (SVs) are the training points with non-zero Lagrange multipliers α_i.
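
A minimal MATLAB sketch of Eq. (3) (assumed variable names: support vectors sv, one per row, their labels y_sv and multipliers alpha_sv as column vectors, a bias b, and a Gaussian kernel; not part of the original slides):

omega  = 1;                                        % kernel width (assumption)
Kgauss = @(u,v) exp(-sum((u-v).^2)/omega^2);       % K(x_i, x)
svm_out = @(x) sign( sum( alpha_sv .* y_sv .* ...
          arrayfun(@(i) Kgauss(sv(i,:), x), (1:size(sv,1))') ) + b );
% svm_out(x_new) returns the predicted class (+1 or -1) of a row vector x_new.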
SVM

SVM classifier

If the kernel contains the bias term, the classifier is simply

f(x) = sgn( Σ_{i ∈ SVs} α_i K(x_i, x) )    (5)

which is a linear combination of kernels, as in kernel methods, where the sign function (sgn) gives the class.
SVM

C parameter

The uncertain part of Cortes's approach is that the coefficient C has to be determined.
This parameter introduces additional capacity control within the classifier.
It can be directly related to a regularization parameter (Girosi, 1997; Smola and Schölkopf, 1998; Blanz et al., 1996).
C must be chosen to reflect the knowledge of the noise in the data.
Finally, it controls the trade-off between misclassification and the size of the SVM margin.
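
A minimal sketch of the role of C using MATLAB's built-in SVM (an assumption: this requires the Statistics and Machine Learning Toolbox; the original slides instead use Gunn's toolbox, shown next):

% Gaussian-kernel SVM where BoxConstraint plays the role of C,
% trading off misclassification against margin size.
X = [0 0; 0 1; 1 0; 1 1];                 % toy inputs (the XOR points used in the next slide)
y = [1; -1; -1; 1];                       % +/-1 class labels
C = 5;                                    % misclassification/margin trade-off
model  = fitcsvm(X, y, 'KernelFunction','gaussian', 'KernelScale',0.3, 'BoxConstraint',C);
labels = predict(model, X);               % should reproduce y on this toy set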
SVM

XOR with SVM in MATLAB

% XOR classification problem with SVM

X = [0 0; 0 1; 1 0; 1 1];    % INPUT
Y = [1 ; 0 ; 0 ; 1 ];        % OUTPUT (note: SVM formulations usually expect -1/+1 labels)

ker = 'rbf';                 % RBF as basis functions (Gaussian)
sigma = 0.3;                 % kernel width, sigma
C = 5;                       % miss-classification/margin parameter

[nsv, alpha, b0] = svc(X,Y,ker,C,sigma);    % create/train the support vector machine

svcplot(X,Y,ker,alpha,b0,sigma);            % plot results
X_test = X;                                 % input: test/predict output
Y_svm = svcoutput(X,Y,X_test,ker,alpha,b0,sigma,1)   % predicted output
error = mean(power(Y-Y_svm,2))              % classification error reported as MSE

Gunn, S. (1998). Support vector machines for classification and regression. Technical report, University of

Southampton. http://www.isis.ecs.soton.ac.uk/resources/svminfo/
SVM

XOR with SVM in MATLAB

[Figure: decision regions for the XOR problem produced by svcplot(X,Y,ker,alpha,b0,sigma), with x1 and x2 on the axes.]

Artificial Intelligence Projects

Dr. Juan Carlos Cuevas-Tello

Facultad de Ingenieria, UASLP


cuevastello@gmail.com
cuevas@uaslp.mx

Outline

Astronomy and Astrophysics


Computer Vision
Speech Recognition
Bioinformatics
Fault diagnosis

Astronomy and Astrophysics

In collaboration with
Dr. Peter Tino, School of Computer Science, University of
Birmingham, UK
Dr. Ilya Mandel, School of Physics and Astronomy,
University of Birmingham, UK

Sultanah AL Otaibi; Peter Tino; Juan C. Cuevas-Tello; Ilya Mandel; Somak Raychaudhury (2016) Kernel regression
estimates of time delays between gravitationally lensed fluxes, Monthly Notices of the Royal Astronomical Society,
doi: 10.1093/mnras/stw510 ISSN: 0035-8711 http://arxiv.org/abs/1508.03439

Artificial Intelligence Algorithms

Kernel Methods and Support Vector Machines


Artificial Neural Networks for Regression
Genetic Algorithms for Parameter Optimization

Gravitational lensing and time delay problem

Gonzalez-Grimaldo, R.A., Cuevas-Tello, J.C., (2008) Analysis of Time Series with Neural Networks.
Mexican International Conference on Artificial Intelligence, IEEE Computer Society Proceedings,
pp. 131-137

Kernel Method and Fitness Landscape: a non-linear optimization problem

h_A(t_i) = Σ_{j=1}^{N} α_j K(c_j, t_i) is the "underlying" light curve that underpins image A, whereas h_B(t_i) = Σ_{j=1}^{N} α_j K(c_j + Δ, t_i) is a time-delayed (by Δ) version of h_A(t_i) underpinning image B.

Cuevas-Tello, J.: Estimating time delays between irregularly sampled time series. Ph.D. thesis,
School of Computer Science, University of Birmingham (2007).

Population and Simple GA

A.J. Chipperfield, P.J. Fleming, H. Pohlheim, C.M. Fonseca, Genetic Algorithm Toolbox for use with MATLAB, first ed., Automatic Control and Systems Engineering, University of Sheffield, 1996

Cuevas-Tello, J., Tino, P., Raychaudhury, S., Yao, X., Harva, M.: Uncovering delayed patterns in noisy and irregularly sampled time series: an astronomy application. Pattern Recognition 3(43), 1165-1179 (2010)

Population and Simple GA

Time Delay Challenge



Computer Vision

In collaboration with
Dr. Cesar A. Puente Montejano, UASLP
Dr. J. Ignacio Nunez Varela, UASLP

Artificial Intelligence Algorithms

Artificial Neural Networks


Backpropagation FFNN
Deep Neural Networks (DNN)
Probabilistic Neural Networks (PNN)
Support Vector Machines (SVM)
Statistical Learning
Bayesian Methods

Example: Figure recognition, supervised learning on labeled data

[Diagram: a feed-forward network for figure recognition. Inputs: the pixels of a 16x16 image (x1, x2, ...); hidden layer: 32 neurons; outputs: one unit per figure class (Fig1, ..., Fig30), with one-hot targets and a decision threshold of 0.8.]

DNNs & MNIST database

DNNs for handwritten digits recognition


[Diagram: a stacked-RBM deep network. The visible units take the 28x28-pixel digit image; RBM1 has 800 hidden units; RBM2 has 800 hidden units; the top weights W_L connect to 10 label units. Training set: 60,000 examples; test set: 10,000 examples. Tanaka et al. (2014)]

Smart Vision Group UASLP



Speech Recognition

In collaboration with
Dr. Manuel Valenzuela, ITESM Mty
Dr. Juan Nolazco, ITESM Mty

Artificial Intelligence Algorithms

Backpropagation FFNN
Probabilistic Neural Networks (PNN)
Deep Neural Networks (DNN)
Gaussian Mixture Models (GMM)

Speech Recognition

Biometric Recognition Systems

Fingerprints, voice and face are specific to an individual.
Contrary to passwords and PINs, they cannot be forgotten and are not easily stolen.
Human speech is the least obtrusive biometric measure.
It is simple to acquire thanks to the pervasive use of speech in society.

Speech Processing

[Diagram: taxonomy of speech processing]
Speech Processing
  - Analysis
  - Coding or Synthesis
  - Recognition
      - Speech Recognition
      - Language Identification
      - Speaker Recognition
          - Detection
          - Verification
          - Identification

Speech Processing

Speaker Recognition

Speaker Verification
Verify a person's claimed identity from their voice.
The identity claim may include entering an employee number, presenting a smart card, among others.
Speaker Identification
Decide whether the speaker is a specific person, belongs to a group of persons, or is unknown.
There is no a priori identity claim.
(Campbell 1997, Cieri et al. 2014)

Speech Processing

MFCCs

A time-domain sampled acoustic waveform (audio)


[Figure: time-domain waveform "SRE08 Model ID 10058 (tcsns) Channel A", amplitude vs. time (min), 0 to 5 minutes.]

Speech Processing

MFCCs

A time-domain sampled acoustic waveform (audio) is converted into features (MFCCs¹).

[Figure: the waveform "SRE08 Model ID 10058 (tcsns) Channel A" (amplitude vs. time in minutes) and the sequence of MFCC feature vectors (C1, ...) extracted from it.]

¹ Mel-Frequency spaced Cepstral Coefficients (MFCCs)



Speech Processing

Classification

The MFCCs of the target user form a matrix P of size [3000 x 48], which is fed to an ANN together with a target-class matrix Tc of size [3000 x 5]:

Data: P = (P1  P2  ...  P5), each block of size [3000 x 48], with Tc = (class = 1, class = 2, ..., class = 2), each block of size [3000 x 5].    (1)

Bioinformatics

In collaboration with
Dr. Christian A. Garcia Sepulveda, Fac. Medicina, UASLP
MSc. J. Salomon Altamirano, PhD Student, UASLP
MSc. D. Alejandro Glz. Bandala, PhD Student, UASLP

Artificial Intelligence Algorithms

Decision Trees (ID3, J48)


Data mining
Apriori algorithm
Support Vector Machines (SVM)
Artificial Neural Networks

DNA - KIR genes - innate immune system



Bioinformatics: classification

Cuevas-Tello, J.C., Hernandez-Ramirez, Daniel, Garcia-Sepulveda, Christian A. (2013) Support Vector Machine algorithms in the search of KIR gene associations with disease, Computers in Biology and Medicine 43 (2013) 2053-2062

Bioinformatics: data mining

J. Gilberto Rodriguez-Escobedo, Christian A. Garcia-Sepulveda, and Juan C. Cuevas-Tello (2015) KIR Genes and
Patterns Given by the A Priori Algorithm: Immunity for Haematological Malignancies, Computational and
Mathematical Methods in Medicine, vol. 2015, Article ID 141363, 11 pages, 2015. doi:10.1155/2015/141363

Computational forecasting of infectious disease dynamics



Fault diagnosis
In collaboration with
Dr. Ciro A. Nunez Gtz, Electrical Engineering, UASLP
Dr. Nancy Visairo Cruz, Electrical Engineering, UASLP
M.Sc. Eugenio Camargo Trigueros, PhD student, Electrical
Engineering, UASLP
Eng. Juan Jose Acosta E. (2016)
Manuel Alejandro Gomez Vazquez, B.Tech student,
Informatics Engineering
Cristian Garcia Huerta, B.Tech student, Computer
Engineering

Artificial Intelligence Algorithms

Backpropagation FFNN
General Regression Neural Networks (GRNN)

Fault diagnosis

Questions?
