
Mixture of Random Prototype-based Local Experts

Nima Hatami, Dept. of Electrical & Electronic Engineering, University of Cagliari, Italy

Agenda
Classifier ensembles
Mixture of Experts (ME) model
Hierarchical ME
Mixture of Random Prototype-based Experts (MRPE)
Hierarchical MRPE
Experimental results
Conclusions & Future work

Classifier ensembles
Also known as classifier fusion, combining classifiers, or multiple classifier systems (MCS)
Most real-world pattern recognition problems are too complicated for a single classifier to solve
Divide-and-conquer has proved to be efficient in many of these complex situations
Combines classifiers that have complementary properties


Mixture of Experts
Jacobs et al. proposed the ME model based on the divide-and-conquer strategy
One of the most popular ensemble methods used in pattern recognition and machine learning
A set of expert networks is trained together with a gating network


Mixture of Experts
Stochastically partitions the input space of the problem into a number of subspaces
Experts become specialized on each subspace
The gating network is used to manage this process


Mixture of Experts
[Figure: ME architecture — the i-th expert networks, the gating network, and the combined final output]
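For reference, the standard ME combination rule (the textbook formulation from Jacobs et al., reconstructed here rather than copied from the slide): the i-th expert produces $y_i = f_i(x;\,\theta_i)$, the gating network produces softmax weights $g_i(x) = \exp(v_i^{\top} x) \big/ \sum_{j=1}^{N_e} \exp(v_j^{\top} x)$, and the final output is the weighted sum $y(x) = \sum_{i=1}^{N_e} g_i(x)\, y_i$.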


Mixture of Experts model


Why does ME succeed?
1. Encourages diversity between the single experts by automatically localizing them in different regions of the input space
2. Achieves good combination weights for the ensemble members by training the gate, which computes the dynamic weights together with the experts

Hierarchical ME model
The HME is a well-known tree-structured architecture which can be considered a natural extension of the ME
The standard HME model hierarchically splits the input space into a nested set of subspaces
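For reference, the standard two-level HME combination (the usual Jordan-and-Jacobs formulation, added for context rather than taken from the slide): $y(x) = \sum_i g_i(x) \sum_j g_{j\mid i}(x)\, y_{ij}(x)$, where the top-level gate $g_i$ chooses a branch of the tree and the nested gates $g_{j\mid i}$ weight the experts $y_{ij}$ within that branch.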


Random Prototype-based Data Splitting


Selects some prototype points from the input space
Partitions this space according to the nearest distance from these prototypes
Uses this distance information in the learning process
Two different partitioning methods: disjoint and overlapping (a minimal sketch of both follows below)
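A minimal sketch of this splitting step, assuming Euclidean distance; the function name, the overlap parameter, and the relative-distance rule for the overlapping variant are illustrative assumptions, not the exact definitions from the ICPR/HAIS papers:

```python
import numpy as np

def random_prototype_split(X, n_experts, overlap=0.0, seed=None):
    """Partition samples by their nearest randomly chosen prototype.

    Disjoint split (overlap=0): each sample is assigned to its single
    nearest prototype. Overlapping split (overlap>0): a sample is also
    assigned to any prototype whose distance is within (1 + overlap)
    times its nearest distance.
    """
    rng = np.random.default_rng(seed)
    # Pick n_experts random training points as prototypes.
    prototypes = X[rng.choice(len(X), size=n_experts, replace=False)]
    # Distances from every sample to every prototype: shape (n_samples, n_experts).
    dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    nearest = dists.min(axis=1, keepdims=True)
    masks = dists <= (1.0 + overlap) * nearest
    # One index subset per expert/region.
    subsets = [np.where(masks[:, i])[0] for i in range(n_experts)]
    return prototypes, subsets
```

Each expert would then be trained only on its own subset, which is what makes the resulting experts local.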


G. Armano, N. Hatami, "Random Prototype-based Oracle for selection fusion ensembles" ICPR 2010. G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.

Mixture of Random Prototype-based Experts


Earlier works on the ME apply methods such as preprocessing to partition the input space, or transform it into simpler and more separable spaces
MRPE is a modified version of the ME algorithm that partitions the original problem into centralized regions
Training step: a simple distance-based gating function is used to specialize the expert networks
Testing step: the contribution of each expert is determined by the distance between the input and the prototype embedded by that expert (see the sketch below)
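A minimal sketch of such a distance-based gate at prediction time; the softmax-over-negative-distances form and the beta sharpness parameter are illustrative assumptions, not necessarily the exact gating rule of the HAIS 2010 paper:

```python
import numpy as np

def distance_gate(x, prototypes, beta=1.0):
    """Gating weights from the distances between input x and each expert's prototype.

    Closer prototypes receive larger weights; beta controls how sharply
    the gate favours the nearest expert.
    """
    d = np.linalg.norm(prototypes - x, axis=1)
    w = np.exp(-beta * d)
    return w / w.sum()

def mrpe_predict(x, experts, prototypes, beta=1.0):
    """Combine expert outputs with weights given by the distance-based gate."""
    g = distance_gate(x, prototypes, beta)
    outputs = np.stack([expert(x) for expert in experts])  # (n_experts, n_classes)
    return (g[:, None] * outputs).sum(axis=0)
```

During training, the same distance-based weights (or the region assignments from the splitting step above) determine how strongly each expert is updated on a given sample.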
G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


Mixture of Random Prototype-based Experts


G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


ME vs. MRPE on a toy problem


G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


The ME vs. MRPE


Resulting misclassifications in the standard ME derive from two sources:
1. The gating network is unable to correctly estimate the probability for a given input sample
2. Local experts do not learn their subtask perfectly
G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


The ME vs. MRPE


MRPE improves three important aspects of the standard ME model:
1. Reduces the training time by decreasing the number of parameters to be estimated
2. Since the simple distance measures used by the gating function are more robust with respect to errors in determining the area of expertise of an expert, errors in the proposed ME model are mainly limited to those made by the expert networks
3. The area of expertise of each expert is more centralized, which makes each subproblem easier to learn

G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.

Hierarchical MRPE
Data splitting based on random prototypes has been used for each ME module


G. Armano, N. Hatami, "Hierarchical Mixture of Random Prototype-based Experts", ECML PKDD 2010.


Why does HMRPE work?


Individual accuracy: splitting the input space into N centralized parts makes the subproblems easier for the experts to learn
Diversity: since each ME module embedded in the HMRPE architecture has its own set of prototypes (different from those embedded by the other ME modules), the experts become specialized on very different data subsets
Combination: the simple distance rules used by the gating function are more robust with respect to errors in determining the area of expertise
G. Armano, N. Hatami, "Hierarchical Mixture of Random Prototype-based Experts", ECML PKDD 2010.


Run-time Performance Analysis of the ME


The accuracy of classifier systems is usually the main concern; however, in real applications, their run-time performance may play an important role as well
Many well-performing classifiers cannot be used in real applications due to the amount of computational resources they require
Although the ME model has been deeply investigated, no remarkable work has been done so far on its computational complexity
In general, run-time performance depends on the type of classifier used, its parameters, and the characteristics of the problem to be solved

G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.

Run-time Performance of the ME


The run-time performance of an ME classifier can be decomposed into three main components: 1) the expert networks, 2) the gating network, and 3) the aggregation (also called combination) — a plausible decomposition is sketched below
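One plausible way to write this decomposition (the symbols here are introduced for illustration, not copied from the paper): $T_{ME} \approx \sum_{i=1}^{N_e} T_{exp_i} + T_{gate} + T_{agg}$, i.e. the per-sample cost of evaluating the $N_e$ expert networks, plus the cost of evaluating the gating network, plus the cost of forming the weighted combination of the expert outputs.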


G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.


Run-time Performance of the ME


[Slide formulas: expert complexity, gating complexity, aggregation complexity]


G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.


Run-time Performance of the MRPE


The experts and the aggregation rule in the MRPE model do not change with respect to the standard ME
We only need to reformulate the complexity of the gating network

$T_g^{ME} / T_g^{MRPE} = 4.65$ — the standard ME gating is roughly 4.65 times more expensive than the MRPE gating
The overall complexity decreases as well


G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.


Experimental results
Datasets: a selection of UCI machine learning datasets
Evaluation: 10-fold cross-validation
Base classifier: multi-layer perceptron (MLP)
The number of partitions (experts), N, was varied from 2 to 10 (the protocol is sketched below)
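A minimal sketch of the evaluation protocol only (a plain MLP under 10-fold cross-validation on a UCI-style dataset); the dataset, hidden-layer size, and iteration budget below are placeholder choices, and the MRPE model itself is not reimplemented here:

```python
# Baseline protocol: 10-fold CV of a single MLP on a small UCI-style dataset.
from sklearn.datasets import load_iris               # stand-in for "some UCI datasets"
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
scores = cross_val_score(mlp, X, y, cv=10)            # 10-fold cross-validation
print(f"mean error rate: {1 - scores.mean():.3f}")
```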


G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


Experimental results


G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


Run-time comparison
[Comparison plots: training time, error rate, and run-time complexity of ME vs. MRPE]


G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.


Conclusions
A modified version of the popular ME algorithm was presented
It specializes expert networks on centralized regions of the input space instead of nested, stochastically determined regions
Using simple distance-based gating reduces the network complexity and the training time
It improves both the overall classification accuracy and the run-time complexity

The proposed method was also extended to the HME architecture (HMRPE)



Future work
Defining a procedure for automatically determining the optimal number of experts for each problem, without resorting to complex preprocessing
Adapting the method to simple distance-based classifiers instead of neural networks
Heuristics to help partition the input space, instead of using random prototypes
Using the error rate and the complexity to automatically estimate the optimal number of experts Ne for a given problem



I'm starviiiing!!! Shall we go eat?!
