
Mixture of Random Prototype-based Local Experts

Nima Hatami, Dept. of Electrical & Electronic Engineering, University of Cagliari, Italy

Agenda
Classifier ensembles
Mixture of Experts (ME) model
Hierarchical ME
Mixture of Random Prototype-based Experts (MRPE)
Hierarchical MRPE
Experimental results
Conclusions & Future work

Classifier ensembles
Also known as classifier fusion, combining classifiers, or multiple classifier systems (MCS)
Most real-world pattern recognition problems are too complicated for a single classifier to solve
Divide-and-conquer has proved to be efficient in many of these complex situations
Combines classifiers that have complementary properties


Mixture of Experts
Jacobs et al. proposed the ME model based on the divide-and-conquer strategy
One of the most popular ensemble methods used in pattern recognition and machine learning
A set of expert networks is trained together with a gating network


Mixture of Experts
Stochastically partitions the input space of the problem into a number of subspaces
Experts become specialized on each subspace
The gating network is used to manage this process


Mixture of Experts
[Figure: ME architecture — the i-th expert networks, the gating network, and the combined final output]
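For reference, the standard ME combination rule (the textbook formulation from Jacobs et al., reconstructed here rather than copied from the slide): the i-th expert produces $y_i = f_i(x;\,\theta_i)$, the gating network produces softmax weights $g_i(x) = \exp(v_i^{\top} x) \big/ \sum_{j=1}^{N_e} \exp(v_j^{\top} x)$, and the final output is the weighted sum $y(x) = \sum_{i=1}^{N_e} g_i(x)\, y_i$.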


Mixture of Experts model


Why does ME succeed?
1. Encourages diversity between the single experts by automatically localizing them in different regions of the input space
2. Achieves good combination weights for the ensemble members by training the gate, which computes the dynamic weights together with the experts

Hierarchical ME model
The HME is a well-known tree-structured architecture which can be considered a natural extension of the ME
The standard HME model hierarchically splits the input space into a nested set of subspaces
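For reference, the standard two-level HME combination (the usual Jordan-and-Jacobs formulation, added for context rather than taken from the slide): $y(x) = \sum_i g_i(x) \sum_j g_{j\mid i}(x)\, y_{ij}(x)$, where the top-level gate $g_i$ chooses a branch of the tree and the nested gates $g_{j\mid i}$ weight the experts $y_{ij}$ within that branch.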


Random Prototype-based Data Splitting


Selects some prototype points from the input space
Partitions this space according to the nearest distance from these prototypes
Uses this distance information in the learning process
Two different partitioning methods: disjoint and overlapping (a minimal sketch of both follows below)
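A minimal sketch of this splitting step, assuming Euclidean distance; the function name, the overlap parameter, and the relative-distance rule for the overlapping variant are illustrative assumptions, not the exact definitions from the ICPR/HAIS papers:

```python
import numpy as np

def random_prototype_split(X, n_experts, overlap=0.0, seed=None):
    """Partition samples by their nearest randomly chosen prototype.

    Disjoint split (overlap=0): each sample is assigned to its single
    nearest prototype. Overlapping split (overlap>0): a sample is also
    assigned to any prototype whose distance is within (1 + overlap)
    times its nearest distance.
    """
    rng = np.random.default_rng(seed)
    # Pick n_experts random training points as prototypes.
    prototypes = X[rng.choice(len(X), size=n_experts, replace=False)]
    # Distances from every sample to every prototype: shape (n_samples, n_experts).
    dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    nearest = dists.min(axis=1, keepdims=True)
    masks = dists <= (1.0 + overlap) * nearest
    # One index subset per expert/region.
    subsets = [np.where(masks[:, i])[0] for i in range(n_experts)]
    return prototypes, subsets
```

Each expert would then be trained only on its own subset, which is what makes the resulting experts local.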


G. Armano, N. Hatami, "Random Prototype-based Oracle for selection fusion ensembles" ICPR 2010. G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.

Mixture of Random Prototype-based Experts


Earlier works on the ME apply methods such as preprocessing to partition the input space, or transform it into simpler and more separable spaces
MRPE is a modified version of the ME algorithm that partitions the original problem into centralized regions
Training step: a simple distance-based gating function is used to specialize the expert networks
Testing step: the contribution of each expert is determined by the distance between the input and the prototype embedded by that expert (see the sketch below)
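A minimal sketch of such a distance-based gate at prediction time; the softmax-over-negative-distances form and the beta sharpness parameter are illustrative assumptions, not necessarily the exact gating rule of the HAIS 2010 paper:

```python
import numpy as np

def distance_gate(x, prototypes, beta=1.0):
    """Gating weights from the distances between input x and each expert's prototype.

    Closer prototypes receive larger weights; beta controls how sharply
    the gate favours the nearest expert.
    """
    d = np.linalg.norm(prototypes - x, axis=1)
    w = np.exp(-beta * d)
    return w / w.sum()

def mrpe_predict(x, experts, prototypes, beta=1.0):
    """Combine expert outputs with weights given by the distance-based gate."""
    g = distance_gate(x, prototypes, beta)
    outputs = np.stack([expert(x) for expert in experts])  # (n_experts, n_classes)
    return (g[:, None] * outputs).sum(axis=0)
```

During training, the same distance-based weights (or the region assignments from the splitting step above) determine how strongly each expert is updated on a given sample.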
G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


Mixture of Random Prototype-based Experts


G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


ME vs. MRPE on a toy problem


G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


The ME vs. MRPE


Resulting misclassifications in the standard ME derive from two sources:
1. The gating network is unable to correctly estimate the probability for a given input sample
2. Local experts do not learn their subtask perfectly
G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


The ME vs. MRPE


MRPE improves three important aspects of the standard ME model:
1. Reduces the training time by decreasing the number of parameters to be estimated
2. Since the simple distance measures used by the gating function are more robust with respect to errors in determining the area of expertise of an expert, errors in the proposed ME model are mainly limited to those made by the expert networks
3. The area of expertise of each expert is more centralized, which makes each subproblem easier to learn

G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.

Hierarchical MRPE
Data splitting based on random prototypes has been used for each ME module


G. Armano, N. Hatami, "Hierarchical Mixture of Random Prototype-based Experts", ECML PKDD 2010.


Why does HMRPE work?


Individual accuracy: splitting the input space into N centralized parts makes the subproblems easier for the experts to learn
Diversity: since each ME module embedded in the HMRPE architecture has its own set of prototypes (different from those embedded by the other ME modules), the experts become specialized on very different data subsets
Combination: the simple distance rules used by the gating function are more robust with respect to errors in determining the area of expertise
G. Armano, N. Hatami, "Hierarchical Mixture of Random Prototype-based Experts", ECML PKDD 2010.


Run-time Performance Analysis of the ME


The accuracy of classifier systems is usually the main concern; however, in real applications, their run-time performance may play an important role as well
Many well-performing classifiers cannot be used in real applications due to the amount of computational resources they require
Although the ME model has been deeply investigated, no remarkable work has been done so far on its computational complexity
In general, run-time performance depends on the type of classifier used, its parameters, and the characteristics of the problem to be solved

G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.

Run-time Performance of the ME


The run-time performance of an ME classifier can be decomposed into three main components: 1) the expert networks, 2) the gating network, and 3) the aggregation (also called combination) — a plausible decomposition is sketched below
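One plausible way to write this decomposition (the symbols here are introduced for illustration, not copied from the paper): $T_{ME} \approx \sum_{i=1}^{N_e} T_{exp_i} + T_{gate} + T_{agg}$, i.e. the per-sample cost of evaluating the $N_e$ expert networks, plus the cost of evaluating the gating network, plus the cost of forming the weighted combination of the expert outputs.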


G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.


Run-time Performance of the ME


[Slide formulas: expert complexity, gating complexity, aggregation complexity]


G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.


Run-time Performance of the MRPE


The experts and the aggregation rule in the MRPE model do not change with respect to the standard ME
We only need to reformulate the complexity of the gating network

$T_g^{ME} / T_g^{MRPE} = 4.65$ — the standard ME gating is roughly 4.65 times more expensive than the MRPE gating
The overall complexity decreases as well


G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.


Experimental results
Datasets: a selection of UCI machine learning datasets
Evaluation: 10-fold cross-validation
Base classifier: multi-layer perceptron (MLP)
The number of partitions (experts), N, was varied from 2 to 10 (the protocol is sketched below)
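A minimal sketch of the evaluation protocol only (a plain MLP under 10-fold cross-validation on a UCI-style dataset); the dataset, hidden-layer size, and iteration budget below are placeholder choices, and the MRPE model itself is not reimplemented here:

```python
# Baseline protocol: 10-fold CV of a single MLP on a small UCI-style dataset.
from sklearn.datasets import load_iris               # stand-in for "some UCI datasets"
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
scores = cross_val_score(mlp, X, y, cv=10)            # 10-fold cross-validation
print(f"mean error rate: {1 - scores.mean():.3f}")
```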


G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


Experimental results


G. Armano and N. Hatami, "Mixture of Random Prototype-based local Experts". HAIS 2010.


Run-time comparison
[Comparison plots: training time, error rate, and run-time complexity of ME vs. MRPE]


G. Armano and N. Hatami, "Run-time Performance Analysis of the Mixture of Experts Model". CORES 2011.


Conclusions
A modified version of the popular ME algorithm was presented
It specializes expert networks on centralized regions of the input space instead of nested, stochastically determined regions
Using simple distance-based gating reduces the network complexity and the training time
It improves both the overall classification accuracy and the run-time complexity

The proposed method was also extended to the HME architecture (HMRPE)



Future work
Defining a procedure for automatically determining the optimal number of experts for each problem, without resorting to complex preprocessing
Adapting the method to simple distance-based classifiers instead of neural networks
Heuristics to help partition the input space, instead of using random prototypes
Using the error rate and the complexity to automatically estimate the optimal number of experts Ne for a given problem



I'm starviiiing!!! Shall we go eat?!
