
Classification of Molecular Structures Made Easy

Edmondo Trentin and Ernesto Di Iorio


Abstract. Several problems in bioinformatics and cheminformatics concern the classification of molecules. Relevant instances are automatic cancer detection/classification, machine-learning pathologic prediction, automatic predictive toxicology, etc. Molecules may be represented in terms of graphical structures in a natural way: each node in the graph can be used to represent an atom, whilst the edges of the graph represent the atom-atom bonds. Labels (in the form of real-valued vectors) are associated with nodes and edges in order to express physical and chemical properties of the corresponding atoms and bonds, respectively. These structured data are expected to contain more information than a traditional (flat) feature vector, information that may strengthen the classification capabilities of a machine learner. This paper investigates the application of a novel Bayesian/connectionist classifier to this graphical pattern recognition task. The approach is much simpler than state-of-the-art machine learning paradigms for graphical/relational learning. It relies on the idea of describing the graph in terms of a binary relation. The posterior probability of a class given the relation is estimated as a function of probabilistic quantities modeled with a neural network, trained over individual vertex pairs in the graph. The popular and challenging Mutagenesis dataset is considered for the experimental evaluation. Despite its simplicity, the technique turns out to yield the highest recognition accuracies to date on the complete (friendly + unfriendly) dataset, outperforming complex machines (relational and graph neural nets, kernels for graphs, inductive logic programming techniques, etc.). Some preliminary chemical/biological implications are eventually hypothesized in the light of the results obtained.

I. INTRODUCTION

Several problems in bioinformatics and cheminformatics concern the classification of molecules. Relevant examples are automatic cancer detection/classification, machine-learning pathologic prediction, automatic predictive toxicology, etc. A significant instance of the latter type is represented by the Mutagenesis dataset [1]. Brought to the attention of the scientific community in 1991, it is a relatively small, publicly available dataset. It contains 230 aromatic and hetero-aromatic nitro compounds (molecules) which are commonly found as intermediates in the synthesis of several industrial chemical compounds, as well as in automobile/diesel exhaust fumes. The Ames test [2] showed that some of the compounds have a high mutagenic tendency on Salmonella Typhimurium TA98, whilst others do not. Prediction of mutagenicity is part of the predictive toxicology process that improves the quality (and reduces the cost, as well as the time) of drug development [1]. The identification of mutagenic molecules is also relevant to other major tasks, such as the early diagnosis of cancer.
Edmondo Trentin and Ernesto Di Iorio are with the Dipartimento di Ingegneria dell'Informazione, Università di Siena, Italy (email: {trentin, diiorio}@dii.unisi.it).

From the point of view of computer science, the task can be formulated as a binary structured pattern recognition problem. Molecules are represented in the form of graphs which describe the corresponding molecular structures. Molecular structures fit the graphical representation quite naturally, since atoms may be thought of as the nodes of a graph, the edges of which are the chemical bonds. Atoms belong to a given, well-defined and finite universe (the atom types). Labels attached to atoms may contain features that describe some physical/chemical properties of the atom, while labels for edges may contain information on the bond properties (single or multiple bond, spatial arrangement of the bond, etc.). These structured data are expected to contain more information than a traditional (flat, fixed-dimensionality) feature vector, information that may strengthen the classification capabilities of a machine learner.

In its original formulation, the dataset included the level of mutagenicity measured for each molecule (i.e., a real number), and a proper regression model was sought. The dataset was partitioned into two subsets, according to the difficulty of the regression task, namely the "regression friendly" part and the "regression unfriendly" part. This partition was introduced in [1], which observed that the mutagenicity of 188 (out of 230) of the compounds could be fit by a linear regression model in a suitable manner. In point of fact, in the literature the dataset is mostly used as a benchmark for the classification of structures (instead of regression), especially for Inductive Logic Programming (ILP) approaches [3], [4]. In this perspective, a molecule is termed mutagenic if its level of mutagenicity has a positive logarithm; otherwise it is classified as non-mutagenic. In most cases, only the regression friendly (easier) part of the data was used, often (improperly) referred to as the Mutagenesis task. Established results on the regression friendly subset show a generally high recognition accuracy (up to 95%). For this reason, the question "Is mutagenesis still challenging?" was raised [4]. On the contrary, experimental results on the overall 230-compound dataset turned out to be much less promising (after 16 years of intensive research, the state of the art is 90.5% [5]), stressing the fact that the complete Mutagenesis is still challenging. Reviews of results obtained using a variety of approaches on the overall set of molecules can be found in [6], [7], while [4] surveys the vast literature on the regression friendly part only.

This paper investigates the application of a Bayesian/connectionist classifier to the graphical pattern recognition task. The method was introduced in [8], where it was applied to image classification tasks from the Caltech benchmark dataset.


Graphical representations of the images are used in [8] (namely, region adjacency graphs and multi-resolution trees), and the technique is shown to outperform state-of-the-art connectionist frameworks for the classification of graphical data, i.e., recursive neural nets (RNN) [9] and graph neural nets (GNN) [10]. Major problems that arise in the application of neural networks for graphs (e.g., the graph matching problem, infinite recursion over cyclic structures, long-term dependencies) are overcome [8]. Moreover, the approach is much simpler (and computationally more efficient) than state-of-the-art machine learning paradigms for graphical/relational learning (ILP, probabilistic relational learning [11], RNNs and GNNs). It does not require any ad hoc software implementation, since it can be realized via standard neural network simulators. It is suitable for broad families of finite, labeled graphs, either cyclic or acyclic, connected or unconnected, directed or undirected.

The technique is reviewed (and improved, through the distinction between graphs-space and edges-space probabilistic quantities) in Section II. It relies on the idea of looking at the graph as a whole (instead of attempting a visit of the nodes in a certain order), namely as a binary relation (a set of vertex pairs). The class-posterior probability of the graph is then computed from a joint class-conditional probability defined over the edges of the graph, which is factorized into edge-specific probabilistic quantities that are estimated via a Multilayer Perceptron (MLP). Although the resulting machine is not theoretically universal, its points of strength are its simplicity, its computational efficiency [8], and the impressive experimental results it yields on several tasks. Experiments on the Mutagenesis dataset (friendly, and friendly + unfriendly) are reported in Section III, where the highest recognition accuracies to date are shown. This is intended as empirical evidence that the algorithm is suitable for molecular structure classification tasks. In consequence, it could be taken into serious consideration by practitioners in bioinformatics and cheminformatics problems which involve the classification of molecules. Finally, Section IV draws the conclusions, and some preliminary chemical/biological implications are hypothesized in the light of the results obtained.

II. REVIEW OF THE TECHNIQUE

Each molecular structure is represented as a graph G which, in turn, is a pair G = (V, E), where V is an arbitrary set of nodes (or vertices) over a given universe U (e.g., the set of atoms involved in the problem), and E ⊆ V × V is the set of edges (e.g., the atom-atom chemical bonds). In principle, the formalism covers directed as well as undirected, connected and unconnected finite graphs (G is undirected iff (a, b) ∈ E ⇒ (b, a) ∈ E), either cyclic or acyclic. From an algebraic point of view, the graph is a binary relation over U, namely {x_j = (a_j, b_j) | a_j ∈ U, b_j ∈ U, j = 1, ..., n} for a proper cardinality n. All the binary relations (graphs) involved in the learning problem at hand (both in training and test) are assumed to be defined over the same domain (U × U).
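As a concrete (toy) illustration of this binary-relation view, the following Python sketch, which is ours rather than the paper's, reduces a minimal molecule to its set of vertex pairs; the water molecule and all names are illustrative assumptions:

```python
# A minimal sketch of the binary-relation view of a molecule: nodes are
# atoms, edges are bonds, and the graph reduces to the set of vertex
# pairs x_j = (a_j, b_j). The toy molecule (water) is an assumption.
from itertools import chain

# atoms, indexed by position in the molecule
atoms = {0: "O", 1: "H", 2: "H"}

# undirected bonds, stored symmetrically so (a, b) in E implies (b, a) in E
bonds = {(0, 1), (0, 2)}
E = set(chain.from_iterable(((a, b), (b, a)) for a, b in bonds))

# the binary relation: one pair x_j per (directed) edge
X = [(atoms[a], atoms[b]) for a, b in sorted(E)]
print(X)  # [('O', 'H'), ('O', 'H'), ('H', 'O'), ('H', 'O')]
```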

From now on, we rely on the assumption that the universe U is a (Lebesgue-)measurable space, in order to ensure that probability measures can in fact be defined. The measurability of finite graphs defined over measurable domains (and with measurable labels), like countable sets or real vectors, is shown in [12]. Labels may be attached to vertices or edges, assuming they are defined over a measurable space. For the vertices, we consider a labeling L in the form of d-dimensional vectors associated with nodes, namely L(G) = {ℓ(v) | v ∈ V, ℓ(v) ∈ R^d}. Labels are accounted for by modifying the definition of x_j = (a_j, b_j) ∈ E slightly, taking x_j = (a_j, ℓ(a_j), b_j, ℓ(b_j)). As regards the edge labels, for each (a_j, b_j) ∈ E a label is allowed in the form ℓ_e(a_j, b_j) ∈ R^{d_e}, where d_e is the dimensionality of the continuous label domain. Then, x_j is extended as follows: x_j = (a_j, b_j, ℓ_e(a_j, b_j)) (if the graph has edge labels, but no node labels), or x_j = (a_j, ℓ(a_j), b_j, ℓ(b_j), ℓ_e(a_j, b_j)) (if the graph has both). For instance, labels may contain real-valued features that describe certain chemical, or physical, measurements characterizing atoms or bonds.

The present framework requires that the nodes in the graph are individual elements of a well-defined universe. It does not explicitly cover scenarios in which the nodes act only as placeholders in the specific graphical representation of the data. If this is the case, and the actual input features are completely encapsulated within the label vectors (e.g., by means of a coding vector that represents the atom type), the previous definitions may be replaced by x_j = (ℓ(a_j), ℓ(b_j)) for each pair (a_j, b_j) ∈ E. This turns out to be effective in practical applications (see Section III), but it is mathematically justified only if each label identifies the corresponding node in a univocal manner.

Let ω_1, ..., ω_c be a set of classes, or states of nature. For instance, in the mutagenicity prediction case, two classes are considered (mutagenic, non-mutagenic). We assume that each graph belongs to one of the c classes. The posterior probability of the i-th class given the graph is the class-posterior given the corresponding binary relation, namely P(ω_i | {x_1, ..., x_n}), where each x_j is interpreted as a random vector whose characteristics and dimensionality depend on the nature of the universe U. The assumption of dealing with measurable universes allows the adoption of probabilistic measures, and applying Bayes' theorem [13] we can write:

$$
P(\omega_i \mid \{x_1,\dots,x_n\}) = \frac{p(\{x_1,\dots,x_n\} \mid \omega_i)\, P(\omega_i)}{p(\{x_1,\dots,x_n\})} \qquad (1)
$$

where P(·) denotes a probability measure, and p(·) denotes a probability density function (pdf), which reduces to a probability if its support is discrete. P(ω_i) is the prior probability of the i-th class evaluated in the graphs space, i.e., the prior probability that a molecule belongs to class ω_i. The quantity p({x_1, ..., x_n} | ω_i) is a joint pdf that expresses the probabilistic distribution of the overall binary relation {x_1, ..., x_n} over its domain according to the law p(·). We assume that the pairs x_j, j = 1, ..., n (including the corresponding labels) are independently and identically distributed (iid) according to the class-conditional density p(x_j | ω_i).
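Before turning to the meaning of p(x_j | ω_i), the pair formats defined above can be made concrete. A minimal sketch (ours; all feature values are invented) builds the variants of x_j for a single labeled bond:

```python
# Hypothetical label maps for one bond (a, b) = (0, 1); values are made up.
a, b = 0, 1
node_label = {0: [1.0, 0.0, -0.12],   # l(a): e.g. one-hot type + charge
              1: [0.0, 1.0, 0.05]}    # l(b)
edge_label = {(0, 1): [1.0, 0.0]}     # l_e(a, b): e.g. bond-type coding

x_nodes = (a, node_label[a], b, node_label[b])          # node labels only
x_edge = (a, b, edge_label[(a, b)])                     # edge label only
x_both = (a, node_label[a], b, node_label[b], edge_label[(a, b)])  # both

# when nodes are mere placeholders, only the labels enter the pair:
x_anon = tuple(node_label[a] + node_label[b])
```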


In order to understand the meaning of p(x_j | ω_i), it may be helpful to underline that it implicitly expresses three different, yet joint, probabilistic quantities, all of them conditioned on ω_i: (i) the likelihood of observing any given pair of nodes, i.e., an edge (chemical bond); (ii) the probability distribution of node labels (characteristic properties of the atoms); and (iii) the pdf of edge labels (characteristic properties of the chemical bonds). In so doing, the probability of having a bond between two atoms is modeled jointly with the statistical properties of the atoms and of the bond. In [8] it is shown that the iid assumption does not significantly affect the classification performance of the model. The assumption is in line with classical and state-of-the-art literature on statistical pattern recognition [13], hidden Markov models (where the feature vectors that form the input observation sequence are assumed to be independent of each other) [14], probabilistic classification of structures in ILP [15], random graphs [16] and scale-free networks [17], [18]. Similarities and differences w.r.t. the Naive Bayes classifier are also pointed out in [8], where it is also seen that the iid assumption does not imply any loss in terms of structural information. The structure is encapsulated within the binary relation, which does not depend on the probabilistic properties of the quantities involved in Equation (1). Under the iid assumption, and applying Bayes' theorem again, we can write:
$$
p(\{x_1,\dots,x_n\} \mid \omega_i) = \prod_{j=1}^{n} p(x_j \mid \omega_i) = \prod_{j=1}^{n} \frac{P(\omega_i \mid x_j)\, p(x_j)}{P_E(\omega_i)} \qquad (2)
$$

where P_E(ω_i) is the prior probability of the i-th class evaluated in the edges space, i.e., the prior probability that an individual edge is in a graph which belongs to class ω_i. (Note that P_E(ω_i) generally differs from P(ω_i): for instance, if the molecules of one class are systematically larger, that class accounts for a larger share of edges than of graphs.) Substituting Equation (2) into Equation (1) we obtain

$$
P(\omega_i \mid \{x_1,\dots,x_n\}) = \left[\prod_{j=1}^{n} \frac{P(\omega_i \mid x_j)\, p(x_j)}{P_E(\omega_i)}\right] \frac{P(\omega_i)}{p(\{x_1,\dots,x_n\})} = P(\omega_i) \prod_{j=1}^{n} \frac{P(\omega_i \mid x_j)}{P_E(\omega_i)} \qquad (3)
$$

because p({x_1, ..., x_n}) = ∏_{j=1}^{n} p(x_j), where p(x_j) = Σ_{k=1}^{c} P_E(ω_k) p(x_j | ω_k). Since the pairs x_j are extracted from a well-defined universe, and the joint probabilities (e.g., p({x_1, ..., x_n})) are invariant w.r.t. arbitrary permutations of their arguments, there is no graph matching problem in the present framework. Representing the molecule as a relation implies looking at the structure as a whole. This is a major difference w.r.t. other techniques that require a visit of the graph in a specific order, and that are faced with the problem of possible infinite recursion over cyclic structures [8].

In order to apply Equation (3), we need to estimate P(ω_i), P_E(ω_i) and P(ω_i | x_j) for i = 1, ..., c and j = 1, ..., n. If good estimates of these quantities are obtained, the maximum-a-posteriori decision rule expressed by Equation (3) is expected to yield the minimum probability of classification error [13]. The quantities P(ω_i) and P_E(ω_i) can be estimated from the relative frequencies of classes over the training sample (in the graphs space and in the edges space, respectively), as usual. An MLP with c output units (one for each of the different classes) is then used to estimate P(ω_i | x_j). Equivalently, a single output may be adopted for 2-class problems, as usual. The MLP is known to be a universal non-parametric probability model [19], [20], and it may optimally approximate the Bayesian posterior probability once it is trained via Backpropagation (BP) on a supervised training set featuring class labels (i.e., 0/1 targets) [21], [22], [20]. Once training is completed, the classification of a novel (test) molecule is accomplished as follows: the MLP outputs are computed over all the edges x_1, ..., x_l of the corresponding graph (of course, the present formalism applies to graphs that may have different size and structure) and, in turn, are substituted in the right-hand side of Equation (3), which eventually yields P(ω_i | G).
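As an illustration of how Equation (3) is evaluated in practice, the following sketch (ours, not the authors' code) combines per-edge posterior estimates with the two kinds of priors; it works in log space to avoid numerical underflow on long products, and normalizes the resulting scores, a practical safeguard the closed formula does not strictly require:

```python
import math

def graph_posteriors(edge_posteriors, graph_priors, edge_priors):
    """Equation (3): P(w_i | G) = P(w_i) * prod_j P(w_i | x_j) / P_E(w_i).

    edge_posteriors: list of [P(w_1|x_j), ..., P(w_c|x_j)], one per edge x_j
    graph_priors:    [P(w_1), ..., P(w_c)], class priors in the graphs space
    edge_priors:     [P_E(w_1), ..., P_E(w_c)], class priors in the edges space
    Returns normalized class posteriors for the whole graph G.
    """
    c = len(graph_priors)
    # accumulate log P(w_i) + sum_j [log P(w_i|x_j) - log P_E(w_i)]
    log_scores = [math.log(graph_priors[i]) for i in range(c)]
    for post in edge_posteriors:
        for i in range(c):
            log_scores[i] += math.log(post[i]) - math.log(edge_priors[i])
    # normalize via a softmax over the log scores
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]

# toy usage: two classes, three edges
P_G = graph_posteriors(
    edge_posteriors=[[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]],
    graph_priors=[0.6, 0.4],
    edge_priors=[0.55, 0.45],
)
print(P_G)  # the class with the larger value is the MAP decision
```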
III. EXPERIMENTS

This Section reports the results obtained on the Mutagenesis dataset. GNNs recently established the state-of-the-art performance over the whole (friendly + unfriendly) dataset [5]. For this reason, the same graphical representation of compounds and the same features (labels) as in [5] were used. Each atom is characterized by its atom type (according to a one-hot coding) and its charge. In addition, four global features (i.e., defined at the whole-molecule level) are enclosed within the atom labels. Two of them are pre-coded structural attributes [5]. The other two attributes are the result of chemical measurements, namely the energy of the lowest unoccupied molecular orbital, and the water/octanol partition coefficient. Undirected edges between pairs of nodes are introduced in the graph to represent each individual atom-atom bond. The graph obtained this way is, in general, cyclic (i.e., RNNs cannot deal with this structured representation).
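To fix ideas, a possible encoding of the atom labels just described might look as follows (our sketch: the atom-type list and the names ind1/inda for the two pre-coded structural attributes are assumptions, and all numeric values are placeholders):

```python
# Sketch of the node-label encoding described above: one-hot atom type +
# charge + four molecule-level features (two structural attributes, LUMO
# energy, logP) replicated into every atom label of the molecule.
ATOM_TYPES = ["C", "N", "O", "H", "Cl", "F", "Br", "I"]  # illustrative list

def atom_label(atom_type, charge, ind1, inda, lumo, logp):
    onehot = [1.0 if t == atom_type else 0.0 for t in ATOM_TYPES]
    # the four global features are shared by all atoms of the molecule
    return onehot + [charge, ind1, inda, lumo, logp]

# e.g. a carbon atom in a molecule with LUMO = -1.19 eV and logP = 1.59
print(atom_label("C", charge=-0.12, ind1=0.0, inda=0.0, lumo=-1.19, logp=1.59))
```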


First, we evaluated the behavior of the algorithm on the regression-friendly part of the dataset. This provides a first benchmark, and allows for a broad comparison with different machine learning approaches to the task of molecule classification. Results are reported in Table I, expressed in terms of average recognition rate (%). The proposed technique yields recognition accuracies (95.22 ± 1.87% over a 10-fold cross-validation setup, using 19 test molecules for the first 9 folds and the remaining 17 molecules for the 10th fold) that are in line with the best performance obtained to date. A 2-layer MLP architecture was used, featuring a hidden layer with 15 sigmoid units and a single sigmoid output unit.

TABLE I
RECOGNITION ACCURACIES (%) ON THE REGRESSION-FRIENDLY PART OF THE MUTAGENESIS DATASET [1].

Method                      Reference                               Accuracy
RS                          Lodhi and Muggleton, 2005 [4]           95.8 ± 3.3
MFLOG                       Kramer and De Raedt, 2001 [26]          95.7
Present approach            This paper                              95.22 ± 1.87
GNN                         Uwents et al., 2006 [5]                 94.3 ± 0.6
RSD                         Krogel et al., 2003 [24]                92.6
Relational neural network   Uwents et al., 2006 [5]                 91.0 ± 1.8
1nn(dm)                     Ramon, 2002 [6]                         91 ± 2.0
Neural networks             Srinivasan et al., 1994 [23]            89.0 ± 2.0
boosted-FOIL                Quinlan, 1996 [25]                      88.3
P-Progol                    Srinivasan et al., 1994 [23]            88 ± 2.0
RELAGGS                     Krogel et al., 2003 [24]                88
SINUS                       Krogel et al., 2003 [24]                84.5
RDBC                        Kirsten, 2002 [28]                      84
FOIL                        Quinlan and Cameron-Jones, 1993 [27]    76

The number of training epochs ranged from 40 to 100, on a fold-by-fold basis (it was determined via cross-validation on the specific training sample). Furthermore, the majority of much more complex machines is outperformed. Only two approaches from the literature exhibit a slightly better behavior:

1) the random seed (RS) method [4], an ILP-based ensemble method which obtains a randomized (by different seed examples) set of theories from a standard ILP learner. Its recognition accuracy is 95.8 ± 3.3%, relying on a knowledge base which also includes additional features (functional groups, indicators and other chemical features that are not specified in [4]). Also, the cardinality of the test sets in the 10-fold procedure is not reported in [4]. In interpreting the results, note that with such an amount of data the misclassification of a single molecule lowers the fold-specific recognition accuracy by as much as 5.26% (1/19 ≈ 5.26% for a fold of 19 test examples). Moreover, despite the higher average accuracy, the RS method turns out to be less stable, as can be seen from its standard deviation;

2) the MFLOG method [26], where a logistic regression approach is applied along with a class-sensitive feature construction technique relying on the integration of a version space built in response to a user-defined query (constrained on the frequency of individual features). The approach yields a 95.7% accuracy (the standard deviation is not reported in [26]).

The connectionist models for graphical data (GNNs and relational neural networks [5]) score significantly lower than the present technique, as do standard neural networks [23]. The relational neural network is an instance of the probabilistic relational learning framework of [11], whose results on the Mutagenesis dataset are presented in [5]. Application of this technique required the description of the compounds by means of a table of a relational database, linked to another table which encapsulates the atom-bond structure.

Two separate, recurrent ANNs were trained over the two tables. The remaining rows of the Table quote major results from the literature. RSD, RELAGGS, and SINUS are propositionalisation frameworks whose evaluation on the task is presented in [24]. The 1nn(dm) approach [6] relies on instance-based learning and clustering techniques for ILP. P-Progol [23] is another ILP approach. Boosted-FOIL [25] is a boosting first-order learning framework, which extends the bare FOIL [27]. Finally, RDBC stands for Relational Distance-Based Clustering [28].

Let us now focus on the central experimental topic of the paper, namely the overall (regression friendly + unfriendly) Mutagenesis dataset. Table II reports the results obtained, compared with formerly published state-of-the-art approaches to the problem. Again, a 10-fold cross-validation was applied, using the same 10-fold partitioning of the dataset that was used in [5]. Albeit more sensitive to the data partitioning on a fold-by-fold basis, the proposed approach (relying on a 2-layer, 15-hidden-unit MLP) achieves the highest predictive accuracy to date. It yields a relative 35.89% average error rate reduction w.r.t. GNNs (from a 9.5% to a 6.09% average error rate), which, in turn, had recently established the best results on the task [5]. As regards the computational burden (100 training epochs, i.e., less than 2 minutes worst-case training time on a PC with a 1.0 GHz processor and 256 MB RAM; to be compared, for instance, with GNNs, which required more than 6 hours on an Apple G5 biprocessor architecture with 4 GB RAM running the software implementation used in [10]), similar conclusions to those drawn in [8] still hold. The following rows of Table II offer a direct comparison with different approaches to the relational learning problem (some of them, such as 1nn(dm), relational neural networks, and RDBC, were introduced in Table I for the regression-friendly part).
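For concreteness, the per-fold pipeline can be sketched as follows (our illustration, not the authors' implementation): edge vectors inherit the class of their graph as 0/1 targets for the MLP, and a test graph is classified by pooling the per-edge posteriors through Equation (3), reusing the graph_posteriors() routine sketched in Section II. The scikit-learn API is one possible stand-in for an MLP trained via Backpropagation; the hidden-layer size and epoch budget follow the text, everything else is an assumption:

```python
# Sketch of one fold of the edge-level training / graph-level classification
# pipeline; each graph is assumed to be a (list_of_edge_vectors, class) pair.
from sklearn.neural_network import MLPClassifier

def run_fold(train_graphs, test_graphs):
    # flatten graphs into (edge vector, graph class) training pairs
    X = [x for edges, y in train_graphs for x in edges]
    t = [y for edges, y in train_graphs for _ in edges]

    # priors in the graphs space and in the edges space (relative frequencies)
    n_graphs = len(train_graphs)
    graph_priors = [sum(1 for _, y in train_graphs if y == k) / n_graphs
                    for k in (0, 1)]
    edge_priors = [sum(1 for yy in t if yy == k) / len(t) for k in (0, 1)]

    # 2-layer MLP: 15 sigmoid hidden units, trained on 0/1 targets
    mlp = MLPClassifier(hidden_layer_sizes=(15,), activation="logistic",
                        max_iter=100).fit(X, t)

    correct = 0
    for edges, y in test_graphs:
        # clip probabilities away from zero before taking logs in Eq. (3)
        probs = [[max(p, 1e-12) for p in row]
                 for row in mlp.predict_proba(edges)]
        post = graph_posteriors(probs, graph_priors, edge_priors)
        correct += int(max(range(2), key=lambda i: post[i]) == y)
    return correct / len(test_graphs)  # fold-specific recognition rate
```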


TABLE II
RECOGNITION ACCURACIES (%) ON THE MUTAGENESIS (FRIENDLY + UNFRIENDLY) DATASET [1].

Method                       Reference                              Accuracy
Present approach             This paper                             93.91 ± 3.13
GNN                          Uwents et al., 2006 [5]                90.5 ± 0.7
1nn(dm)                      Ramon, 2002 [6]                        88 ± 2.0
SVM Graph Kernel             Gonsalves, 2005 [32]                   87.16
Relational neural networks   Uwents et al., 2006 [5]                86.5 ± 2.1
RDBC                         Kirsten, 2002 [28]                     83
TILDE                        De Raedt and Blockeel, 1997 [31]       82

A former ILP approach is TILDE (Top-down Induction of First-order Logical Decision Trees) [31]. A comparison with the kernels for graphs is also reported in Table II. The result is the best one established with kernel machines on this task, as reported in [32] (where different kernels for graphs are compared). It was obtained using the technique proposed in [33]. Note that recursive neural networks (RNNs) could not be applied to the present task, due to the intrinsic presence of cycles within the graphical representation of the molecular structure of individual compounds.

IV. CONCLUSIONS

The paper investigated the application of a connectionist/Bayesian maximum-a-posteriori approach to the task of molecular structure classification from the Mutagenesis dataset. Comparison with state-of-the-art results from a long-lasting literature shows that (i) the complete dataset (regression friendly + unfriendly) is still challenging, and that (ii) the proposed approach yields the highest recognition accuracies to date, outperforming the other (complex, and computationally heavy) relational learning techniques. This is particularly interesting given the formal and computational simplicity of the algorithm, which can be realized in a straightforward manner via any standard neural network simulator. We argue that the Mutagenesis task is emblematic of a variety of problems involving the classification of molecular structures. For these reasons, the approach turns out to be promising, deserving attention from practitioners in the field. Major points of strength of the model, namely its simplicity and its efficiency, are a consequence of an iid assumption that is made on the joint probabilistic distributions of the edges within the graphical representation of class-specific data. Of course, from a theoretical point of view, this implies a loss in terms of universal capabilities of the approach (whilst other connectionist techniques, i.e., GNNs and RNNs for directed acyclic graphs, are known to allow for theoretically universal mappings defined over the graph space [10], [8]). As observed in [8] in the realm of image recognition, the mathematical restriction does not necessarily imply losses in classification accuracy. In this respect, the model reminds us of the classic Naive Bayes model, whose well-behaved performance (and the rationale behind it) has

been widely investigated, and understood, in the literature (refer, for instance, to [34]). The emerging idea is that a class-specific probability law exists, capable of explaining the distribution of the edges in graphs belonging to a certain class, and that the spontaneous, yet independent, organization of the edges within the structure leads to the emergence of structural phenomena that are characteristic of the class under consideration. From a certain point of view, this is analogous to the studies on scale-free networks (e.g., the World Wide Web, or the so-called social networks) [17], [18], where a common (and fixed) probability distribution (e.g., a uniform pdf, or a power-law distribution) rules the process of formation of iid links between entities in the model. Such a process leads, eventually, to the emergence of connotative topological regularities, such as hubs. Instead of assuming a unique, pre-defined aggregation law, the present framework extends the concept by allowing for class-dependent laws that can be learnt from examples. Furthermore, instead of separating the topological properties from those of the labels (i.e., the content) of the nodes, the present probabilistic quantities take both aspects, jointly, into consideration.

Elaborating on these ideas, the following question could be raised: does the (observed) behavior of the model have any chemical implications? In the light of the experimental evidence reported above, a hypothesis (which has the flavor of a machine-driven scientific discovery) might reasonably take form. The proposed model basically solves the classification problem on the friendly part of the Mutagenesis dataset. It also yields very high recognition accuracies on the overall dataset. It is quite natural to conjecture that the classification errors could be reduced even further if (i) more data were available (in order to guarantee a statistically representative coverage of the molecule distributions), and (ii) the right MLP architecture were chosen and it converged to the global minimum of the criterion function (which is not feasible in practice). Under these circumstances, we put forward the preliminary hypothesis that, in order to comply with the empirical evidence, the chemical bonds in different molecules of a certain class have a strong tendency to be iid according to a class-specific pdf (which takes into account the properties of the atoms involved in the bond). The hypothesis turns out to be corroborated (albeit far from being proven univocally) by the experimental results, at least as concerns mutagenicity on Salmonella Typhimurium TA98 for the family of molecules covered by the Mutagenesis dataset [1].


REFERENCES
[1] A. Debnath, R. Lopez de Compadre, G. Debnath, A. Shusterman and C. Hansch, "Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity", Journal of Medicinal Chemistry, vol. 34, no. 2, pp. 786-797, 1991.
[2] B. N. Ames, F. D. Lee and W. E. Durston, "An improved bacterial test system for the detection and classification of mutagens and carcinogens", Proc. of the National Academy of Sciences USA, vol. 70, pp. 782-786, 1973.
[3] N. Lavrac and S. Dzeroski, Inductive Logic Programming: Techniques and Applications, Ellis Horwood, New York, 1994.
[4] H. Lodhi and S. H. Muggleton, "Is mutagenesis still challenging?", Proc. of the 15th International Conference on Inductive Logic Programming, ILP 2005, pp. 35-40, 2005.
[5] W. Uwents, G. Monfardini, H. Blockeel, F. Scarselli and M. Gori, "Two connectionist models for graph processing: an experimental comparison on relational data", ECML 2006 Workshop MLG (Machine Learning on Graphs), pp. 213-220, 2006.
[6] J. Ramon, Clustering and Instance Based Learning in First Order Logic, PhD Thesis, K.U. Leuven, Belgium, 2002.
[7] W. Uwents and H. Blockeel, "Classifying relational data with neural networks", Proc. of the 15th International Conference on Inductive Logic Programming, 2005.
[8] E. Trentin and E. Di Iorio, "A simple and effective neural model for the classification of structured patterns", Proc. of KES 2007, Vietri sul Mare (Italy), vol. 1, pp. 9-16, September 2007.
[9] A. Sperduti and A. Starita, "Supervised neural networks for the classification of structures", IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 714-735, May 1997.
[10] M. Gori, G. Monfardini and F. Scarselli, "A new model for learning in graph domains", Proc. of IJCNN, 2005.
[11] H. Blockeel and M. Bruynooghe, "Aggregation versus selection bias, and relational neural networks", Workshop on Learning Statistical Models from Relational Data, 2003.
[12] B. Hammer, A. Micheli and A. Sperduti, "Universal approximation capability of cascade correlation for structures", Neural Computation, vol. 17, no. 5, pp. 1109-1159, 2005.
[13] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, J. Wiley, 1973.
[14] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proc. of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.

[15] P. A. Flach and N. Lachiche, "Naive Bayesian classification of structured data", Machine Learning, vol. 57, no. 3, pp. 233-269, 2004.
[16] B. Bollobás, Random Graphs (2nd Edition), Cambridge University Press, 2001.
[17] D. Watts and S. Strogatz, "Collective dynamics of 'small-world' networks", Nature, vol. 393, pp. 440-442, 1998.
[18] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks", Science, vol. 286, pp. 509-512, October 1999.
[19] G. Cybenko, "Approximation by superposition of sigmoidal functions", Mathematics of Control, Signals and Systems, vol. 2, pp. 303-314, 1989.
[20] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
[21] M. D. Richard and R. P. Lippmann, "Neural network classifiers estimate Bayesian a posteriori probabilities", Neural Computation, vol. 3, pp. 461-483, 1991.
[22] H. Bourlard and N. Morgan, Connectionist Speech Recognition. A Hybrid Approach, Kluwer Academic Publishers, vol. 247, 1994.
[23] A. Srinivasan, S. Muggleton, R. King and M. Sternberg, "ILP experiments in a non-determinate biological domain", Proc. of the 4th International Workshop on Inductive Logic Programming, pp. 217-232, 1994.
[24] M. Krogel, S. Rawles, F. Zelezny, P. Flach, N. Lavrac and S. Wrobel, "Comparative evaluation of approaches to propositionalization", Proc. of the 13th International Conference on Inductive Logic Programming, pp. 197-214, 2003.
[25] J. R. Quinlan, "Boosting first-order learning", Proc. of the 7th International Workshop on Algorithmic Learning Theory, vol. 1160 of LNAI, pp. 143-155, Springer, Berlin, October 23-25, 1996.
[26] S. Kramer and L. De Raedt, "Feature construction with version spaces for biochemical applications", Proc. of the Eighteenth International Conference on Machine Learning, pp. 258-265, 2001.
[27] J. Quinlan and R. Cameron-Jones, "FOIL: A midterm report", Proc. of the European Conference on Machine Learning, pp. 3-20, 1993.
[28] M. Kirsten, Multirelational Distance-Based Clustering, PhD Thesis, School of Computer Science, Otto-von-Guericke University, 2002.
[29] N. Friedman, L. Getoor, D. Koller and A. Pfeffer, "Learning probabilistic relational models", Proc. of IJCAI, Stockholm, Sweden, 1999.
[30] L. De Raedt, T. G. Dietterich, L. Getoor and S. Muggleton, Probabilistic, Logical and Relational Learning - Towards a Synthesis, Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2006.
[31] L. De Raedt and H. Blockeel, "Using logical decision trees for clustering", Proc. of the 7th International Workshop on Inductive Logic Programming, pp. 133-141, 1997.
[32] C. M. Gonsalves, Comparison of Search-based and Kernel-based Methods for Graph-based Relational Learning, Master's Thesis, University of Texas at Arlington, August 2005.
[33] H. Kashima, K. Tsuda and A. Inokuchi, "Marginalized kernels between labeled graphs", Proceedings of ICML'03, pp. 321-328, 2003.
[34] D. J. Hand and K. Yu, "Idiot's Bayes - not so stupid after all?", International Statistical Review, vol. 69, pp. 385-398, 2001.
