Version 1.1
Technical comments:
jsr-73-comments@jcp.org
Maintenance Release Specification
June 22, 2005
Copyright
Copyright (c) 2005 Oracle Corporation. All rights reserved.
Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or documentation may be reproduced in any form by any means without prior written authorization of
the copyright holders, or any of the licensors, if any. Any unauthorized use may be a violation of domestic or international law. RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the U.S. Government and its agents is subject to the restrictions of
FAR 52.227-14(g)(2)(6/87) and FAR 52.227-19(6/87), or DFAR 252.227-7015(b)(6/95)
and DFAR 227.7202-3(a).
Disclaimer
This document and its contents are furnished as is for informational purposes only, and
are subject to change without notice. Oracle Corporation (Oracle) does not represent or
warrant that any product or business plans expressed or implied will be fulfilled in any
way. Any actions taken by the user of this document in response to the document or its
contents shall be solely at the risk of the user.
ORACLE MAKES NO WARRANTIES, EXPRESSED OR IMPLIED, WITH RESPECT
TO THIS DOCUMENT OR ITS CONTENTS, AND HEREBY EXPRESSLY DISCLAIMS ANY AND ALL IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR USE OR NON-INFRINGEMENT. IN NO EVENT SHALL
ORACLE BE HELD LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH OR ARISING FROM THE USER
OF ANY PORTION OF THE INFORMATION.
Trademarks
Sun, Sun Microsystems, Java, JavaBeans, and Enterprise JavaBeans are trademarks, registered trademarks, or servicemarks of Sun Microsystems, Inc. in the U.S. and other countries.
OMG, Object Management Group, CORBA, Unified Modeling Language, UML, are registered trademarks or trademarks of the Object Management Group, Inc.
All other product or company names mentioned are for identification purposes only, and
may be trademarks of their respective owners.
Contents
1. Overview
1.1 Introduction
1.1.1 Benefits
1.1.2 Target audience
1.1.3 Data analytics JSRs
1.1.4 Exclusions
1.2 Architectural components
1.3 Dependencies and relationships
1.4 Organization
1.5 Expert group members
1.6 Acknowledgements
2. Use cases
3. Concepts
3.3.9 Model
3.3.10 Model signature
3.3.11 Model detail
3.3.12 Logical attribute
3.3.13 Logical data
3.3.14 Attribute statistics set
3.3.15 Apply settings
3.3.16 Confusion matrix
3.3.17 Lift
3.3.18 Cost matrix
3.3.19 Prior probabilities
3.3.20 Category sets
3.3.21 Taxonomy
3.3.22 Rules
3.3.23 Verification report
3.4 Physical data representations
3.4.1 Individual record
3.4.2 Single record case table
3.4.3 Multi-record case table
3.4.4 Data preparation
3.5 Attribute mapping
3.5.1 Direct mapping
3.5.2 Pivot mapping
3.6 Creating physical data objects
3.7 Persistence
3.8 Object references
3.9 Reflection / introspection
4. Packages
7. Summary
Appendix A. Glossary
Appendix B. Requirements
B.1. Domain requirements
B.2. Foundation technologies
B.3. Data mining standards
B.4. System behavior
B.5. Exclusions for version 1
B.5.1. Domain exclusions
B.5.2. System exclusions
E.1. Introduction
E.2. Methods
E.2.1. WSDL Document Structure
E.2.2. Listing DME Contents
E.2.3. Introspection / Reflection
E.2.4. Saving objects
E.2.5. Retrieving objects
E.2.6. Removing objects
E.2.7. Renaming objects
E.2.8. Retrieving Object Components
E.2.9. Verify Object
E.2.10. Executing tasks
E.2.11. Getting execution status
E.2.12. Terminating Tasks
E.3. Java methods supporting XML
E.4. XML Schema Definition
E.4.1. JDM Document
E.4.2. Task
E.4.3. Task.Apply
E.4.4. Data
E.4.5. Supervised
E.4.6. Supervised.Classification
E.4.7. Supervised.Regression
E.4.8. Clustering
E.4.9. Association
E.4.10. AttributeImportance
E.4.11. Statistics
E.4.12. Algorithm
E.4.13. Base
E.4.14. Root
E.4.15. Enumeration extension
1. Overview
1.1 Introduction
The Java Data Mining (JDM) specification addresses the need for a pure Java API to facilitate development of data mining-enabled applications. JDM supports common data mining operations, as well as the creation, persistence, access, and maintenance of metadata
supporting mining activities.
Currently, no existing Java platform specification provides a standard API for data mining
systems. Existing APIs are vendor-proprietary. By using JDM, implementers of data mining applications can expose a single, standard API that will be understood by a wide variety of developers writing client applications and components running on the Java 2
Platform. Similarly, data mining clients can be coded against a single API that is independent of the underlying data mining system. JDM is targeted for the Java 2 Platform,
Enterprise Edition (J2EE) and Standard Edition (J2SE).
In JDM, data mining [Mitchell1997, BL1997] includes the functional areas of classification, regression, attribute importance, clustering, and association. These are supported by
such supervised and unsupervised learning algorithms as decision trees, neural networks,
Naive Bayes, Support Vector Machine, K-Means, and Apriori, on structured data. Common operations include model build, test, and apply (score). A particular implementation
of this specification may not necessarily support all interfaces and services defined by
JDM. However, JDM provides a mechanism for client discovery of supported interfaces
and capabilities.
JDM is based on a generalized, object-oriented, data mining conceptual model leveraging
emerging data mining standards such as the Object Management Group's Common Warehouse Metamodel (CWM), ISO's SQL/MM for Data Mining, and the Data Mining Group's
Predictive Model Markup Language (PMML), as appropriate.
Implementation details of JDM are delegated to each vendor. A vendor may decide to
implement JDM as a native API of its data mining product. Others may opt to develop a
driver/adapter that mediates between a core JDM layer and multiple vendor products. The
JDM specification does not prescribe a particular implementation strategy, nor does it prescribe performance or accuracy of a given capability or algorithm.
To ensure J2EE compatibility and eliminate duplication of effort, JDM leverages existing specifications. In particular, JDM leverages the J2EE Connector Architecture [JSR16]
to provide communication and resource management between applications and the services that implement the JDM API. JDM also reflects aspects of the Java Metadata Interface
[JSR40] for the interface specification.
1.1.1 Benefits
The availability of a J2EE-compliant data mining API provides benefit to both vendors
and users of tools and applications in the areas of business intelligence, business analytics,
data mining systems, data warehousing, and life sciences / bioinformatics.
Historically, application developers coded homegrown data mining algorithms into applications, or used sophisticated end-user GUIs. These GUIs packaged a suite of algorithms
complete with support for data transformation, model building, testing, and scoring. However, it was difficult, if not impossible, to embed data mining end-to-end in applications
using commercial data mining products due to inadequate APIs. If a vendor had an API, it
was proprietary, making the development of a product using that API risky. If a different
vendor's solution was required, rewriting that product was also potentially costly.
The ability to leverage data mining functionality via a standard API greatly reduces risk
and potential cost. With a standard API, customers can use multiple products for solving
business problems by applying the most appropriate algorithm implementation without
investing resources to learn each vendor's proprietary API. Moreover, a standard API
makes data mining more accessible to developers while making developer skills more
transferable. Vendors can now differentiate themselves on price, performance, accuracy,
and features. Java Data Mining (JDM) addresses this need for Java.
1.1.2 Target audience
data mining vendors - companies that intend to implement this API for their respective products, thereby providing the API to end users
application developers - Java programmers who wish to use a data mining API for
building GUIs or other applications that benefit from data mining technology
data mining experts - individuals with advanced degrees in statistics, machine learning, or data mining; or with significant practical data mining experience
data mining novices - Java-knowledgeable developers who have a basic understanding of the problems that data mining can solve, and who can leverage, at a minimum, the function level of data mining tasks
1.1.4 Exclusions
The domain of data mining is quite large. The JDM expert group made decisions early
to exclude certain features from JDM to make it more manageable. As such, functionality
such as data transformations, visualization, mining unstructured data (e.g., text), wrappers
and ensembles, and sensitivity analysis has been omitted from this first version of the
API. Note that with respect to visualization, JDM does provide many of the key data
objects necessary to support visualization, e.g., confusion matrix, lift results, decision tree
representation, and neural network architecture.
From a systems perspective, JDM does not specify behavior for transactions, scheduling,
or security. These are left to vendors to determine what best suits their respective products
and customer base.
1.2 Architectural components
application programming interface (API) - The API is the end-user-visible component of a JDM implementation that allows access to services provided by the data mining engine (DME). An application developer using JDM requires knowledge only of
the API library, not of other supporting components.
data mining engine (DME) - A DME provides the infrastructure that offers a set of
data mining services to its API clients. When implemented as a server of a client-server architecture, it is referred to as a data mining server (DMS), which is a specific
instantiation of the more general Enterprise Information System (EIS) as specified in
the Connector Architecture (JSR-16).
mining object repository (MOR) - The DME uses a mining object repository which
serves to persist data mining objects. This repository can be based on, e.g., the CWM
framework, specifically leveraging the CWM Data Mining metamodel, or implemented using a vendor-proprietary representation. The MOR may exist in a file-based
environment, or in a relational / object database. Section 3.7 discusses JDM persistence options.
Figure 1.1 depicts three possible architectures for a JDM implementation. In (a), each
component resides in a separate physical location or separate executable. We view this as
a three-tier architecture with the data stored in a separate repository, such as a database. In
(b), the DME contains the MOR and results in a classic client-server architecture. This
scenario is possible, e.g., where the database contains both the DME and MOR, or the
DME uses the local file system for persistent storage. In (c), the system is monolithic,
i.e., the API, DME, and MOR reside in, or are managed by, a single executable.
FIGURE 1.1 Three possible JDM architectures: (a) three-tier, (b) client-server, (c) monolithic. Each panel shows the API, DME, and MOR components.
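The relationship among the three components can be sketched in plain Java. This is only an illustration of the monolithic arrangement (c): the types MiningObjectRepository, InMemoryRepository, DataMiningEngine, and MiningApi are hypothetical names invented for this sketch and are not part of the JDM API, and the "model" is a placeholder string rather than a real mining result.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mining object repository (MOR): persists named mining objects. */
interface MiningObjectRepository {
    void save(String name, Object miningObject);
    Object retrieve(String name);
}

/** In-memory stand-in; a real MOR might be file-based or database-backed. */
final class InMemoryRepository implements MiningObjectRepository {
    private final Map<String, Object> objects = new HashMap<String, Object>();

    public void save(String name, Object miningObject) {
        objects.put(name, miningObject);
    }

    public Object retrieve(String name) {
        return objects.get(name);
    }
}

/** Hypothetical data mining engine (DME): offers services and uses a MOR for persistence. */
final class DataMiningEngine {
    private final MiningObjectRepository repository;

    DataMiningEngine(MiningObjectRepository repository) {
        this.repository = repository;
    }

    /** Stand-in for a model build: stores a placeholder "model" under the given name. */
    void buildModel(String modelName, String dataDescription) {
        repository.save(modelName, "model built from " + dataDescription);
    }

    Object getModel(String modelName) {
        return repository.retrieve(modelName);
    }
}

/** Hypothetical API facade: the only component an application developer touches. */
final class MiningApi {
    private final DataMiningEngine engine;

    MiningApi(DataMiningEngine engine) {
        this.engine = engine;
    }

    void build(String modelName, String data) {
        engine.buildModel(modelName, data);
    }

    Object model(String modelName) {
        return engine.getModel(modelName);
    }
}

final class MonolithicSketch {
    public static void main(String[] args) {
        // Architecture (c): API, DME, and MOR all live in one executable.
        MiningApi api = new MiningApi(new DataMiningEngine(new InMemoryRepository()));
        api.build("churnModel", "customer table");
        System.out.println(api.model("churnModel"));
    }
}
```

Because the engine depends only on the repository interface, the same sketch could be rearranged toward (a) or (b) by substituting a remote or database-backed repository, which is the kind of freedom the specification leaves to vendors.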
1.3 Dependencies and relationships
DMG PMML 2.0, [PMML], provides an XML-based representation for mining models and facilitates interchange among vendors for model results.
1.4 Organization
This document focuses on JDM requirements, concepts, use cases, code examples, packages supporting the API, and vendor conformance.
In Section 2, we present use cases to help the reader appreciate how this API can be used
under various circumstances, both by end users and vendors conforming to the standard.
In Section 3, we present the synthesis of data mining concepts that form the basis of the
JDM model. These concepts result from analyzing the requirements of many different data
mining functions and algorithms. These concepts are key to providing a unified data mining framework.
In Section 4, we present the JDM packages and class diagrams to illustrate the relationship between the various interfaces and classes. Details of each class are provided in the
companion Javadoc-generated documentation.
In Section 5, we provide and explain code examples using the JDM API. These examples
represent working with the API as a non-data mining expert, relying on convenience routines to automate much of the specification, as well as exposing detailed specification for
data mining experts.
In Section 6, we present the requirements for vendor conformance to the API.
In Section 7, we summarize our JDM experience and where the standard is likely to go in
subsequent versions.
In Appendix A, we provide a glossary of terms used in this document.
In Appendix B, we review the data mining domain requirements and foundation technologies driving the API. We explore related data mining standards and common system
behavior.
In Appendix C, we list optional methods for models and model detail that a vendor may choose
to implement.
In Appendix D, we provide JDM error codes for JDMException.
In Appendix E, we define Web services based on the JDM model. Significant interest in defining a
JDM Web services interface has been expressed within the expert group and in external comments.
In Appendix F, we provide a list of references.
1.5 Expert group members
Corporate Intellect
California Institute of Technology
SAS Institute
University of Ulster, N. Ireland
Sun Microsystems, Inc.
Oracle Corporation
SPSS, Inc.
Blue Martini Software
SAP AG
KXEN
IBM Germany
KXEN
Computer Associates International, Inc.
Magnify
BEA Systems
Sun Microsystems, Inc.
Hyperion Solutions
Fair Isaac
Hyperion Solutions
Strategic Analytics
Computer Associates International, Inc.
Oracle Corporation
SPSS, Inc.
Oracle Corporation
* former member
1.6 Acknowledgements
The expert group recognizes and thanks Dipankar Roy and Shiby Thomas for reviewing
previous drafts. We also recognize and thank Marcos Campos, Gary Drescher, Boriana
Milenova, Joe Yarmus, and Yan.Zhuang for their contributions to the JDM effort.
2. Use cases
The use cases presented in this section provide a context in which to understand the possible uses of JDM. We have divided use cases into two categories: those relevant to applications and those relevant to vendors implementing JDM-conforming products. Readers
already familiar with data mining may want only to browse this section.
Several JDM concepts are introduced briefly below to assist in understanding the use
cases. These are described in more detail in Section 3. The reader is expected to be familiar with common data mining terminology.
Mining Function - A major subdomain of data mining that shares common high-level
characteristics. Functions include classification, regression, attribute importance, association, and clustering.
Task - A container within which to specify arguments to data mining operations to be performed by the data mining engine. Tasks include model building, testing, applying (scoring), computing statistics, and object import and export. Tasks may execute synchronously
or asynchronously.
Settings - A collection of parameters specifying the input for building a data mining
model or applying a model to data (i.e., scoring). Build settings may be high level, specified for mining functions, or detailed, specified for mining algorithms. Apply settings
specify the content of the scoring result, and in some cases, affect the type of content provided. For example, a cost matrix may be specified for classification at apply time.
Model - An algorithm often produces a compressed representation of input data called a
model. This model contains the essential knowledge extracted from the data as determined
by the algorithm. A model can be descriptive or predictive. A descriptive model helps in
understanding the underlying data or model behavior. For example, an association rules
model on market basket data can be used to describe consumer behavior. A predictive
model can be an equation or set of rules that makes it possible to predict an unseen or
unknown value (the dependent variable or target) from other, known values (independent
variables or predictors).
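The Task and Settings concepts above, including the choice between synchronous and asynchronous execution, can be illustrated with a small self-contained sketch. The classes SketchBuildSettings, SketchBuildTask, and TaskExecutionSketch are hypothetical and are not JDM interfaces; the "model" returned is just a descriptive string, and the standard java.util.concurrent executor stands in for whatever mechanism a DME would actually use.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical settings: a high-level mining function plus an optional algorithm. */
final class SketchBuildSettings {
    final String miningFunction; // e.g. "classification"
    final String algorithm;      // null means "let the engine choose"

    SketchBuildSettings(String miningFunction, String algorithm) {
        this.miningFunction = miningFunction;
        this.algorithm = algorithm;
    }
}

/** Hypothetical task: a container for the arguments of one mining operation. */
final class SketchBuildTask implements Callable<String> {
    private final String dataName;
    private final SketchBuildSettings settings;

    SketchBuildTask(String dataName, SketchBuildSettings settings) {
        this.dataName = dataName;
        this.settings = settings;
    }

    /** Stand-in for the engine's work; returns a description of the "model". */
    public String call() {
        String algo = settings.algorithm == null ? "default" : settings.algorithm;
        return settings.miningFunction + " model on " + dataName
                + " using " + algo + " algorithm";
    }
}

final class TaskExecutionSketch {
    /** Synchronous execution: the caller blocks until the task completes. */
    static String executeSync(SketchBuildTask task) {
        return task.call();
    }

    /** Asynchronous execution: the caller receives a handle and can wait on it later. */
    static Future<String> executeAsync(SketchBuildTask task, ExecutorService executor) {
        return executor.submit(task);
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        SketchBuildTask task = new SketchBuildTask("customers",
                new SketchBuildSettings("clustering", null));
        System.out.println(executeSync(task));

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> handle = executeAsync(task, executor);
            System.out.println(handle.get()); // wait for the asynchronous result
        } finally {
            executor.shutdown();
        }
    }
}
```

Leaving the algorithm unset mirrors the function-level settings described above, where the engine selects an algorithm on the user's behalf.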
Although an avid Java programmer, she is unfamiliar with the details of data mining. Having read about JDM and having access to a commercial implementation through her
school, she leverages all the automated aspects of JDM, specifying only the data and
accepting all default settings for the Clustering build settings. In this way, no algorithm
selection is necessary, nor any algorithm-specific settings.
She uses the clustering model API to inspect the identified clusters.
In this use case, JDM allows novice users to extract benefit from data mining technology
by eliding algorithm details. Vendor implementations may vary in the degree of automation and the quality of models that automation produces.
user specifies an existing JDM model as input to a build task, along with other required
inputs. On execution of the task, the DME uses this model as a seed from which to continue building the model. This optional specification can be used for any type of algorithm
that can leverage a seed model.
to import the model. Validation of the manually modified model occurs at import. JDM's
support for single record scoring enables the analyst to produce an application that joins
information stored in a database about individuals with that dynamically acquired by airport personnel, perhaps at the ticket counter.
The Data Web service allows customers to connect to a managed warehouse and store
their transaction, customer and sales data using a secure Web service interface. List Inc.
manages the customer data in its data warehouse, cleans and grooms the data, and provides a range of preprocessing and transformation facilities. They maintain a comprehensive repository of high quality background data including income, census, and
demographic and geographic data. List Inc. has relationships with many data vendors and
can call upon their services when required. This background data is merged with the customer data using their proprietary merge technology.
List Inc. offers a complete model training and testing facility that guarantees optimal
results. The customer data is used to build predictive models to determine the best
responders, to build cross-sell and up-sell models, and to investigate return on investment (ROI). List
Inc. has a comprehensive testing facility that can choose the best algorithm and product
combination that delivers the optimal ROI. The customer does not have to worry about
data mining tool integration, training and testing.
The customer decides only on the schedule for updating models and the ROI they require.
List Inc. owns two supercomputers to provide the fastest modeling facilities available
today.
JDM is critical to List Inc.'s services. The Prediction Web service wraps JDM to allow the
customer to apply models. The Training Web service wraps JDM to allow the customer to
build models and set parameters. JDM is used internally to connect to different vendor
data mining tools and algorithms in their building and testing processes.
The Training Web service can be used by both novice customers and experienced data
analysts. Mining-savvy data analysts can tailor the training process and choose particular
algorithms and their settings. In addition, they can choose the attributes from their data
they wish to include in models.
The Prediction Web service provides access to the resultant models across the net. The
Prediction Web service interface is called with new prospect data and the score outcome
returned. The service allows customers to enhance their software systems and their own
web sites with predicted outcomes as if they owned the data mining tools themselves.
3. Concepts
In this section, we introduce JDM concepts: mining function, task, principal objects, physical data representations, attribute mapping, physical data storage, object references, and
reflection and introspection.
3.1.1 Classification
Classification has been used in customer segmentation, business modeling, and credit
analysis. As a type of supervised learning, an algorithm supporting classification builds a
model from a set of predictors that are used to predict a target. A set of predictors may
include demographic data such as age, income, number of children, and zip code, to predict the binary target buy/no-buy a minivan. The input or build data for a supervised learning algorithm requires the presence of attributes for both predictors and target in each
case. Given a pre-determined set of classes in the target attribute, classification analyzes
the build data to create a model that can predict to which class a given case belongs.
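As a rough illustration of build and apply for a binary target, the sketch below learns a single threshold on one numeric predictor (a "decision stump"). This is a deliberately simplified stand-in, not one of the algorithms JDM names and not JDM API code; the class StumpSketch and the income example are assumptions made for this illustration.

```java
/** Hypothetical one-predictor classifier ("decision stump") for a binary target. */
final class StumpSketch {
    final double threshold;      // split point learned from the build data
    final boolean aboveMeansBuy; // class predicted when the predictor exceeds the threshold

    private StumpSketch(double threshold, boolean aboveMeansBuy) {
        this.threshold = threshold;
        this.aboveMeansBuy = aboveMeansBuy;
    }

    /**
     * Build: each case supplies both the predictor and the target.
     * Every observed value is tried as a threshold, in both orientations,
     * and the first split with the fewest misclassified cases is kept.
     */
    static StumpSketch build(double[] predictor, boolean[] target) {
        double bestThreshold = predictor[0];
        boolean bestAbove = true;
        int bestErrors = Integer.MAX_VALUE;
        for (double candidate : predictor) {
            for (boolean above : new boolean[] {true, false}) {
                int errors = 0;
                for (int i = 0; i < predictor.length; i++) {
                    boolean predicted = (predictor[i] > candidate) == above;
                    if (predicted != target[i]) {
                        errors++;
                    }
                }
                if (errors < bestErrors) {
                    bestErrors = errors;
                    bestThreshold = candidate;
                    bestAbove = above;
                }
            }
        }
        return new StumpSketch(bestThreshold, bestAbove);
    }

    /** Apply (score): predict the class of a new case from its predictor value. */
    boolean apply(double value) {
        return (value > threshold) == aboveMeansBuy;
    }

    public static void main(String[] args) {
        double[] incomeInThousands = {20, 35, 50, 65, 80, 95};
        boolean[] boughtMinivan = {false, false, false, true, true, true};
        StumpSketch model = build(incomeInThousands, boughtMinivan);
        System.out.println("threshold=" + model.threshold + ", buys at 70? " + model.apply(70));
    }
}
```

A real classification algorithm would consider many predictors at once, but the separation between a build step that reads predictor and target values and an apply step that needs only predictors is the same.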
3.1.2 Regression
Regression has been used in financial forecasting, time series prediction, biomedical and
drug response modelling, and environmental modelling. Also a type of supervised learning, regression involves predicting a continuous, numerical valued target attribute given a
set of predictors. A regression problem may use the same predictors as a classification
problem, but specifies a target such as the predicted lifetime value of a customer.
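To make the contrast with classification concrete, the following sketch fits a one-predictor least-squares line and uses it to predict a continuous target. Ordinary least squares is chosen here only as a familiar example; the class LinearRegressionSketch and the tenure and lifetime-value figures are hypothetical and do not come from the specification.

```java
/** Hypothetical one-predictor least-squares regression: target = intercept + slope * x. */
final class LinearRegressionSketch {
    final double slope;
    final double intercept;

    private LinearRegressionSketch(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    /** Build: closed-form least-squares fit; assumes the x values are not all equal. */
    static LinearRegressionSketch build(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new LinearRegressionSketch(slope, meanY - slope * meanX);
    }

    /** Apply (score): predict the numeric target for a new predictor value. */
    double apply(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        double[] tenureYears = {1, 2, 3, 4};
        double[] lifetimeValue = {3, 5, 7, 9}; // illustrative units only
        LinearRegressionSketch model = build(tenureYears, lifetimeValue);
        System.out.println("predicted value at 10 years: " + model.apply(10));
    }
}
```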
3.1.4 Clustering
Clustering has been used in customer segmentation, gene and protein analysis, product
grouping, finding numerical taxonomies, and text mining. Clustering analysis identifies
clusters embedded in the data, where a cluster is a collection of data objects that are similar to one another. A good clustering method produces high quality clusters to ensure that
the inter-cluster similarity is low and the intra-cluster similarity is high. The similarity of
two values of an attribute can be expressed as distance functions. For numeric data, this
can be as simple as the Euclidean distance between points. For categorical data, similarity
can be expressed to make married and cohabiting closer to one another, as well as separated and divorced.
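The two kinds of distance described above can be sketched as follows. This is a minimal illustration, not part of the JDM API: the class name SimilaritySketch, the pair lookup table, and the distance values 0.2 and 1.0 are assumptions chosen only to show that similar categories can be placed closer together.

```java
import java.util.Map;

final class SimilaritySketch {
    // Euclidean distance between two numeric points of equal dimension.
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Hypothetical distances for category pairs declared similar (illustrative values).
    private static final Map<String, Double> SIMILAR_PAIRS = Map.of(
        "married|cohabiting", 0.2,
        "separated|divorced", 0.2);

    // Distance between two categories: 0 if equal, a small value for a
    // pair declared similar, and 1 for any other pair.
    static double categorical(String x, String y) {
        if (x.equals(y)) {
            return 0.0;
        }
        Double d = SIMILAR_PAIRS.get(x + "|" + y);
        if (d == null) {
            d = SIMILAR_PAIRS.get(y + "|" + x);
        }
        return d != null ? d : 1.0;
    }
}
```

With these assumed values, categorical("married", "cohabiting") is smaller than categorical("married", "divorced"), which is the ordering the text describes.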
3.1.5 Association
Association has been used in market basket analysis and the analysis of consumer behavior for the discovery of relationships or correlations among a set of items, e.g., the presence of one pattern implies the presence of another pattern. Association rules help to identify the
attribute value conditions that occur frequently together in a given set of data. Association
analysis is widely used in transaction data analysis for directed marketing, catalog design,
and other business decision-making processes. Traditionally, association is used for market
basket data analysis, yielding rules such as "90% of the people who buy milk also buy bread."
Support and confidence metrics are used as a quality measure of a rule within an association model. These are available in JDM as part of the Association model for each rule produced. Note that the rules returned from an association model are different from the
predicate-based rules produced from clustering models or decision tree models. Here, the
rules consist of a set of items. These items typically occur together in a single transaction,
such as the items purchased at an online retail checkout.
The support of a rule is used to ensure that the items associated in the rule occur
together frequently enough to be considered significant. Using probability notation,
support (A → B) = P(A, B)
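As a minimal sketch of the support formula, the following computes P(A, B) as the fraction of transactions that contain every item of A and B. The class and method names are hypothetical and are not part of JDM. The confidence method uses the usual definition P(A, B) / P(A), which is an assumption here since this text defines only support.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RuleMetricsSketch {
    // Fraction of transactions that contain every item in 'items'.
    static double probability(List<Set<String>> transactions, Set<String> items) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    // support(A -> B) = P(A, B)
    static double support(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return probability(transactions, both);
    }

    // confidence(A -> B) = P(A, B) / P(A); assumed standard definition.
    static double confidence(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        double pa = probability(transactions, a);
        return pa == 0.0 ? 0.0 : support(transactions, a, b) / pa;
    }
}
```

For five transactions of which three contain milk and two contain both milk and bread, the support of milk → bread is 2/5 and its confidence is 2/3.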
apply. These are not required attributes as a subset may be specified where NULL values
can be handled. Some algorithms perform automatic attribute selection, e.g., with a decision tree model, 100 attributes may have been used to train the model, but only 25 were
used in the final rule set and are necessary for scoring. These 25 constitute the model signature.
Test data must be preprocessed in the same way as the build data. The user is responsible
for ensuring this compatibility. However, some DMEs may choose to use information
present in the LogicalData stored with the model to flag incompatibilities.
Test metrics content depends on the type of model. For example, classification models
produce a confusion matrix, whereas regression models provide error estimates. In addition to obtaining a confusion matrix, model testing includes an option to compute lift and
receiver operating characteristics (ROC). A user may specify to compute lift or ROC in a
test task.
Lift is a measure of the effectiveness of a predictive model calculated as the ratio between
the results obtained with and without the predictive model. The cumulative gains and lift
charts are often used as visual aids for measuring model performance. A positive target
value v and the number of quantiles q are two common parameters for computing lift.
Suppose that there exist n records in the input data, p of which are known to have the positive target v, thus yielding a total gain p/n. These input records are applied to the predictive model to get the predicted target value and its likelihood. Then the records are
rearranged in the order of the likelihood of the positive prediction and divided into q equal
segments. The cumulative gain c_i of a quantile i is the ratio of the cumulative number of
positive targets to the total number of records n. The lift value l_i of quantile i is computed
as the ratio of the cumulative gain c_i to the total gain p/n.
ROC is a measure of comparison between individual models to determine thresholds
which yield a high proportion of positive hits. ROC curves aid users in selecting samples
by minimizing error rates. ROC was originally used in signal detection theory to gauge the
true hit versus false alarm ratio when sending signals over a noisy channel.
The horizontal axis of an ROC graph measures the false positive rate as a percentage. The
vertical axis shows the true positive rate. The top left hand corner is the optimal location in
an ROC curve, indicating high TP (true-positive) rate versus low FP (false-positive) rate.
The ROC Area Under the Curve is useful as a quantitative measure for the overall performance of models over the entire evaluation data set. The larger this number is for a specific model, the better. However, if the user wants to use a subset of the scored data, the
ROC curves help in determining which model will provide the best results at a specific
threshold.
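The ROC quantities discussed above can be illustrated with a short sketch. The names RocSketch, rocPoint, and auc are hypothetical and not part of JDM. The area under the curve is computed here by pair counting, i.e., as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half; this is one common way to obtain the area and is not necessarily how a given DME computes it.

```java
final class RocSketch {
    // One point on the ROC curve: every case with score >= threshold is
    // predicted positive. Returns {falsePositiveRate, truePositiveRate}.
    static double[] rocPoint(double[] scores, boolean[] actualPositive, double threshold) {
        int tp = 0, fp = 0, pos = 0, neg = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predictedPositive = scores[i] >= threshold;
            if (actualPositive[i]) {
                pos++;
                if (predictedPositive) tp++;
            } else {
                neg++;
                if (predictedPositive) fp++;
            }
        }
        double fpr = neg == 0 ? 0.0 : (double) fp / neg;
        double tpr = pos == 0 ? 0.0 : (double) tp / pos;
        return new double[] {fpr, tpr};
    }

    // Area under the ROC curve by pair counting (ties count one half).
    // Returns NaN when there is no positive or no negative case.
    static double auc(double[] scores, boolean[] actualPositive) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!actualPositive[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (actualPositive[j]) continue;
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }
}
```

A model whose curve passes closer to the top left corner yields a larger area, in line with the statement that a larger number is better.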
In addition to computing test metrics from a task specification, JDM enables the computation of test metrics using a scored dataset. Here, by specifying a dataset that has the
required attributes, e.g., actual value, predicted value, the confusion matrix can be computed. Similar capabilities exist for computing lift and ROC. This separation of apply
results from the test computation provides greater application flexibility as well as enables
computing test metrics using data produced outside of JDM or a data mining system.
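To illustrate computing a confusion matrix from a scored dataset, the sketch below counts (actual, predicted) pairs held in two parallel arrays. The class name, the nested-map representation, and the array input are assumptions made for illustration; JDM itself takes a dataset reference rather than in-memory arrays.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConfusionMatrixSketch {
    // Builds counts such that counts.get(actual).get(predicted) is the
    // number of cases with that combination of values.
    static Map<String, Map<String, Integer>> build(String[] actual, String[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have equal length");
        }
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (int i = 0; i < actual.length; i++) {
            counts.computeIfAbsent(actual[i], k -> new LinkedHashMap<>())
                  .merge(predicted[i], 1, Integer::sum);
        }
        return counts;
    }

    // Count for one cell; combinations that never occurred count as 0.
    static int count(Map<String, Map<String, Integer>> counts, String actual, String predicted) {
        return counts.getOrDefault(actual, Map.of()).getOrDefault(predicted, 0);
    }
}
```

Because the computation needs only the actual and predicted values, the scored data may equally have been produced outside of JDM.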
encoded in the model that this pattern (combination of predictors with the target or grouping) was seen frequently in the build data.
Using an apply settings specification, a user may tailor the content of the result. For example, a user may want the customer identifier, along with the score, probability, and rule
number, e.g., if the model is a decision tree, to be output for each case in the apply data. In
the case of classification, users can also specify a cost matrix in the apply settings.
As with test above, the data input to the trained model must match the model's signature.
However, not all attributes present in the signature must be present in the physical data for
apply (or test). Missing values handling depends on the algorithm or its implementation.
For supervised models, the target attribute, of course, is not needed. The result of the apply
operation is placed according to the specification of the apply settings and destination
physical data location in the task.
Some PMML models map readily to JDM as PMML served as input to aspects of the JDM
design. JDM also influenced certain aspects of PMML in the PMML 2.0 release. Note that
JDM is not a PMML viewer; not all contents of a PMML model are readily exposed
through Java objects.
Vendors may supplement a PMML model using extensions to include JDM metadata as
part of the model; however, this is outside the specification of PMML.
Even for PMML, not all information that comprises a JDM model is immediately exportable in the PMML standard, e.g., PMML does not specify build settings. Vendors may
choose to leverage the PMML extension provision where arbitrary text can be provided. In
this case, the CWM-DM or JDM XML representation of settings may be used.
3.3.1 Connection
JDM connection objects are abstractions for vendor specific access to the DME, e.g.,
javax.resource.cci.Connection or a JCX Connection. JDM users access a DME by creating
a connection object via a connection factory, per the Java Connection Architecture (JCX).
The factory accepts a user name and password to gain access to the DME. Connections are
expected to be single-threaded, i.e., a single application thread is expected to use a given
connection instance, thereby avoiding concurrency control issues.
The connection also provides access to objects persistent in the MOR. Methods defined on
connection objects enable the creation, deletion, and retrieval of mining objects present in
a user namespace.
When a user establishes a connection to a DME, the connection provides access to objects
in the user's namespace. Although not required, users can reference objects in other users'
namespaces using the convention <username>.<objectname> when supplying an object
name for applicable method arguments.
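The naming convention can be sketched as a small resolver that splits an object name into a namespace and a local name. This is only an illustration: the class and method names are hypothetical, and treating the first dot as the separator is an assumption, since the specification does not state how names that themselves contain dots are handled.

```java
final class ObjectNameSketch {
    // Returns {namespace, objectName}. A name without a dot is resolved
    // against the connected user's own namespace. Splitting on the first
    // dot is an assumption made for this sketch.
    static String[] resolve(String name, String connectedUser) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            return new String[] {connectedUser, name};
        }
        return new String[] {name.substring(0, dot), name.substring(dot + 1)};
    }
}
```

For a user connected as alice, the name churnModel resolves to alice's namespace, whereas bob.churnModel refers to the object churnModel in bob's namespace.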
The connection does not provide direct access to the data to be mined. For this, other standard interfaces for talking to databases (e.g., JDBC) and file systems already exist. Data to
be used for data mining is specified via a URI, a reference to the actual data with a vendor-specific format.
3.3.2 Task
A task object serves as a container within which to specify arguments for data mining
operations to be performed by the DME. By providing an object to specify tasks, we separate the specification of the task from its execution. To support deferred or batch processing, one or more task objects can be saved and scheduled for execution by the application.
JDM defines tasks for each of the mining operations, i.e., build, test, and apply. It also
defines tasks for computing statistics and importing and exporting mining objects.
grated by the vendor, such as the Java Messaging Service (JMS), JMX remote MBeans,
and JCA 1.5 inbound communication.
By invoking getStatus on an execution handle, applications can determine the current status, e.g., executing or terminated. In addition, implementations may provide incremental
status on task execution using this mechanism to inform users, e.g., whether a decision
tree model is training or pruning, or what percentage of the model build or apply is complete. Applications can leverage this information to provide real-time feedback to end
users, especially if a visual interface is present.
Each time a task is executed asynchronously, an execution handle is created and associated with the task. A vendor implementation may choose to keep all or a subset of the past
execution handles with the task. The most recently executed handle must be provided.
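The relationship between asynchronous execution and an execution handle can be sketched with standard Java concurrency utilities. The following is not the JDM ExecutionHandle interface: the Handle class is a hypothetical stand-in, it tracks only a subset of the execution states, and it returns the state directly rather than a status object.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ExecutionHandleSketch {
    // Subset of the execution states named in the specification.
    enum ExecutionState { submitted, executing, success, error }

    // Hypothetical handle created each time a task is run asynchronously.
    static final class Handle {
        private volatile ExecutionState state = ExecutionState.submitted;
        private final Future<?> future;

        Handle(ExecutorService pool, Runnable task) {
            future = pool.submit(() -> {
                state = ExecutionState.executing;
                try {
                    task.run();
                    state = ExecutionState.success;
                } catch (RuntimeException e) {
                    state = ExecutionState.error;
                }
            });
        }

        // Non-blocking status check.
        ExecutionState getLatestState() {
            return state;
        }

        // Blocks up to the timeout, then reports the state reached so far.
        ExecutionState waitForCompletion(int timeoutInSeconds) {
            try {
                future.get(timeoutInSeconds, TimeUnit.SECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // Timed out or failed; 'state' already reflects the outcome so far.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return state;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Handle handle = new Handle(pool, () -> { /* stand-in for a model build */ });
        System.out.println(handle.waitForCompletion(5));
        pool.shutdown();
    }
}
```

An application with a visual interface could call getLatestState periodically to refresh a progress display while the task runs.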
3.3.7 Algorithm
A data mining algorithm is a technique or procedure that when applied to data produces a
model. The set of data mining algorithms is extensive and growing. As such, JDM does
not include a large set of algorithms, but specifies a framework for including new algorithms and their model representations. This enables vendors to provide additional algorithms and functionality in advance of their inclusion in the standard.
An algorithm consists of optional specifications:
1. Algorithm settings that specify parameters or inputs that affect model building
2. Model detail that defines a model's content, e.g., the specific decision tree representation that describes the tree nodes (predicates, support, etc.)
For each of these specifications, this may involve the definition of a new interface, the
reuse of existing interfaces, or the specialization of existing interfaces.
3.3.9 Model
A model object is the result of applying an algorithm to data as specified in a build settings
object. The representation of a model is specific to the algorithm used and vendors may
choose whether to expose the model detail representation. Models can be (1) used for
direct inspection, e.g., to examine the rules produced from a decision tree or association,
(2) tested for accuracy, (3) applied to data for scoring, (4) exported to an external representation such as PMML and (5) imported for use in the DME.
A model references its build settings as well as the task that created it. A model has a signature, as described below.
In this first release, models are intended to be read-only objects and cannot be directly
modified via the Java API or explicitly stored in the mining object repository (MOR) by
the user.
squares for value and value range. Numerical statistics are applicable to continuous and
discrete numerical values. Numerical statistics include mean, median, variance, max and
min. Discrete statistics are applicable to discrete numeric and categorical data such as
strings. Discrete statistics include the mode and histogram data.
data to be passed through to the output from the input dataset, e.g., key attributes
values computed from the apply itself, e.g., score, probability and in the case of decision trees, rule identifiers
              Churner              Non-Churner
Churner       250                  6 (Type I Error)
Non-Churner   21 (Type II Error)   506
The accuracy of the model on the test data is (250 + 506) / (250 + 506 + 21 + 6) = 96.6%
The error is (21 + 6) / (250 + 506 + 21 + 6) = 3.4%
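The arithmetic above generalizes to any square confusion matrix: accuracy is the sum of the diagonal divided by the sum of all cells. A minimal sketch, in which the class name and the two-dimensional array layout are assumptions:

```java
final class AccuracySketch {
    // Accuracy = correctly classified cases (the diagonal) / all cases.
    static double accuracy(int[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                total += matrix[i][j];
                if (i == j) {
                    correct += matrix[i][j];
                }
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("confusion matrix is empty");
        }
        return (double) correct / total;
    }

    // Error rate is the complement of accuracy.
    static double error(int[][] matrix) {
        return 1.0 - accuracy(matrix);
    }
}
```

Passing the matrix {{250, 6}, {21, 506}} reproduces the accuracy of 756/783 and the error of 27/783 computed above.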
3.3.17 Lift
Lift is a measure of how much better prediction results are when using a model than could be obtained
by chance. For example, consider that 2% of the customers contacted from a mailing list
would purchase a product. To ensure all 2% would respond, a catalog mailing would have
to be sent to the entire mailing list. Using a data mining model to select catalog recipients,
we could select those customers most likely to make a purchase. For a given customer segment, perhaps 10% of the likely purchasers can be sent a catalog. The lift then is computed
as 10/2 or 5. Lift can also be computed on a per-decile basis.
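The mailing example can be written as a one-line ratio. This sketch reads the 10% figure as the response rate within the model-selected segment, which is an interpretation of the example; the class and method names are hypothetical.

```java
final class LiftSketch {
    // Lift = response rate among model-selected cases / overall response rate.
    static double lift(double rateWithModel, double baselineRate) {
        if (baselineRate <= 0.0) {
            throw new IllegalArgumentException("baseline rate must be positive");
        }
        return rateWithModel / baselineRate;
    }
}
```

Calling lift(0.10, 0.02) gives a value of 5 up to floating-point rounding, matching the 10/2 computation in the text.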
Lift may also be used as a measure to compare different data mining models. Since lift is
computed using a dataset with actual outcomes, lift compares how well a model performs
with respect to this dataset on predicted outcomes. Lift also indicates how well model predictions improve over random selection.
Actual \ Predicted   Churner   Non-Churner
Churner                        300
Non-Churner          100
The problem is to determine who will churn (change service provider) and who will not. If
a person turns out to be a churner, but is predicted to be a non-churner, the cost may be
$300 since the service provider did not act with promotions to entice that customer not to
churn and replacing that customer is expensive. However, if the customer is predicted to
be a churner, and in fact would not churn, there is a cost of $100 in unnecessary promotions given to that customer.
To represent when an algorithm predicts unknown, the cost matrix may handle this case by
defining an additional category for the target with CategoryProperty of unknown. This
allows a model to predict unknown if the model cannot make a prediction better than chance.
A cost matrix may be used when building a model, applying a model to data, and computing lift or return on investment. Note, however, that although a cost matrix may be specified for all classification problems, it may be ignored if the particular algorithm cannot
handle such input. It is up to each vendor to document the behavior in this case.
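One way a cost matrix can influence apply is by choosing the prediction with the lowest expected cost instead of the most probable class. The sketch below uses the $300 and $100 costs from the churn example and assumes that correct predictions cost nothing; the class and method names are hypothetical, and an actual algorithm may use the cost matrix differently or ignore it.

```java
final class CostMatrixSketch {
    // Costs from the churn example; correct predictions are assumed to cost 0.
    static final double COST_MISSED_CHURNER = 300.0; // actual churner, predicted non-churner
    static final double COST_FALSE_ALARM = 100.0;    // actual non-churner, predicted churner

    // Expected cost of predicting "Churner" when the churn probability is pChurn.
    static double costIfPredictChurner(double pChurn) {
        return (1.0 - pChurn) * COST_FALSE_ALARM;
    }

    // Expected cost of predicting "Non-Churner" when the churn probability is pChurn.
    static double costIfPredictNonChurner(double pChurn) {
        return pChurn * COST_MISSED_CHURNER;
    }

    // Chooses the prediction with the lower expected cost.
    static String predict(double pChurn) {
        return costIfPredictChurner(pChurn) <= costIfPredictNonChurner(pChurn)
            ? "Churner"
            : "Non-Churner";
    }
}
```

Under these costs a customer with a churn probability of 0.3 is predicted to be a churner (expected cost 70 versus 90), even though churning is the less likely outcome.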
The sum of the priors must equal 1. If a value is present in the data that is not specified in
the priors and at least one prior has been specified, an exception is raised.
Note that stratification is a general procedure that can apply to attributes other than the target. It is a procedure that first groups data rows by some criterion and then samples differentially among the groups. In the more general case, a row weighting scheme would be
required for the model to be able to re-construct the relation of sample to population.
Note, however, that although priors may be specified for all classification problems, they
may be ignored if a particular algorithm cannot handle such input. It is up to each vendor
to document the behavior in this case.
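The two constraints on priors stated at the start of this discussion can be sketched as a validation routine. This is an illustration only: the class name, the use of IllegalArgumentException as the raised exception, and the rounding tolerance are assumptions, not behavior defined by JDM.

```java
import java.util.Map;
import java.util.Set;

final class PriorsSketch {
    // Assumed tolerance for floating-point rounding when summing priors.
    static final double TOLERANCE = 1e-9;

    // Checks that the priors sum to 1 and that every target value present
    // in the data has a prior, provided at least one prior was specified.
    static void validate(Map<String, Double> priors, Set<String> targetValuesInData) {
        if (priors.isEmpty()) {
            return; // no priors specified, nothing to check
        }
        double sum = 0.0;
        for (double p : priors.values()) {
            sum += p;
        }
        if (Math.abs(sum - 1.0) > TOLERANCE) {
            throw new IllegalArgumentException("priors must sum to 1, but sum to " + sum);
        }
        for (String value : targetValuesInData) {
            if (!priors.containsKey(value)) {
                throw new IllegalArgumentException("no prior specified for target value " + value);
            }
        }
    }
}
```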
3.3.21 Taxonomy
A taxonomy represents hierarchical relationships between categories. Generally, the topmost categories are the most general, and the leaves are the most specific, referring to specific
item categories. For example, in the category taxonomy of beverages, there may be two
sub-categories alcoholic and non-alcoholic. The category alcoholic may have further subcategories of beer, wine, liquor, and sparkling wine.
Taxonomies can exist as explicit relationships between categories, represented as Java
objects, or as metadata that references external tabular data. Large taxonomies are
expected to exist as external tabular data.
Taxonomies are an optional specification for Association settings.
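A taxonomy held as explicit relationships between categories can be sketched as a child-to-parent map. The class below is a hypothetical in-memory illustration, not the JDM Taxonomy interface; it assumes that each category has at most one parent and that the hierarchy contains no cycles.

```java
import java.util.HashMap;
import java.util.Map;

final class TaxonomySketch {
    // Maps each category to its parent; root categories have no entry.
    private final Map<String, String> parentOf = new HashMap<>();

    void addChild(String parent, String child) {
        parentOf.put(child, parent);
    }

    // True if 'ancestor' lies on the path from 'category' up to the root.
    boolean isAncestor(String ancestor, String category) {
        String current = parentOf.get(category);
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }

    // The beverage example from the text.
    static TaxonomySketch beverages() {
        TaxonomySketch t = new TaxonomySketch();
        t.addChild("beverages", "alcoholic");
        t.addChild("beverages", "non-alcoholic");
        t.addChild("alcoholic", "beer");
        t.addChild("alcoholic", "wine");
        t.addChild("alcoholic", "liquor");
        t.addChild("alcoholic", "sparkling wine");
        return t;
    }
}
```

With such a structure, an association algorithm could treat a purchase of wine as also being a purchase of an alcoholic beverage.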
3.3.22 Rules
Certain algorithms produce rules of the general form: antecedent implies consequent.
JDM defines two kinds of rules: itemset-based as used in association rules and predicate-based as used in decision trees and clustering.
Decision tree rules involve predicate-based antecedents with probability-based predictions
in the consequents. Clustering can express the component clusters as rules producing
assignments to a cluster.
Association rules are presented in the form of association objects that link an antecedent
itemset with a consequent itemset.
JDM provides methods for extracting rules from the corresponding models. Rule objects
may be translated into a vendor-defined format string, or a standard XML representation.
Rule objects provide methods giving access to the various rule features.
matically created by the system and reused upon each invocation. It is up to the application to copy the values from the output record as needed. Obviously, the output record is
not valid with results until the execute completes.
id    age   income   churner
100   45    100      False
200   23    25       True

id    Name      Value
100   age       45
100   income    100
100   churner   False
200   age       23
200   income    25
200   churner   True
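The relationship between the two tabular layouts above can be sketched as a conversion from one row per case to one row per attribute value. The class name and the nested-map input are assumptions for illustration; in JDM the layout is a property of the physical data rather than something the application converts by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TransactionalFormatSketch {
    // Converts one-row-per-case data (case id -> attribute name -> value)
    // into rows of {id, attribute name, value}.
    static List<String[]> toTransactional(Map<String, Map<String, String>> casesById) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> c : casesById.entrySet()) {
            for (Map.Entry<String, String> a : c.getValue().entrySet()) {
                rows.add(new String[] {c.getKey(), a.getKey(), a.getValue()});
            }
        }
        return rows;
    }

    // The two example cases, in insertion order.
    static Map<String, Map<String, String>> example() {
        Map<String, Map<String, String>> cases = new LinkedHashMap<>();
        Map<String, String> first = new LinkedHashMap<>();
        first.put("age", "45");
        first.put("income", "100");
        first.put("churner", "False");
        cases.put("100", first);
        Map<String, String> second = new LinkedHashMap<>();
        second.put("age", "23");
        second.put("income", "25");
        second.put("churner", "True");
        cases.put("200", second);
        return cases;
    }
}
```

Applied to the two example cases, the conversion yields six rows, one for each attribute value of each case.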
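[Figure 1.2: apply input data mapped to apply settings data and to the Model Signature; the apply output includes Score and Prob]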
Figure 1.2 illustrates an example that maps apply input data (AID) to apply settings data
(ASD), and the mapping of AID to the model signature (MS). A direct mapping between
AID and MS can be specified explicitly. Here, since the physical attribute named X is different from the MS attribute named A, mapping is required. To output key values or other
data values from AID to ASD, a direct mapping from X to the ASD attribute named A is
specified. If AID attribute Z were a key, a direct mapping to ASD attribute Y would
rename the attribute and output key values for each case's score. From the model itself, the
user specifies to output the top score and corresponding probability (assuming a classification model).
3.7 Persistence
JDM defines several named objects that can be stored in the MOR of the DME. These
objects can be categorized into input objects (build settings, logical data, physical data set,
cost matrix, taxonomy, apply settings), task objects and output objects (model, test metrics). Named objects are defined to enable applications to reuse the objects and avoid having to maintain such application metadata independently. However, a vendor can choose
which objects are persistent and which are transient based on the needs of the vendor's
users. An object is persistent if it can be accessed across sessions. A session is defined as
the duration of an open connection to the DME. An object is transient if it is removed, or no
longer accessible, once the session terminates. Transient objects can be accessed by name
during the session.
Some vendors may choose to persist all named objects across sessions. In this case,
named objects are persisted independent of connection availability. Named objects will
be available for reuse until the application explicitly removes them using the connection method. This is applicable when a vendor needs to support a full-scale MOR, perhaps supporting an end-user data mining tool.
Some vendors may persist no metadata, e.g., named objects are persisted only for the
lifetime of the connection. This is applicable when a vendor supports only synchronous execution of the tasks and has no need to persist objects once the mining operation completes.
We expect most vendors will support persistence of some metadata to enable asynchronous execution. Data mining operations are often long running, so support for asynchronous execution can be critical for some applications. To support asynchronous
task execution, both tasks and output objects need to be persisted across sessions. This
is applicable when a vendor wishes to minimize metadata storage and maintenance
requirements.
Through the use of Connection.supportsCapability (NamedObject, PersistenceOption)
and Connection.getNamedObjects (PersistenceOption), users of a vendor implementation
can determine which objects are persistent or transient.
Object
Referenced Objects
Comment
Task
named
BuildSettings
Object
Referenced Objects
Comment
Model
owned
LogicalData
PhysicalDataSet
N/A
Taxonomy
N/A
CostMatrix
N/A
TestMetrics
owned
ApplySettings
Contains PhysicalAttributes.
package support
capability support
default values for object variables
The Connection interface provides the method supportsPackage which allows programs
to know which packages are supported by the implementation. The standard Java object
for Class allows determining which methods are supported by the class. Within appropriate classes, the method supportsCapability is available to allow a program to determine if
the implementation will use a provided value for an object variable. For example, ClassificationSettings can be queried to learn if the cost matrix or priors specifications will have
any effect on the model build.
Programs can determine the default values provided for objects by using the default constructor for an object and using the get methods to retrieve the default values. For example, invoking getMaxSurrogates from a TreeSettings instance will indicate the default for
this value. However, a program should invoke supportsCapability for MaxSurrogates to
know if the implementation supports surrogates. If the implementation does not support a
capability, the value returned by the get method or the value supplied by the set method is
undefined.
Methods that are not implemented, but whose signatures must still be provided, throw
java.lang.UnsupportedOperationException.
4. Packages
In this section, we first introduce the notation used for depicting the JDM interfaces in
subsequent sections. Then, we introduce the packages that support the JDM specification.
This section is provided to show relationships graphically between the various components and objects. The methods on each interface are also depicted, without further comment. For details of the interfaces depicted below, refer to the accompanying Java
documentation produced using Javadoc.
4.2 Notation
Each diagram in this section represents JDM objects with the following conventions, in
some cases, to facilitate code generation.
Package - a collection of interfaces, classes, and enumerations that maps to a Java package. A package is depicted as a tabbed folder. 4.1 depicts three packages. PackageA is an
individual package. PackageC is a subpackage of PackageB.
Interface - a named specification for a set of methods that provide a service. Classes
implement interfaces. An interface is depicted as a rectangle, named, possibly with methods specified. Interfaces are distinguished from classes by the italicized name.
Class - a named specification for a set of methods. Classes are used in JDM where constructors and / or static methods are desired. A class is depicted as a rectangle, named, possibly with methods specified. A class may implement multiple interfaces. Note that JDM
defined only one class, JDMException. All other objects are described as interfaces.
Inheritance - a relationship between interfaces, classes, or between one or more interfaces and a class. Inheritance is depicted as an open triangle at the more general element,
and a line to the more specific element.
[Figure 4.1: notation example showing PackageA, PackageB with its subpackage PackageC, an Interface with methodZ(), ClassA with methodX() and methodY(), ClassB, and ClassC, linked by ABassociation and ACassociation, and an Enumeration with value1 and value2]
javax.datamining.supervised: Defines objects supporting the build settings and models for supervised learning functions, specifically: classification and regression, with
corresponding optional packages. It also includes a common test task for the classification and regression functions.
[Figure: JDM package metamodel diagram showing Base, Algorithm (KMeans, NaiveBayes, FeedForwardNeuralNet, Tree, SVM), Data, Statistics, Task, Rule, Apply, Association, AttributeImportance, Clustering, and Supervised (Regression, Classification)]
Collection
contains(object : Object) : boolean
containsAll(collection : Collection) : boolean
equals(o : Object) : boolean
hashCode() : int
isEmpty() : boolean
iterator() : Iterator
size() : int
toArray() : Object
toArray(objectArray : Object) : Object
Factory
VerificationReport
getReportText() : String
getReportType() : ReportType
<<enumeration>>
ReportType
error
warning
Exception
Java
Exception
JDMException
JDMException(errorCode : int, errorMessage : String)
JDMException(errorCode : int, errorMessage : String, vendorCode : int, vendorMessage : String)
getErrorCode() : int
getVendorErrorCode() : int
getVendorErrorMessage() : String
ConnectionFailureException
TaskException
InvalidURIException
ObjectExistsException
UnsupportedOperationException
JDMUnsupportedFeatureException
IncompatibleSpecificationException
InvalidObjectException
ObjectNotFoundException
DuplicateEntryException
EntryNotFoundException
IllegalArgumentException
JDMIllegalArgumentException
Enum
getEnum() : String
isEqual(src : Enum) : boolean
<<enumeration>>
ReportType
error
warning
<<enumeration>>
OutlierTreatment
systemDefault
systemDetermined
asIs
asMissing
<<enumeration>>
MiningFunction
association
attributeImportance
regression
clustering
classification
<<enumeration>>
MiningAlgorithm
feedForwardNeuralNet
kMeans
naiveBayes
decisionTree
svmRegression
svmClassification
<<enumeration>>
LogicalAttributeUsage
active
supplementary
inactive
<<enumeration>>
ExecutionState
submitted
executing
success
error
terminating
terminated
<<enumeration>>
SortOrder
systemDefault
asIs
ascending
descending
<<enumeration>>
SizeUnit
count
percentage
<<enumeration>>
MiningTask
buildTask
testTask
applyTask
computeStatisticsTask
exportTask
importTask
<<enumeration>>
NamedObject
task
buildSettings
model
logicalData
physicalDataSet
testMetrics
taxonomy
costMatrix
applySettings
<<enumeration>>
ImportExportFormat
systemDefault
PMML1_0
PMML2_0
PMML2_1
PMML3_0
CWM1_0
CWM1_1
JDM1_0
ExecutionHandle
terminate() : ExecutionStatus
getLatestStatus() : ExecutionStatus
getStatus(fromTimestamp : Date) : Collection
getStartTime() : Date
waitForCompletion(timeoutInSeconds : int) : Execut...
getDurationInSeconds() : Integer
getTaskName() : String
getWarnings() : ExecutionStatus
containsWarning() : boolean
ExecutionStatus
getState() : ExecutionState
getTimestamp() : Date
getDescription() : String
containsWarning() : boolean
<<enumeration>>
ExecutionState
submitted
executing
success
error
terminating
terminated
<<enumeration>>
NamedObject
task
buildSettings
model
logicalData
physicalDataSet
testMetrics
taxonomy
costMatrix
applySettings
MiningObject
getObjectType() : NamedObject
getDescription() : String
setDescription(description : String)
getName() : String
getCreatorInfo() : String
getCreationDate() : Date
getObjectIdentifier() : String
BuildSettings
Task
Model
Taxonomy
PhysicalDataSet
ApplySettings
(from Data)
(from Data)
(from Apply)
TestMetrics
LogicalData
(from Supervised)
(from Data)
CostMatrix
(from Classification)
MiningObject
getDescription()
setDescription()
getName()
getCreatorInfo()
getCreationDate()
getObjectIdentifier()
BuildSettings
getMiningFunction()
getDesiredExecutionTimeInMinutes()
setDesiredExecutionTimeInMinutes()
getAlgorithmSettings()
setAlgorithmSettings()
getLogicalData()
getLogicalDataName()
setLogicalDataName()
getLogicalAttributes()
getWeight()
setWeight()
setWeightAttribute()
getWeightAttribute()
getUsage()
setUsage()
setOutlierTreatment()
getOutlierTreatment()
setOutlierIdentification()
getOutlierIdentification()
verify()
Model
getUniqueIdentifier()
getVersion()
getMajorVersion()
getMinorVersion()
getProviderName()
getProviderVersion()
getApplicationName()
getMiningFunction()
getMiningAlgorithm()
getSignature()
getBuildSettings()
getEffectiveBuildSettings()
getModelDetail()
getAttributeStatistics()
getTaskIdentifier()
getBuildDuration()
TestMetrics
getTaskIdentifier() : Integer
getModelName() : String
getTestDataName() : String
Task
getExecutionHandle() : ExecutionHandle
MiningObject
LogicalData
(from Data)
+logicalData
<<enumeration>>
LogicalAttributeUsage
0..1
BuildSettings
getMiningFunction() : MiningFunction
getDesiredExecutionTimeInMinutes() : int
buildSettingsRefLogicalData
setDesiredExecutionTimeInMinutes(minutes : int)
getAlgorithmSettings() : AlgorithmSettings
+buildSettings setAlgorithmSettings(algorithmSettings : AlgorithmSettings)
0..n getLogicalData() : LogicalData
getLogicalDataName() : String
setLogicalDataName(name : String)
+buildSettings getLogicalAttributes(usage : LogicalAttributeUsage) : Collection
0..n getWeight(logicalAttrName : String) : double
setWeight(logicalAttrName : String, weight : double)
getWeightAttribute() : String
buildSettingsRefAlgorithmSettings
setWeightAttribute(logicalAttrName : String)
getUsage(logicalAttrName : String) : LogicalAttributeUsage
setUsage(logicalAttrName : String, usage : LogicalAttributeUsage)
0..1 +algorithmSettings
getOutlierTreatment(logicalAttrName : String) : OutlierTreatment
setOutlierTreatment(logicalAttrName : String, treatment : OutlierTreatment)
AlgorithmSettings
getOutlierIdentification(logicalAttrName : String) : Interval
verify() : VerificationReport
setOutlierIdentification(logicalAttrName : String, bounds : Interval)
getMiningAlgorithm() : MiningAlgorithm
getAttributeNames(retrievalType : AttributeRetrievalType) : String
verify() : VerificationReport
AssociationSettings
SupervisedSettings
ClassificationSettings
AttributeImportanceSettings
active
supplementary
inactive
<<enumeration>>
OutlierTreatment
systemDefault
systemDetermined
asIs
asMissing
<<enumeration>>
AttributeRetrievalType
usage
weight
outlierTreatment
outlierIdentification
ClusteringSettings
RegressionSettings
MiningObject
modelHasSettings
(from JDM Root)
+settings
+model 1
BuildSettings
0..1
+effectiveSettings 0..1
Model
getUniqueIdentifier() : String
getVersion() : String
getMajorVersion() : String
getMinorVersion() : String
getProviderName() : String
getProviderVersion() : String
getApplicationName() : String
getMiningFunction() : MiningFunction
getMiningAlgorithm() : MiningAlgorithm
getSignature() : ModelSignature
getBuildSettings() : BuildSettings
getEffectiveBuildSettings() : BuildSettings
getModelDetail() : ModelDetail
getAttributeStatistics() : AttributeStatisticsSet
getTaskIdentifier() : String
getBuildDuration() : Integer
+model
1
modelHasEffectiveSettings
+model
+signature
modelHasSignature
1
+model
ModelSignature
(fromData)
0..1
+dataStatistics
AttributeStatisticsSet
miningModelHasStatist... 0..1
(fromStatistics)
+model
1
miningModelHasDetail
+detail 0..1
ModelDetail
SupervisedModel
RegressionModel
AttributeImportanceModel
ClusteringModel
ClassificationModel
ConnectionSpec
getName() : String
setName(userName : String)
getURI() : String
setURI(uri : String)
setPassword(password : String)
setLocale(locale : Locale)
getLocale() : Locale
<<enumeration>>
ConnectionCapability
containerManaged
connectionSpec
jcxConnection
scoringEngine
ConnectionMetaData
getVersion() : String
getMajorVersion() : int
getMinorVersion() : int
getProviderName() : String
getProviderVersion() : String
ConnectionFactory
getConnection() : Connection
getConnection(spec : ConnectionSpec) : Connection
getConnection(connection : Connection) : Connection
getConnectionSpec() : ConnectionSpec
supportsCapability(capability : ConnectionCapability) : boolean
+factory
<<enumeration>>
PersistenceOption
transientObject
persistentObject
factoryHasConnections
+connection
0..n
Connection
close()
getFactory(objectName : String) : Factory
getMetaData() : ConnectionMetaData
getConnectionSpec() : ConnectionSpec
setLocale(locale : Locale)
getLocale() : Locale
getSupportedFunctions() : MiningFunction
getSupportedAlgorithms(function : MiningFunction) : MiningAlgorithm
supportsCapability(function : MiningFunction, algorithm : MiningAlgorithm, taskType : MiningTask) : boolean
supportsCapability(object : NamedObject, persistence : PersistenceOption) : boolean
getNamedObjects(persistenceOption : PersistenceOption) : NamedObject
getMaxNameLength() : int
getMaxDescriptionLength() : int
getDescription(objectName : String, objectType : NamedObject) : String
setDescription(objectName : String, objectType : NamedObject, description : String)
saveObject(name : String, object : MiningObject, replace : boolean)
removeObject(name : String, objectType : NamedObject)
renameObject(oldName : String, newName : String, objectType : NamedObject)
doesObjectExist(objectName : String, objectType : NamedObject) : boolean
retrieveObject(name : String, objectType : NamedObject) : MiningObject
retrieveObject(objectIdentifier : String) : MiningObject
retrieveObjects(createdAfter : Date, createdBefore : Date, objectType : NamedObject) : Collection
retrieveObjects(createdAfter : Date, createdBefore : Date, objectType : NamedObject, minorType : Enum) : Collection
getObjectNames(objectType : NamedObject) : Collection
getObjectNames(createdAfter : Date, createdBefore : Date, objectType : NamedObject) : Collection
getObjectNames(createdAfter : Date, createdBefore : Date, objectType : NamedObject, minorType : Enum) : Collection
getModelNames(function : MiningFunction, algorithm : MiningAlgorithm, createdAfter : Date, createdBefore : Date) : Collection
getCreationDate(objectName : String, objectType : NamedObject) : Date
retrieveModelObjects(function : MiningFunction, algorithm : MiningAlgorithm, createdAfter : Date, createdBefore : Date) : Collection
getLastExecutionHandle(taskName : String) : ExecutionHandle
getExecutionHandles(taskName : String) : ExecutionHandle
execute(taskName : String) : ExecutionHandle
execute(task : Task, timeout : Long) : ExecutionStatus
requestModelLoad(modelName : String)
requestModelUnload(modelName : String)
getLoadedModels() : String
requestDataLoad(dataURI : String)
requestDataUnload(dataURI : String)
getLoadedData() : String
physical data
logical data
model signature
taxonomy
category matrix
category set
caseIdRequired
multiAttributeCaseId
Attribute
PhysicalAttributeFactory
getName() : String
getDescription() : String
PhysicalAttribute
setName(attributeName : String)
PhysicalDataSet
setDescription(description : String)
getAttributes() : Collection
getDataType() : AttributeDataType
getAttributeNames(dataType : AttributeDataType) : Collecti...
+physicalData
+attribute
setDataType(dataType : AttributeDataType)
getAttributeNames(role : PhysicalAttributeRole) : Collection
getRole() : PhysicalAttributeRole
getAttributeCount() : int
0..n
1
setRole(role : PhysicalAttributeRole)
getAttribute(attributeName : String) : PhysicalAttribute
physicalDataHasAttributes
getAttributeIndex(attributeName : String) : Integer
getAttribute(index : int) : PhysicalAttribute
addAttribute(attribute : PhysicalAttribute)
+physicalData
+statistics
addAttributes(attributeArray : PhysicalAttribute)
AttributeStatisticsSet
removeAttribute(name : String)
(fromStatistics)
0..1
1
removeAllAttributes()
physicalDataHasStatisticsSet
importMetaData()
getAttributeStatistics() : AttributeStatisticsSet
getURI() : String
PhysicalDataSetFactory
create(uri : String, importMetaData : boolean) : PhysicalDataSet
supportsCapability(capability : PhysicalDataSetCapability) : boolean
<<enumeration>>
PhysicalDataSetCapability
singleRecordCaseData
multiRecordCaseData
<<enumeration>>
AttributeDataType
integerType
doubleType
stringType
unknownType
<<enumeration>>
PhysicalAttributeRole
data
caseId
attributeName
attributeValue
taxonomyChildId
taxonomyParentId
PhysicalDataRecordFactory
create() : PhysicalDataRecord
create(signature : ModelSignature) : PhysicalDataRecord
PhysicalDataRecord
getValue(attributeName : String) : Object
setValue(attributeName : String, value : Object)
getAttributeNames() : Collection
getAttributeCount() : int
removeAttribute(attributeName : String)
resetValues()
removeAllAttributes()
LogicalData
getAttributes() : Collection
getAttribute(name : String) : LogicalAttribute
getAttributes(type : AttributeType) : Collection
addAttribute(attribute : LogicalAttribute)
removeAttribute(attributeName : String)
removeAllAttributes()
Attribute
getName() : String
getDescription() : Stri...
+logicalData
logicalDataHasAttributes
0..1
+attribute
1..n
LogicalAttribute
LogicalDataFactory
create() : LogicalData
create(physicalDataSet : PhysicalDataSet) : LogicalData
create(physicalDataSetName : String) : LogicalData
LogicalAttributeFactory
create(attrName : String, type : AttributeType) : LogicalAttribute
create(attrNameArray : String, type : AttributeType) : LogicalAttribute
supportsCapability(capability : LogicalAttributeCapability) : boolean
<<enumeration>>
DataPreparationStatus
<<enumeration>>
AttributeType
categorical
ordinal
numerical
notSpecified
unprepared
prepared
<<enumeration>>
LogicalAttributeCapability
discreteAttributes
boundedAttributes
ordinalAttributes
unpreparedAttributes
categorySetEnabled
ModelSignature
Attribute
getAttributes() : Collection
getAttribute(attributeName : String) : SignatureAttribute
getAttributesByRank(ordering : SortOrder) : Collection
+modelSignature
1
+attribute
SignatureAttribute
modelSignatureHasAttribute
1..n
getAttributeType() : AttributeType
getDataType() : AttributeDataType
getRank() : int
getImportanceValue() : double
<<enumeration>>
AttributeType
categorical
ordinal
numerical
notSpecified
TaxonomyFactory
createTable(taxonomyName : String, physicalDataName : String) : TaxonomyTable
createObject() : TaxonomyObject
supportsCapability(capability : TaxonomyCapability) : boolean
Taxonomy
getChildren(parent : Object) : Collection
getParents(child : Object) : Collection
getRoots() : Collection
getLeaves() : Collection
TaxonomyTable
getPhysicalDataName() : String
<<enumeration>>
TaxonomyCapability
tableTaxonomy
objectTaxonomy
TaxonomyObject
addChildren(parent : Object, childArray : Object)
removeDescendants(parent : Object)
removeRelationship(parent : Object, childArray : Object)
CategoryMatrix
getCategories() : Collection
getValue(rowCategoryValue : Object, columnCategoryValue : Object) : Double
getCategorySet() : CategorySet
CategorySetFactory
create(dataType : AttributeDataType) : CategorySet
create(categorySet : CategorySet) : CategorySet
CategorySet
addCategory(categoryValue : Object, property : CategoryProperty) : int
insertCategory(categoryValue : Object, property : CategoryProperty, beforeIn...
removeCategory(index : int)
getSize() : int
getDataType() : AttributeDataType
getIndex(categoryValue : Object) : Integer
getValue(index : int) : Object
getValues() : Object
getValues(property : CategoryProperty) : Object
getName(index : int) : String
getProperty(index : int) : CategoryProperty
getDefaultProperty() : CategoryProperty
setDefaultProperty(property : CategoryProperty)
Interval
getIntervalClosure() : IntervalClosure
getStartPoint() : double
getEndPoint() : double
<<enumeration>>
IntervalClosure
closedClosed
closedOpen
openClosed
openOpen
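As a rough illustration of how the four IntervalClosure literals might be interpreted, the sketch below tests whether a value lies inside an interval. It assumes that the first word of each literal describes the start point and the second word the end point (so closedOpen is read as [start, end)); the diagram itself does not state this. ClosureSketch, IntervalSketch, and contains are hypothetical names written for this example and are not part of the javax.datamining API.

```java
// Hypothetical stand-ins for the Interval and IntervalClosure elements shown above.
// Nothing here is taken from the javax.datamining packages.
enum ClosureSketch { closedClosed, closedOpen, openClosed, openOpen }

final class IntervalSketch {
    private final ClosureSketch closure;
    private final double startPoint;
    private final double endPoint;

    IntervalSketch(ClosureSketch closure, double startPoint, double endPoint) {
        this.closure = closure;
        this.startPoint = startPoint;
        this.endPoint = endPoint;
    }

    // Assumption: the first word of the literal refers to the start point,
    // the second word to the end point.
    boolean contains(double value) {
        boolean startClosed = closure == ClosureSketch.closedClosed
                || closure == ClosureSketch.closedOpen;
        boolean endClosed = closure == ClosureSketch.closedClosed
                || closure == ClosureSketch.openClosed;
        boolean aboveStart = startClosed ? value >= startPoint : value > startPoint;
        boolean belowEnd = endClosed ? value <= endPoint : value < endPoint;
        return aboveStart && belowEnd;
    }
}
```

Under this reading, adjacent closedOpen intervals can tile a numeric range without any boundary value being counted twice.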
<<enumeration>>
CategoryProperty
valid
error
unknown
missing
Task
getExecutionHandle() : ExecutionHandle
verify() : VerificationReport
BuildTask
getModelName() : String
setModelName(name : String)
getBuildDataName() : String
setBuildDataName(name : String)
getBuildSettingsName() : String
setBuildSettingsName(name : String)
getInputModelName() : String
setInputModelName(modelName : String)
getValidationDataName() : String
setValidationDataName(validationDataName : String)
getApplicationName() : String
setApplicationName(name : String)
getModelDescription() : String
setModelDescription(description : String)
getBuildDataMap() : Map
setBuildDataMap(buildDataMap : Map)
getValidationDataMap() : Map
setValidationDataMap(validationDataMap : Map)
BuildTaskFactory
create(buildData : String, buildSettingsName : String, modelName : String) : BuildTask
supportsCapability(function : MiningFunction, algorithm : MiningAlgorithm, capability : BuildTaskCapability) : boolean
<<enumeration>>
BuildTaskCapability
inputModel
validationData
dataMapping
ImportSummary
ExportTaskFactory
create() : ExportTask
supportsCapability(objectType : NamedObject, exportFormat : ImportExportForm
. ..
Task
getExecutionHandle() : ExecutionHandle
verify() : VerificationReport
getObjectCount() : int
getObjectNames() : String
getObjectTypes() : NamedObject
getObjectClassNames() : String
getObjectDescriptions() : String
getCreationDates() : Date
getFormat() : ImportExportFormat
1
+summary
importTaskHasSummary
0..1
addObjectName(name : String, namedObjectType : NamedO...
removeObjectName(name : String, namedObjectType : Nam...
getURI() : String
setURI(uri : String)
getFormat() : ImportExportFormat
setFormat(format : ImportExportFormat)
getObjectNames() : String
setIncludeModelSettings(option : SettingsInclusionOption)
getIncludeModelSettings() : SettingsInclusionOption
<<enumeration>>
SettingsInclusionOption
systemDefault
none
settings
effectiveSettings
settingsOnly
effectiveSettingsOnly
...
+importTask
ImportTask
ExportTask
<<enumeration>>
ImportExportFormat
getURI() : String
setURI(uri : String)
includeModelSettings() : boolean
includeModelSettings(includeModelSettings : boolean)
useOriginalCreationDates(useOriginalCreationDates : boolean)
useOriginalCreationDates() : boolean
populateSummary()
getSummary() : ImportSummary
getObjectNamesMap() : Map
setObjectNamesMap(map : Map)
ImportTaskFactory
create() : ImportTask
create(uri : String, populateSummary : boolean) : ImportTask
supportsCapability(objectType : NamedObject, exportFormat : ImportExportForm
...
systemDefault
PMML1_0
PMML2_0
PMML2_1
PMML3_0
CWM1_0
CWM1_1
JDM1_0
JDM1_1
Task
ComputeStatisticsTask
getPhysicalDataName() : String
setPhysicalDataName(name : String)
getLogicalDataName() : String
setLogicalDataName(logicalDataName : String)
ComputeStatisticsTaskFactory
create(physicalDataName : String) : ComputeStatisticsTask
supportsCapability(capability : ComputeStatisticsTaskCapability) : boolean
<<enumeration>>
ComputeStatisticsTaskCapability
logicalData
Task
ApplyTask
getModelName() : String
setModelName(modelName : String)
getApplySettingsName() : String
setApplySettingsName(applySettingsName : String)
getApplyDataMap() : Map
setApplyDataMap(applyDataMap : Map)
DataSetApplyTask
RecordApplyTask
getApplyOutputDestination() : String
setApplyOutputDestination(applyOutputDestinationURI : String)
getApplyDataName() : String
setApplyDataName(applyDataName : String)
getInputRecord() : PhysicalDataRecord
setInputRecord(record : PhysicalDataRecord)
getOutputRecord() : PhysicalDataRecord
RecordApplyTaskFactory
create(applyRecord : PhysicalDataRecord, modelName : String, applySettingsName : String) : RecordApplyTask
DataSetApplyTaskFactory
create(applyDataName : String, modelName : String, applySettingsName : String, applyOutputDestinationURI : String) : DataSetApplyTask
ApplySettings
getSourceDestinationMap() : Map
setSourceDestinationMap(sourceDestinationMap : Map)
resetMapping()
verify() : VerificationReport
AlgorithmSettings
BuildSettings
SupervisedAlgorithmSettings
SupervisedSettings
getTargetAttributeName() : String
setTargetAttributeName(attributeName : String)
Model
SupervisedModel
getTargetAttributeName() : String
MiningObject
(from JDMRoot)
Task
TestTask
getTestDataName() : String
setTestDataName(testDataName : String)
getModelName() : String
setModelName(modelName : String)
getTestMetricsName() : String
setTestMetricsName(testMetricsName : String)
getTestDataMap() : Map
setTestDataMap(testDataMap : Map)
verify() : VerificationReport
TestMetrics
getTaskIdentifier() : Integer
getModelName() : String
getTestDataName() : String
TestMetricsTask
getApplyOutputDataName() : String
setApplyOutputDataName(applyOutputData : String)
getActualTargetAttrName() : String
setActualTargetAttrName(actualTargetAttrName : String)
getPredictedTargetAttrName() : String
setPredictedTargetAttrName(predictedTargetAttrName : String)
getPredictionRankingAttrName() : String
setPredictionRankingAttrName(predictionRankingAttrName : String)
getTestMetricsName() : String
setTestMetricsName(testMetricsName : String)
verify() : VerificationReport
SupervisedSettings
ClassificationSettings
getCostMatrixName() : String
setCostMatrixName(costMatrixName : String)
getPriorProbabilitiesMap(attributeName : String) : Map
setPriorProbabilitiesMap(attributeName : String, priorsMap : Map)
usePriors(usePriors : boolean)
getUsePriors() : boolean
SupervisedModel
ClassificationModel
getClassificationError() : double
getTargetCategorySet() : CategorySet
wasCostMatrixUsed() : boolean
ClassificationSettingsFactory
create() : ClassificationSettings
supportsCapability(capability : ClassificationCapability) : boolean
supportsCapability(algorithm : MiningAlgorithm, capability : ClassificationCapability) : boolean
<<enumeration>>
ClassificationCapability
costMatrix
priorProbability
weightedAttributes
ordinalAttributes
automatedDataPreparation
supplementaryAttributes
weightAttribute
classificationError
outlierTreatment
logicalAttributeUsage
logicalData
ClassificationTestTaskFactory
TestTask
setNumberOfLiftQuantiles(numberOfQuantiles : int)
computeMetric(testMetric : ClassificationTestMetricOption, flag : boolean)
computeMetric(testMetric : ClassificationTestMetricOption) : boolean
getNumberOfLiftQuantiles() : int
getPositiveTargetValue() : Object
setPositiveTargetValue(positiveTargetValue : Object)
setCostMatrixName(costMatrixName : String)
getCostMatrixName() : String
getTestMetricsDescription() : String
setTestMetricsDescription(description : String)
dataMapping
<<enumeration>>
ClassificationTestMetricOption
confusionMatrix
lift
receiverOperatingCharacteristics
TestMetrics
Lift
(fromSupervised)
ClassificationTestMetrics
getAccuracy() : Double
getConfusionMatrix() : ConfusionMatrix
getLift() : Lift
getROC() : ReceiverOperatingCharacterics
ReceiverOperatingCharacterics
getAreaUnderCurve() : double
getNumberOfThresholdCandidates() : int
getProbabilityThreshold(index : int) : double
getPositives(index : int, trueFalse : boolean) : long
getNegatives(index : int, trueFalse : boolean) : long
getHitRate(index : int) : double
getFalseAlarmRate(index : int) : double
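The hit rate and false alarm rate at a single probability threshold can be sketched from the four outcome counts. The example assumes the conventional definitions (hit rate = true positives / all actual positives, false alarm rate = false positives / all actual negatives); whether getHitRate and getFalseAlarmRate use exactly these formulas is not stated in the diagram. RocPointSketch and its fields are hypothetical names, not JDM types.

```java
// Hypothetical helper holding the four outcome counts for one threshold candidate.
// It does not use or reproduce any javax.datamining class.
final class RocPointSketch {
    private final long truePositives;
    private final long falsePositives;
    private final long trueNegatives;
    private final long falseNegatives;

    RocPointSketch(long truePositives, long falsePositives,
                   long trueNegatives, long falseNegatives) {
        this.truePositives = truePositives;
        this.falsePositives = falsePositives;
        this.trueNegatives = trueNegatives;
        this.falseNegatives = falseNegatives;
    }

    // Fraction of actual positives that were predicted positive.
    // Returns 0.0 when there are no actual positives (a choice made for this sketch).
    double hitRate() {
        long actualPositives = truePositives + falseNegatives;
        return actualPositives == 0 ? 0.0 : (double) truePositives / actualPositives;
    }

    // Fraction of actual negatives that were predicted positive.
    // Returns 0.0 when there are no actual negatives (a choice made for this sketch).
    double falseAlarmRate() {
        long actualNegatives = falsePositives + trueNegatives;
        return actualNegatives == 0 ? 0.0 : (double) falsePositives / actualNegatives;
    }
}
```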
TestMetricsTask
(from Supervised)
ClassificationTestMetricsTask
getNumberOfLiftQuantiles() : int
setNumberOfLiftQuantiles(numberOfQuantiles : int)
getPositiveTargetValue() : Object
setPositiveTargetValue(positiveTargetValue : Object)
getCostMatrixName() : String
setCostMatrixName(costMatrixName : String)
computeMetrics(testMetric : ClassificationTestMetricOption, flag : boolean)
computeMetric(testMetric : ClassificationTestMetricOption) : boolean
<<enumeration>>
ClassificationTestMetricOption
confusionMatrix
lift
receiverOperatingCharacteristics
ClassificationTestMetricsTaskFactory
create(applyOutputData : String, actualTargetAttrName : String, predictedTargetAttrName : String, testMetricsName : String) : ClassificationTestMetricsTask
supportsCapability(metricOption : ClassificationTestMetricOption) : boolean
<<enumeration>>
ClassificationApplyContent
ApplySettings
predictedCategory
probability
cost
nodeId
ClassificationApplySettings
mapByRank(content : ClassificationApplyContent, destPhysAttrNameArray : String, fromTop : boolean)
mapByCategory(content : ClassificationApplyContent, categoryValue : Object, destinationAttrName : String)
mapTopPrediction(content : ClassificationApplyContent, destPhysAttrName : String)
mapPredictions(content : ClassificationApplyContent, baseDestPhysAttrName : String)
getRank(destinationAttrName : String) : Integer
getRanks() : Integer
isFromTop() : boolean
getMappedCategories() : Object
getMappedDestinationAttrName(categoryValue : Object, contentType : ClassificationApplyContent) : String
getMappedDestinationAttrNames(content : ClassificationApplyContent) : String
getContent(destinationAttrName : String) : ClassificationApplyContent
getContentsByRank(rank : int) : ClassificationApplyContent
getContentsByCategory(categoryValue : Object) : ClassificationApplyContent
setCostMatrixName(costMatrixName : String)
getCostMatrixName() : String
getMappedContents() : ClassificationApplyContent
getMappedBaseDestinationAttributeName(content : ClassificationApplyContent) : String
ClassificationApplySettingsFactory
create() : ClassificationApplySettings
getDefaultApplySettings() : ClassificationApplySettings
supportsCapability(algorithm : MiningAlgorithm, content : ClassificationApplyContent) : boolean
supportsCapability(algorithm : MiningAlgorithm, capability : ClassificationApplyCapability) : boolean
getSupportedApplyContents(algorithm : MiningAlgorithm) : ClassificationApplyContent
<<enumeration>>
ClassificationApplyCapability
topSequentialRanks
bottomSequentialRanks
individualCategories
allPredictions
topPrediction
costMatrix
CategoryMatrix
getCategories() : Collection
getValue(rowCategoryValue : Object, columnCategoryValue : Object) : Double
getCategorySet() : CategorySet
ConfusionMatrix
getAccuracy() : double
getError() : double
getNumberOfPredictions(actualCategoryValue : Object, predictedCategoryValue : Object) : long
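The relationship between the three ConfusionMatrix operations can be sketched as follows. The example assumes that accuracy is the fraction of cases whose predicted category equals the actual category and that error is its complement; the diagram does not define either quantity. ConfusionMatrixSketch and its record method are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors the operation names in the diagram.
// It is not the javax.datamining ConfusionMatrix.
final class ConfusionMatrixSketch {
    // Count of cases per (actual, predicted) pair.
    private final Map<List<Object>, Long> counts = new HashMap<>();
    private long total = 0;
    private long correct = 0;

    // Hypothetical helper: registers one scored case.
    void record(Object actualCategoryValue, Object predictedCategoryValue) {
        counts.merge(Arrays.asList(actualCategoryValue, predictedCategoryValue), 1L, Long::sum);
        total++;
        if (actualCategoryValue.equals(predictedCategoryValue)) {
            correct++;
        }
    }

    long getNumberOfPredictions(Object actualCategoryValue, Object predictedCategoryValue) {
        return counts.getOrDefault(
                Arrays.asList(actualCategoryValue, predictedCategoryValue), 0L);
    }

    // Assumed definition: correct predictions divided by all predictions.
    // Returns 0.0 for an empty matrix (a choice made for this sketch).
    double getAccuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Assumed definition: the complement of accuracy, or 0.0 for an empty matrix.
    double getError() {
        return total == 0 ? 0.0 : 1.0 - getAccuracy();
    }
}
```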
CostMatrix
CostMatrixFactory
create(categorySet : CategorySet) : CostMatrix
SupervisedSettings
RegressionSettingsFactory
create() : RegressionSettings
supportsCapability(capability : RegressionCapability) : boolean
supportsCapability(algorithm : MiningAlgorithm, capability : RegressionCapability) : boolean
RegressionSettings
<<enumeration>>
RegressionCapability
weightedAttributes
automatedDataPreparation
supplementaryAttributes
weightAttribute
outlierTreatment
logicalAttributeUsage
logicalData
SupervisedModel
RegressionModel
getRSquared() : double
TestMetrics
(from Supervised)
TestTask
<<enumeration>>
TestTaskCapability
RegressionTestMetrics
dataMapping
RegressionTestTask
getMeanPredictedValue() : Double
getMeanActualValue() : Double
getMeanAbsoluteError() : Double
getRMSError() : Double
getRSquared() : Double
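The regression test metrics listed above can be illustrated with their textbook formulas. The sketch assumes mean absolute error, root mean squared error, and R-squared computed as 1 - SSres/SStot; the diagram does not say which definitions a JDM provider uses, so treat these as assumptions. RegressionMetricsSketch and its static methods are hypothetical names.

```java
// Hypothetical helper with textbook definitions of the listed metrics.
// It is not part of the javax.datamining API.
final class RegressionMetricsSketch {
    private RegressionMetricsSketch() { }

    private static void check(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    static double meanAbsoluteError(double[] actual, double[] predicted) {
        check(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    static double rmsError(double[] actual, double[] predicted) {
        check(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    // Assumes the actual values are not all identical; otherwise SStot is zero
    // and the result is not a finite number.
    static double rSquared(double[] actual, double[] predicted) {
        check(actual, predicted);
        double meanActual = mean(actual);
        double ssRes = 0.0;
        double ssTot = 0.0;
        for (int i = 0; i < actual.length; i++) {
            ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            ssTot += (actual[i] - meanActual) * (actual[i] - meanActual);
        }
        return 1.0 - ssRes / ssTot;
    }
}
```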
RegressionTestTaskFactory
create(inputDataName : String, modelName : String, testMetricsName : String) : RegressionTestTask
supportsCapability(capability : TestTaskCapability) : boolean
ApplySettings
(from Apply)
RegressionApplySettings
map(content : RegressionApplyContent, destPhysAttrName : String)
getContent(destinationAttrName : String) : RegressionApplyContent
getContents() : RegressionApplyContent
getMappedDestinationAttributeName(content : RegressionApplyContent) : String
<<enumeration>>
RegressionApplyContent
predictedValue
confidence
RegressionApplySettingsFactory
create() : RegressionApplySettings
getDefaultApplySettings() : RegressionApplySettings
supportsCapability(algorithm : MiningAlgorithm, content : RegressionApplyContent) : boolean
getSupportedApplyContents(algorithm : MiningAlgorithm) : RegressionApplyContent
TestMetricsTask
(from Supervised)
RegressionTestMetricsTask
RegressionTestMetricsTaskFactory
create(applyOutputData : String, actualTargetAttrName : String, predictedTargetAttrName : String, testMetricsName : String) : RegressionTestMetricsTask
AlgorithmSettings
AttributeImportanceModel
AttributeImportanceAlgorithmSettings
BuildSettings
AttributeImportanceSettingsFactory
create() : AttributeImportanceSettings
supportsCapability(capability : AttributeImportanceCapability) : boolean
supportsCapability(algorithm : MiningAlgorithm, capability : AttributeImportanceCapability) : boolean
AttributeImportanceSettings
isSupervised() : boolean
setTargetAttributeName(targetAttrName : String)
getTargetAttributeName() : String
getMaxAttributeCount() : int
setMaxAttributeCount(maxCount : int)
<<enumeration>>
AttributeImportanceCapability
weightedAttributes
maximumResultSize
supervised
unsupervised
supplementaryAttributes
weightAttribute
outlierTreatment
logicalAttributeUsage
logicalData
Model
AlgorithmSettings
AssociationModel
getRules() : Collection
getRules(filter : RulesFilter) : Collection
getItems() : Collection
getItemsets() : Collection
getItemsets(itemsetSize : int) : Collection
getMaxTransactionSize() : int
getAverageTransactionSize() : Double
getNumberOfTransactions() : long
getNumberOfItems() : int
getNumberOfItemsets() : int
getMinAbsoluteSupport() : int
getMaxAbsoluteSupport() : int
getMinConfidence() : Double
getMaxConfidence() : Double
getMaxRuleLength() : int
AssociationRulesAlgorithmSettings
AssociationRulesAlgorithmSettingsFactory
create() : AssociationRulesAlgorithmSettings
AssociationSettingsFactory
create() : AssociationSettings
supportsCapability(capability : AssociationCapability) : boolean
supportsCapability(algorithm : MiningAlgorithm, capability : AssociationCapability) : boolean
BuildSettings
<<enumeration>>
AssociationCapability
AssociationSettings
getMinSupport() : Double
setMinSupport(minSupport : double)
getMinConfidence() : Double
setMinConfidence(minConfidence : double)
getMaxRuleLength() : int
setMaxRuleLength(maxRuleLength : int)
getMaxRuleComponentLength(isAntecedent : boolean) : int
setMaxRuleComponentLength(maxLength : int, isAntecedent : boolean)
getMaxNumberOfRules() : int
setMaxNumberOfRules(maxRules : int)
getItems(included : boolean) : Object
addItem(item : Object, included : boolean)
addItems(itemArray : Object, included : boolean)
removeItem(item : Object, included : boolean)
removeItems(itemArray : Object, included : boolean)
setTaxonomyName(attributeName : String, taxonomyName : String)
getTaxonomyName(attributeName : String) : String
minimumSupport
minimumConfidence
maximumRuleLength
maximumNumberOfRules
excludedItems
includedItems
antecedentLength
consequentLength
taxonomy
automatedDataPreparation
supplementaryAttributes
outlierTreatment
logicalAttributeUsage
logicalData
AssociationRule
getRuleIdentifier() : int
getAntecedent() : Itemset
getConsequent() : Itemset
getSupport() : double
getAbsoluteSupport() : int
getConfidence() : double
getLift() : double
getLength() : int
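The support, confidence, and lift values exposed by AssociationRule can be illustrated from transaction counts. The sketch uses the usual definitions from association-rule mining (support = transactions containing both antecedent and consequent / all transactions, confidence = that same count / transactions containing the antecedent, lift = confidence / relative frequency of the consequent); these are assumptions here, since the diagram gives only the operation names. RuleMeasuresSketch and its parameters are hypothetical.

```java
// Hypothetical helper with the usual association-rule measures.
// It is not part of the javax.datamining API.
final class RuleMeasuresSketch {
    private RuleMeasuresSketch() { }

    // bothCount: transactions containing antecedent and consequent together.
    static double support(long bothCount, long transactionCount) {
        return (double) bothCount / transactionCount;
    }

    // antecedentCount: transactions containing the antecedent.
    static double confidence(long bothCount, long antecedentCount) {
        return (double) bothCount / antecedentCount;
    }

    // consequentCount: transactions containing the consequent.
    static double lift(long bothCount, long antecedentCount,
                       long consequentCount, long transactionCount) {
        double consequentFrequency = (double) consequentCount / transactionCount;
        return confidence(bothCount, antecedentCount) / consequentFrequency;
    }
}
```

With these definitions, a lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent.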
+association
associationRefConsequent +consequent
0..n
+association associationRefAntecedent
0..n
Itemset
RulesFilter
setRange(type : RuleProperty, minValue : double, maxValue : double)
getMaxValue(type : RuleProperty) : Double
getMinValue(type : RuleProperty) : Double
setThreshold(property : RuleProperty, compOp : ComparisonOperator, thresholdValue : double)
getThresholdValue(property : RuleProperty) : Double
getThresholdOperator(property : RuleProperty) : ComparisonOperator
getItems(componentOption : RuleComponentOption, included : boolean) : Object
setItems(itemArray : Object, componentOption : RuleComponentOption, included : boolean)
setOrderingCondition(orderByArray : RuleProperty, sortOrderArray : SortOrder)
getOrderingConditions() : RuleProperty
getOrderingCondition(orderBy : RuleProperty) : SortOrder
setMaxNumberOfRules(maxRules : int)
getMaxNumberOfRules() : int
<<enumeration>>
RuleProperty
support
confidence
length
lift
<<enumeration>>
RuleComponentOption
systemDefault
antecedent
consequent
antecedentOrConsequent
RulesFilterFactory
create() : RulesFilter
supportsCapability(property : RuleProperty) : boolean
supportsCapability(property : RuleProperty, compOp : ComparisonOperator) : boolean
<<enumeration>>
ClusteringModelProperty
centroid
hierarchy
statistics
similarityScale
clusterSimilarity
splitPredicate
rules
Model
SignatureAttribute
ClusteringSignatureAttribute
getComparisonFunction() : AttributeComparisonFunction
getSimilarityScale() : double
getSimilarityMatrix() : SimilarityMatrix
ClusteringModel
getCluster(identifier : int) : Cluster
getNumberOfClusters() : int
getNumberOfLevels() : int
getRootClusters() : Collection
getClusters() : Collection
getLeafClusters() : Collection
getRules() : Collection
getSimilarity(clusterIdentifier1 : int, clusterIdentifier2 : int) : Double
hasProperty(property : ClusteringModelProperty) : boolean
+clustering
+clustering
+clustering
0..n
clusteringModelHasClusters
systemDefault
systemDetermined
absDiff
gaussSim
delta
equal
similarityMatrix
clusteringModelRefRoot
+root
<<enumeration>>
AttributeComparisonFunction
clusteringModelHasRules +rule
Rule
+clusters 1..n
AttributeStatisticsSet
Cluster
getClusterId() : int
+statistics 0..1
getName() : String
getParent() : Cluster
+cluster
getAncestors() : Cluster
getLevel() : int
1 clusterHasStatisticsSet
getCaseCount() : long
getSupport() : double
getCentroidCoordinate(numericalAttributeName : String) : Double
getCentroidCoordinate(categoricalAttributeName : String, category : Object) : Doub...
+children
getSplitPredicate() : Predicate
getChildren() : Cluster
0..n
getStatistics() : AttributeStatisticsSet
isLeaf() : boolean
isRoot() : boolean
getRule() : Rule
+cluster
0..n
+cluster
0..1
clusterRefSplitPredicate
clusterRefChildren
+splitPredicate
Predicate
<<enumeration>>
AttributeComparisonFunction
systemDefault
systemDetermined
absDiff
gaussSim
delta
equal
similarityMatrix
BuildSettings
ClusteringSettings
getMaxNumberOfClusters() : int
setMaxNumberOfClusters(maxClusters : int)
getMinClusterCaseCount() : long
setMinClusterCaseCount(minCaseCount : long)
getMaxClusterCaseCount() : long
setMaxClusterCaseCount(maxCount : long)
getAggregationFunction() : AggregationFunction
setAggregationFunction(function : AggregationFunction)
getAttributeComparisonFunction(logicalAttributeName : String) : AttributeComparisonFunction
setAttributeComparisonFunction(logicalAttributeName : String, function : AttributeComparisonFunction)
getMaxLevels() : int
setMaxLevels(numberOfLevels : int)
getSimilarityMatrix(logicalAttributeName : String) : SimilarityMatrix
setSimilarityMatrix(logicalAttributeName : String, matrix : SimilarityMatrix)
<<enumeration>>
AggregationFunction
systemDefault
systemDetermined
euclidean
squaredEuclidean
chebychev
cityBlock
minkowski
simpleMatching
jaccard
tanimoto
binarySimilarity
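Four of the AggregationFunction literals correspond to familiar distance measures, which can be sketched as functions over an array of per-attribute differences. The example assumes the standard mathematical definitions and assumes that the input is one difference value per attribute; how a JDM provider actually combines per-attribute comparison results is not specified in the diagram. AggregationSketch and its methods are hypothetical names.

```java
// Hypothetical helper illustrating four of the aggregation functions by their
// standard mathematical definitions. It is not part of the javax.datamining API.
final class AggregationSketch {
    private AggregationSketch() { }

    // Sum of squared per-attribute differences.
    static double squaredEuclidean(double[] differences) {
        double sum = 0.0;
        for (double d : differences) {
            sum += d * d;
        }
        return sum;
    }

    // Square root of the sum of squared differences.
    static double euclidean(double[] differences) {
        return Math.sqrt(squaredEuclidean(differences));
    }

    // Sum of absolute differences (also known as Manhattan distance).
    static double cityBlock(double[] differences) {
        double sum = 0.0;
        for (double d : differences) {
            sum += Math.abs(d);
        }
        return sum;
    }

    // Largest absolute difference over all attributes.
    static double chebychev(double[] differences) {
        double max = 0.0;
        for (double d : differences) {
            max = Math.max(max, Math.abs(d));
        }
        return max;
    }
}
```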
ClusteringSettingsFactory
create() : ClusteringSettings
supportsCapability(capability : ClusteringCapability) : boolean
supportsCapability(aggregationFunction : AggregationFunction) : boolean
supportsCapability(comparisonFunction : AttributeComparisonFunction) : boolean
supportsCapability(algorithm : MiningAlgorithm, capability : ClusteringCapability) : boolean
supportsCapability(aggregationFunction : AggregationFunction, comparisonFunction : AttributeComparisonFunction) : boolean
<<enumeration>>
ClusteringCapability
minClusterCaseCount
maxClusterCaseCount
maxNumberOfClusters
weightedAttributes
automatedDataPreparation
supplementaryAttributes
weightAttribute
hierarchicalClusters
outlierTreatment
logicalAttributeUsage
logicalData
AlgorithmSettings
ClusteringAlgorithmSettings
ApplySettings
(from Apply)
clusterIdentifier
probability
qualityOfFit
distance
ClusteringApplySettings
mapByRank(content : ClusteringApplyContent, destPhysAttrNameArray : String, fromTop : boolean)
mapByClusterIdentifier(content : ClusteringApplyContent, clusterIdentifier : int, destPhysAttrName : String)
mapTopCluster(content : ClusteringApplyContent, destPhysAttrName : String)
mapClusters(content : ClusteringApplyContent, baseDestPhysAttrName : String)
getRank(destinationAttrName : String) : Integer
getRanks() : Integer
isFromTop() : boolean
getContent(destPhysAttrName : String) : ClusteringApplyContent
getContentsByCluster(clusterIdentifier : int) : ClusteringApplyContent
getContentsByRank(rank : int) : ClusteringApplyContent
getMappedClusterIdentifiers() : int
getMappedClusterIdentifier(destPhysAttrName : String) : Integer
getMappedDestinationAttrName(clusterIdentifier : int, contentType : ClusteringApplyContent) : String
getMappedDestinationAttrNames(content : ClusteringApplyContent) : String
getMappedContents() : ClusteringApplyContent
getMappedBaseDestinationAttributeName(content : ClusteringApplyContent) : String
ClusteringApplySettingsFactory
create() : ClusteringApplySettings
getDefaultApplySettings() : ClusteringApplySettings
supportsCapability(algorithm : MiningAlgorithm, content : ClusteringApplyContent) : boolean
supportsCapability(algorithm : MiningAlgorithm, mappingType : ClusteringApplyCapability) : boolean
getSupportedApplyContents(algorithm : MiningAlgorithm) : ClusteringApplyContent
<<enumeration>>
ClusteringApplyCapability
topSequentialRanks
bottomSequentialRanks
individualClusters
allClusters
topCluster
CategoryMatrix
SimilarityMatrix
getCellValue(category1 : Object, category2 : Object) : double
setCellValue(category1 : Object, category2 : Object, similarityValue : double)
SimilarityMatrixFactory
create(categorySet : CategorySet) : SimilarityMatrix
Rule
getSupport() : double
getAbsoluteSupport() : long
getConfidence() : double
getAntecedent() : Predicate
getConsequent() : Predicate
getRuleIdentifier() : int
translate() : String
translate(format : RuleTranslationFormat) : String
+rule
+antecedent
{ordered}
+compoundPredicate
systemDefault
compoundPredicateHasPredicates +predicate
<<enumeration>>
RuleTranslationFormat
ruleRefConsequent
1
+consequent
Predicate
1..n
CompoundPredicate
getOperator() : BooleanOperator
getPredicates() : Predicate
<<enumeration>>
BooleanOperator
or
and
xor
not
surrogate
BooleanPredicate
getValue() : boolean
<<enumeration>>
ComparisonOperator
SimplePredicate
getAttributeName() : String
getComparisonOperator() : ComparisonOperator
isNumericalValue() : boolean
getNumericalValue() : Double
getCategoryValues() : Object
equal
notEqual
lessThan
greaterThan
lessOrEqual
greaterOrEqual
in
notIn
AttributeStatisticsSet
getStatistics(attributeName : String) : UnivariateStatistics
getStatistics() : Collection
getNumberOfCases() : long
supportsCapability(capability : AttributeStatisticsSetCapability) : boolean
getStatisticsTimestamp() : Date
UnivariateStatistics
getName() : String
getValues() : Object
getFrequency(index : int) : long
getFrequencies() : long
getProbabilities() : double
getFrequency(property : CategoryProperty) : long
getDiscreteStatistics() : DiscreteStatistics
getNumericalStatistics() : NumericalStatistics
getContinuousStatistics() : ContinuousStatistics
NumericalStatistics
getVariance() : double
getQuantileLimits() : double
getQuantile(limit : double) : double
getMinimumValue() : double
getMaximumValue() : double
getMeanValue() : double
getStandardDeviation() : double
getMedianValue() : double
getInterQuartileRange() : double
ContinuousStatistics
getIntervals() : Interval
getFrequency(range : Interval) : long
getFrequencies() : long
getSum(range : Interval) : double
getSum() : double
getSumOfSquares(range : Interval) : doub...
getSumOfSquares() : double
DiscreteStatistics
getModalValue() : Object
getDiscreteValues() : Object
getFrequency(discreteValue : Object) : long
getFrequencies() : long
SupervisedAlgorithmSettings
TreeSettings
TreeSettingsFactory
getMaxSurrogates() : int
setMaxSurrogates(maxSurrogates : int)
getMaxDepth() : int
setMaxDepth(maxDepth : int)
determineMaxDepth(determineMaxDepth : boolean)
determineMaxDepth() : boolean
getMinNodeSize() : double
getMinNodeSizeUnit() : SizeUnit
getMinNodeSize(sizeUnit : SizeUnit) : double
setMinNodeSize(size : double, unit : SizeUnit)
getMinDecreaseInImpurity() : double
setMinDecreaseInImpurity(minImpurity : double)
getTreeSelectionMethod() : TreeSelectionMethod
setTreeSelectionMethod(selectionMethod : TreeSelectionMethod)
getMaxSplits() : int
setMaxSplits(maxSplits : int)
getMaximumPValue() : double
setMaximumPValue(maxPValue : double)
getBuildHomogeneityMetric() : TreeHomogeneityMetric
setBuildHomogeneityMetric(buildMetric : TreeHomogeneityMetric)
getPruningHomogeneityMetric() : TreeHomogeneityMetric
setPruningHomogeneityMetric(pruningMetric : TreeHomogeneityMetric)
computeNodeStatistics(computeNodeStatistics : boolean)
getComputeNodeStatistics() : boolean
<<enumeration>>
SizeUnit
count
percentage
<<enumeration>>
TreeHomogeneityMetric
systemDetermined
systemDefault
meanSquaredError
meanAbsoluteDeviation
gini
entropy
misclassificationRatio
<<enumeration>>
TreeSelectionMethod
systemDetermined
systemDefault
minimumErrorTree
oneStandardErrorTree
SupervisedAlgorithmSettings
NaiveBayesSettings
getSingletonThreshold() : double
setSingletonThreshold(singletonThreshold : double)
getPairwiseThreshold() : double
setPairwiseThreshold(pairwiseThreshold : double)
NaiveBayesSettingsFactory
create() : NaiveBayesSettings
supportsCapability(capability : NaiveBayesCapability) : boolean
<<enumeration>>
NaiveBayesCapability
singletonThreshold
pairwiseThreshold
missingValueHandling
singletonCount
FeedForwardNeuralNetSettingsFactory
SupervisedAlgorithmSettings
create() : FeedForwardNeuralNetSettings
supportsCapability(capability : FeedForwardNeuralNetCapability) : boolean
<<enumeration>>
FeedForwardNeuralNetCapability
getNeuralLayers() : NeuralLayer
setNeuralLayers(hiddenLayerArray : NeuralLayer)
getLearningAlgorithm() : LearningAlgorithm
setLearningAlgorithm(learningAlgorithm : LearningAlgorithm)
getMaxNumberOfIterations() : int
setMaxNumberOfIterations(maxIterations : int)
getMinErrorTolerance() : double
setMinErrorTolerance(minTolerance : double)
determineNumberOfNodesPerLayer() : boolean
determineNumberOfNodesPerLayer(determineNumberOfNodesPerLayer : boolean)
1
backPropagation
backPropagationWithMomentum
bias
maximumIterations
minimumErrorTolerance
missingValueHandling
hiddenLayers
+backpropSettings 0..n
+backpropSettings
backpropAlgorithmSettingsRefLearningAlgorithm
backpropAlgorithmSettingsHasLayer
+learningAlgorithm
+neuralLayer
LearningAlgorithm
1..n
NeuralLayer
getNumberOfNodes() : int
setNumberOfNodes(nodes : int)
useBias() : boolean
useBias(useBias : boolean)
getActivationFunction() : ActivationFunction
setActivationFunction(function : ActivationFunction)
<<enumeration>>
ActivationFunction
systemDetermined
systemDefault
linearIdentity
logistic
hyperbolicTangent
sign
symmetricSign
softMax
Backpropagation
getLearningRate() : double
setLearningRate(rate : double)
getMomentum() : double
setMomentum(momentum : double)
BackpropagationFactory
create() : Backpropagation
NeuralLayerFactory
create(numberOfNodes : int) : NeuralLayer
getMaxNumberOfNodes() : int
SupervisedAlgorithmSettings
(from Supervised)
<<enumeration>>
KernelFunction
SVMClassificationSettings
getKernelFunction() : KernelFunction
setKernelFunction(kernelFunction : KernelFunction)
getCStrategy() : double
setCStrategy(cValue : double)
getTolerance() : double
setTolerance(tolerance : double)
getStandardDeviation() : double
setStandardDeviation(stdDeviation : double)
getComplexityFactor() : double
setComplexityFactor(factor : double)
getKernelCacheSize() : int
setKernelCacheSize(cacheSize : int)
getPolynomialDegree() : int
setPolynomialDegree(degree : int)
systemDefault
systemDetermined
kLinear
kGaussian
polynomial
hypertangent
sigmoid
<<enumeration>>
SVMClassificationCapability
cStrategy
tolerance
standardDeviation
complexityFactor
kernelCacheSize
polynomialDegree
SVMClassificationSettingsFactory
create() : SVMClassificationSettings
supportsCapability(capability : SVMClassificationCapability) : boolean
supportsCapability(kernelFunction : KernelFunction) : boolean
SupervisedAlgorithmSettings
(from Supervised)
SVMRegressionSettings
getKernelFunction() : KernelFunction
setKernelFunction(kernelFunction : KernelFunction)
getCStrategy() : double
setCStrategy(cValue : double)
getTolerance() : double
setTolerance(tolerance : double)
getStandardDeviation() : double
setStandardDeviation(stdDeviation : double)
getComplexityFactor() : double
setComplexityFactor(factor : double)
getKernelCacheSize() : int
setKernelCacheSize(cacheSize : int)
getPolynomialDegree() : int
setPolynomialDegree(degree : int)
getEpsilon() : double
setEpsilon(epsilon : double)
SVMRegressionSettingsFactory
create() : SVMRegressionSettings
supportsCapability(capability : SVMRegressionCapability) : boolean
supportsCapability(kernelFunction : KernelFunction) : boolean
<<enumeration>>
KernelFunction
systemDefault
systemDetermined
kLinear
kGaussian
polynomial
hypertangent
sigmoid
<<enumeration>>
SVMRegressionCapability
cStrategy
tolerance
standardDeviation
complexityFactor
kernelCacheSize
polynomialDegree
epsilon
ClusteringAlgorithmSettings
KMeansSettings
getMaxNumberOfIterations() : int
setMaxNumberOfIterations(maxIterations : int)
getMinErrorTolerance() : double
setMinErrorTolerance(minErrorTolerance : double)
getDistanceFunction() : ClusteringDistanceFunction
setDistanceFunction(distanceFunction : ClusteringDistanceFunction)
KMeansSettingsFactory
create() : KMeansSettings
supportsCapability(capability : KMeansCapability) : boolean
supportsCapability(distanceFunction : ClusteringDistanceFunction) : boolean
<<enumeration>>
ClusteringDistanceFunction
systemDetermined
systemDefault
euclidean
<<enumeration>>
KMeansCapability
minimumErrorTolerance
missingValueHandling
ModelDetail
TreeModelDetail
getRootNode() : TreeNode
getNodes() : TreeNode
getNodeIdentifiers() : int
getRules() : Collection
getRule(nodeId : int) : Rule
getNode(nodeId : int) : TreeNode
getTreeDepth() : int
getNumberOfNodes() : int
getNumberOfLeafNodes() : int
<<enumeration>>
PredictionType
category
mean
median
+treeModel
treeRepresentationHasNode
1
+rootNode 1
TreeNode
getIdentifier() : int
getTargetCount(target : Object) : long
getTargetCounts() : long
getCaseCount() : long
getNumberOfChildren() : int
getParent() : TreeNode
getAncestors() : TreeNode
getChildren() : TreeNode
getPredicate() : Predicate
getSurrogates() : Predicate
getPrediction() : Object
getLevel() : int
getNodeStatistics() : AttributeStatisticsSet
getPredictionType() : PredictionType
isLeaf() : boolean
getRule() : Rule
+treeNode
+childNode
0..n
+parent
treeNodeHasPredic...
treeNodeHasChild
+predicate
0..1
Predicate
NeuralNetworkModelDetail
getNumberOfLayers() : int
getLayerIdentifiers() : int
getActivationFunction(layerId : int) : ActivationFunction
getNumberOfNeurons(layerId : int) : int
getNeuronIdentifiers(layerId : int) : int
getWeight(parentLayerId : int, parentNeuronId : int, childNeuronId : int) : double
getBias(layerId : int, neuronId : int) : double
Layer ID
0 : input
1..n-1 : hidden
n : output
ModelDetail
NaiveBayesModelDetail
getCount(attributeName : String, attributeValue : Object) : int
getPairCount(attributeName : String, predictorValue : Object, targetValue : Object) : int
getPairProbability(attributeName : String, predictorValue : Object, targetValue : Object) : double
getPairProbabilities(attrName : String, targetValue : Object) : Map
getTargetCount(targetValue : Object) : long
getTargetProbability(targetValue : Object) : double
ModelDetail
(from Base)
SVMModelDetail
isLinearSVMModel() : boolean
getNumberOfSupportVectors() : int
getNumberOfBoundedVectors() : int
getNumberOfUnboundedVectors() : int
SVMRegressionModelDetail
getCoefficient(categoricalAttrName : String, categoryValue : Object) : double
getCoefficient(numericalAttrName : String) : double
getCoefficients(attrName : String) : Map
getBias() : double
SVMClassificationModelDetail
getCoefficient(targetValue : Object, categoricalAttrName : String, categoryValue : Object) : double
getCoefficient(targetValue : Object, numericalAttrName : String) : double
getCoefficients(targetValue : Object, attrName : String) : Map
getBias(targetValue : Object) : double
5. Code examples
In this section, we provide several code examples to illustrate the intended use of the JDM
API. These examples do not explore all mining functions nor all algorithms. We have
selected a few data mining usage scenarios from which other examples could be derived
given the individual interface documentation.
In particular, we illustrate:
building a clustering model using the clustering mining function (Section 5.1)
applying a clustering model to a data set and specifying the apply settings (Section 5.2)
applying a clustering model to an individual record (Section 5.3)
building a classification model using the classification mining function (Section 5.4)
testing a classification model to determine model accuracy (Section 5.5)
extracting rules from a decision tree model (Section 5.6)
extracting rules from an association model (Section 5.7)
importing and exporting a model (Section 5.8)
using reflection (Section 5.9)
establishing a connection to the DME (Section 5.10)
In the examples, a connection to a DME is assumed to be readily available as dmeConn,
and exception handling is intentionally omitted to keep the code readable. For the
same reason, vendor capabilities are not checked in the examples. Uniform resource
identifiers (URIs) represent the physical data in the examples. Refer to
Section 5.11 for more information on URIs. Since the URI format is vendor specific, we
do not specify URI values in the examples.
(11) clusteringSettings.setMinClusterCaseCount( 5 );
(12) dmeConn.saveObject( "myClusteringBS", clusteringSettings, false );
// Create a task to build a clustering model with data and settings
(13) BuildTaskFactory btFactory = (BuildTaskFactory) dmeConn.getFactory(
"javax.datamining.task.BuildTask" );
(14) BuildTask task = btFactory.create( "myBuildData", "myClusteringBS",
"myClusteringModel" );
(15) dmeConn.saveObject( "myClusteringTask", task, false );
// Execute the task and check the status
(16) ExecutionHandle handle = dmeConn.execute( "myClusteringTask" );
(17) handle.waitForCompletion( Integer.MAX_VALUE ); // wait until done
(18) ExecutionStatus status = handle.getLatestStatus();
(19) if( ExecutionState.success.equals( status.getState() ) )
(20)
This code shows how to map physical attributes to logical attributes. The physical
attribute name is PERSON_AGE, but it will be mapped to the name AGE in the logical
data and will appear in the model signature. Operations using this model, e.g., apply, must
use the name AGE. This is useful when an attribute has a name that is difficult to understand.
In line 15, we explicitly save the task. All objects associated with a task and the task itself
must be saved prior to asynchronous task execution. A task need not be saved for synchronous execution.
In line 16, we use the connection to execute the task. At the DME, an algorithm with suitable default settings is selected to produce the clustering model when an algorithm is not
specified in the settings. The resulting model is placed in the MOR represented by the connection through which the task is executed. The user could later use the name of the model
for applying the model to data.
In lines 17 through 19, the application waits for the asynchronously executing task to complete
and checks its status through the execution handle.
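The execute, wait, and check-status pattern of lines 16 through 19 can be tried without a DME. The following is only an analogy built on java.util.concurrent, not JDM code: the class AsyncTaskSketch, the State enum, and the method runAndWait are hypothetical names introduced for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the execute / wait / check-status pattern.
// None of these names belong to the JDM API.
class AsyncTaskSketch {

    enum State { SUCCESS, ERROR, EXECUTING }

    // Submits the task, waits up to timeoutSeconds, and reports the latest state.
    static <T> State runAndWait(Callable<T> task, long timeoutSeconds) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> handle = executor.submit(task);          // analogous to execute(...)
            try {
                handle.get(timeoutSeconds, TimeUnit.SECONDS);  // analogous to waitForCompletion(...)
                return State.SUCCESS;
            } catch (TimeoutException e) {
                return State.EXECUTING;                        // still running when the wait ended
            } catch (ExecutionException e) {
                return State.ERROR;                            // the task itself failed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return State.ERROR;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        State state = runAndWait(() -> "model built", 5);
        if (state == State.SUCCESS) {
            System.out.println("task completed successfully");
        }
    }
}
```

As in the JDM example, the caller inspects the returned state rather than assuming that the wait ended because the task succeeded.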
In lines 1 through 3, we create the PhysicalDataSet object from a URI. The URI provides information necessary to access the apply data. The PhysicalDataSet object is populated with physical attributes (by the second parameter being true) and saved in the
repository represented by the connection.
From lines 4 through 11, we create a ClusteringApplySettings object to specify the
results of the apply operation. In this example the apply settings table will have columns
for the customer id directly copied from the input table (with the new name ID), cluster id
with the highest probability (with the name ClusterId), and its probability (with the name
Probability).
In lines 12 through 14, we create the DataSetApplyTask object with the input clustering
model and data, output data and apply settings name, and save the task. A URI is also provided indicating where apply output data is to be persisted. In line 15, we execute the
apply operation using the data mining server connection. In line 16, we wait until the apply task completes.
// error
An input record is created in lines 1 through 5. This record can be reused in subsequent scoring tasks by changing the specific attribute values. In lines 6 through 11, the
default apply output specification is used and only the top cluster (determined by vendor-defined criteria) is included in the output under the attribute name
Cluster_Identifier (line 20). The clustering apply settings object is saved and referenced by
name in subsequent invocations. An attribute (ID) is directly copied from the input to
the output (lines 8 through 10) and is retrieved from the output (line 19). The attributes in
the input data must be compatible with those in the model signature including names.
Note that real-time record apply has its own task, RecordApplyTask.
Some implementations may support loading models for faster real-time scoring. How
loaded models are managed is up to the implementation. If this feature is not supported, this operation is a no-op (lines 12 and 17).
(25)
(26)
(27) }
(28) dmeConn.saveObject( myBuildTask, buildTask, false );
// Execute the task and block until finished
(29) ExecutionHandle handle = dmeConn.execute( myBuildTask );
(30) handle.waitForCompletion( Integer.MAX_VALUE ); // wait until done
// Access the model if model was successfully built
(31) ExecutionStatus status = handle.getLatestStatus();
(32) if( ExecutionState.success.equals( status.getState() ) ) {
(33)
(34)
(35) }
In lines 1 through 2, we create the PhysicalDataSet object from a URI. This object is
populated with physical attributes that come directly from the specified data. In line 3, we
save the data in the mining server through the connection.
In lines 4 through 7, we create the test task object with the input classification model, test
data, and the test metrics name. In line 6, the test task is specified to produce a confusion
matrix as the result. Other optional test metrics include lift and receiver operating characteristics.
In line 8, we execute the test operation using the connection. In line 9, we wait until the
test task completes. In line 10, we retrieve the classification
test metrics object by name. In line 11, we get the accuracy for this model as computed
from the input test data. In line 12, we get the confusion matrix.
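The relationship between the two metrics retrieved in lines 11 and 12 can be sketched in plain Java. This is a minimal illustration, not the JDM API: the class ConfusionMatrixSketch and its accuracy method are hypothetical, and we assume a square matrix of case counts whose rows are actual target values and whose columns are predicted target values. A vendor may compute accuracy differently, for example with weights or costs.

```java
// Hypothetical helper; the JDM test metrics object returns accuracy directly.
class ConfusionMatrixSketch {

    // Accuracy = correctly classified cases (the diagonal) / all cases.
    static double accuracy(long[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int actual = 0; actual < matrix.length; actual++) {
            for (int predicted = 0; predicted < matrix[actual].length; predicted++) {
                total += matrix[actual][predicted];
                if (actual == predicted) {
                    correct += matrix[actual][predicted];
                }
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("confusion matrix contains no cases");
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        // Assumed layout: rows are actual values, columns are predicted values.
        long[][] matrix = {
            { 50, 10 },
            {  5, 35 }
        };
        System.out.println("accuracy = " + accuracy(matrix));
    }
}
```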
(25)
TreeModelDetail treeDetail =
(TreeModelDetail) treeModel.getModelDetail();
(26)
(27)
(28)
(29)
(30)
(31)
(32)
(33)
(34)
(35)
(36) } // End of If
In this example, a classification model is built using the tree algorithm. In line 14, we
specify the tree algorithm using tree settings. Once the model is built successfully (lines
19 to 23), the rules are extracted from the resulting decision tree model (lines 24 to 35).
In lines 1 and 2, we create the PhysicalDataSet object from a URI. This physical data set
object is populated with physical attributes that come directly from the specified physical
data. In line 3, we save the data in the mining server.
In lines 4 through 6, we create the logical data object using the physical data and save it in
the server. Here, the default behavior is to create a LogicalAttribute instance for each
physical attribute in the source data. Whether an attribute is of categorical or numerical
type is derived from its attribute (or column) data type and possibly the number of unique
values in the attribute. Note that the logical data needs to be persisted to be used for build
settings. However, the logical data may be omitted in the build settings if it is not supported by the mining function or if all attributes are to be used with default behavior. In this
example, since no changes are made to the logical data after its content is populated from
the physical data, it can be omitted. In other words, lines 4 to 6 as well as line 9 are
not necessary for this example.
From lines 7 through 15, we create the classification build settings object with the tree
algorithm settings. In lines 16 through 18, we create a build task object and in line 19, we
submit it for asynchronous execution through the connection. In line 20, we wait for the
completion of the task. From lines 21 through 23, we get the latest state of the build task
to check whether the task has finished successfully, resulting in a tree classification model. From
lines 24 through 35, we retrieve the rules from the tree model.
(12)
(13) }
The range of the support values to be used as rule selection criterion is 0.03 (3%) to 1.0
(100%), which is the maximum value for support. The rules are retrieved in the order of
descending support value, i.e., the rules with the higher support are placed in the returned
collection before the rules with lower support.
RuleProperty.confidence };
(7) SortOrder[] orders = new SortOrder[]{ SortOrder.descending, SortOrder.descending };
(8) rulesFilter.setOrderingCondition( props, orders );
(9) rulesFilter.setMaxNumberOfRules( 100 );// maximum number of rules
// Extract rules from the model using the filtering criteria
(10) Collection rulesCollection = assocModel.getRules( rulesFilter );
(11) Iterator ruleIterator = rulesCollection.iterator();
(12) while( ruleIterator.hasNext() ) {
(13)
(14)
(15) }
This example shows how two selection criteria can be specified to retrieve rules. The
range of the support values for the rule selection is 0.02 (2%) to 1.0 (100%). The range of
the confidence values for the rule selection is 0.95 (95%) to 1.0 (100%). Only the rules
that satisfy both conditions are returned.
The rules are retrieved in the order of descending support value and then descending confidence value if the support is equal. Only the first 100 rules that satisfy the selection criteria are returned if the number of selected rules exceeds 100.
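The selection semantics described above (two range conditions, a two-level descending sort, and a cap on the number of returned rules) can be mimicked on an in-memory list. This is a sketch under stated assumptions, not the JDM rules filter: the Rule class and the select method are hypothetical, and both range bounds are assumed to be inclusive.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of a rules filter; not the JDM API.
class RuleFilterSketch {

    static final class Rule {
        final String name;
        final double support;
        final double confidence;

        Rule(String name, double support, double confidence) {
            this.name = name;
            this.support = support;
            this.confidence = confidence;
        }
    }

    // Keeps rules whose support and confidence fall in [min, 1.0] (assumed inclusive),
    // orders them by descending support, then descending confidence,
    // and returns at most maxRules of them.
    static List<Rule> select(List<Rule> rules, double minSupport,
                             double minConfidence, int maxRules) {
        return rules.stream()
                .filter(r -> r.support >= minSupport && r.support <= 1.0)
                .filter(r -> r.confidence >= minConfidence && r.confidence <= 1.0)
                .sorted(Comparator.comparingDouble((Rule r) -> r.support).reversed()
                        .thenComparing(
                            Comparator.comparingDouble((Rule r) -> r.confidence).reversed()))
                .limit(maxRules)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule("A", 0.10, 0.96),
                new Rule("B", 0.10, 0.99),
                new Rule("C", 0.30, 0.95),
                new Rule("D", 0.01, 0.99),   // fails the support condition
                new Rule("E", 0.50, 0.90));  // fails the confidence condition
        for (Rule r : select(rules, 0.02, 0.95, 100)) {
            System.out.println(r.name + " " + r.support + " " + r.confidence);
        }
    }
}
```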
(15)
(16) }
This example shows how to retrieve association rules whose antecedents contain any of {
milk, coke, diaper } and consequents contain any of { potato-chip, beer }. For example,
an association rule { milk, diaper, tomato => beer } will be extracted from the model.
Note that each component is a subset of the items specified for the component.
The rules are retrieved in the order of descending support value, i.e., the rules with the
higher support are placed in the returned collection before the rules with lower support.
For more complicated rule retrieval, a range of support values and/or a range of confidence values can be specified to further restrict the rule selection.
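One reading of the item containment condition can be sketched as follows. We assume here that "contain any of" means a rule component shares at least one item with the set specified for that component, which is consistent with the example rule { milk, diaper, tomato => beer }. The class and method names are hypothetical and are not part of the JDM API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an "any of" item containment check; not the JDM API.
class ItemContainmentSketch {

    // True when the antecedent shares an item with antecedentItems
    // and the consequent shares an item with consequentItems.
    static boolean matches(Set<String> antecedent, Set<String> consequent,
                           Set<String> antecedentItems, Set<String> consequentItems) {
        return !Collections.disjoint(antecedent, antecedentItems)
                && !Collections.disjoint(consequent, consequentItems);
    }

    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Set<String> antecedentItems = items("milk", "coke", "diaper");
        Set<String> consequentItems = items("potato-chip", "beer");
        boolean selected = matches(items("milk", "diaper", "tomato"), items("beer"),
                                   antecedentItems, consequentItems);
        System.out.println("rule selected: " + selected);
    }
}
```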
(14)
(15) }
The filtering criteria used in this example identify the association rules with unusually
high support (5% or greater) that do not contain items { tv, dvd } in any component of the
rule. This example shows how a range of support or confidence values can be combined
with item containment.
The rules are retrieved in the order of ascending support value.
(11)
(12)
(13)
(14)
(15)
(16) }
(17) importTask.setObjectNamesMap( nameMap );
// Execute import synchronously without timeout
(18) ExecutionStatus status = dmeConn.execute( importTask, null );
(19) ExecutionState state = status.getState();
(20) if( state.equals( ExecutionState.success ) ) { // success
(21)
// do something here...
(22) }
(23) else {
(24)
// report error
(25) }
When an object is imported, its content may not be readily known. The user may lack the
information about the object format, the number of objects in it, the object names, and so
forth. When such information is not available, the user can obtain an import summary
from the object before executing import to avoid possible errors. In addition, this information allows the user to manage object names and creation dates.
In lines 1 through 3, we create an import task object and specify the location of the import
object as a URI. In line 4, an import summary is populated from the specified import
object, and the summary object is obtained in line 5. In lines 6 through 16, the types of the
contained objects are examined, and models are given a name Imp_Modelx where x
ranges between 0 and the number of contained objects minus 1, whereas all other objects
are given a name based on their type. In line 17, a map that contains index-name mappings
is specified to the import task. In lines 18 through 20, the import task is executed synchronously and its status is checked.
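The index-name mapping built in lines 6 through 16 can be sketched independently of the import summary object. The sketch below is an assumption-laden illustration: object types are represented as plain strings, the type label "model" is hypothetical, and the naming scheme for non-model objects (the prefix Imp_ followed by the type and index) is invented, since the text only says those names are based on the object type.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of building the index-to-name map for an import task.
class ImportNameMapSketch {

    // objectTypes.get(i) is the (assumed) type label of the i-th contained object.
    static Map<Integer, String> buildNameMap(List<String> objectTypes) {
        Map<Integer, String> nameMap = new LinkedHashMap<>();
        for (int i = 0; i < objectTypes.size(); i++) {
            String type = objectTypes.get(i);
            if ("model".equals(type)) {
                nameMap.put(i, "Imp_Model" + i);      // naming from the text: Imp_Modelx
            } else {
                nameMap.put(i, "Imp_" + type + i);    // assumed type-based naming
            }
        }
        return nameMap;
    }

    public static void main(String[] args) {
        Map<Integer, String> nameMap =
                buildNameMap(Arrays.asList("model", "buildSettings", "model"));
        nameMap.forEach((index, name) -> System.out.println(index + " -> " + name));
    }
}
```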
Note that the build settings, if any, will be imported together with the model by default,
and the creation date will be the time of import by default. The default behavior can be
// do something here...
(11) }
(12) else { // error while exporting
(13)
// report error
(14) }
When JDM named objects are exported, the target location and at least one JDM named
object must be specified. If export format is not specified, the vendor default format is
selected. Note that multiple named objects can be exported into one location by invoking
addObjectName method multiple times. However, a single settings inclusion control
applies to all models specified with the method.
// do something here...
(9) }
(10) else { // error while exporting
(11)
// report error
(12) }
Since the JDM named object to be exported is not a model, it is not necessary to set the
settings inclusion control with setIncludeModelSettings method. If the export format is
not specified, the vendor default format is selected.
ClassificationSettingsFactory classificationSettingsFactory =
(ClassificationSettingsFactory) dmeConn.getFactory( "javax.datamining.supervised.classification.ClassificationSettings" );
//
(3)
(4)
(5)
(6)
if( classificationSettingsFactory.supportsCapability(
ClassificationCapability.costMatrix ) )
(7)
(8)
(9)
return tsFactory.getMaxSurrogatesAllowed();
(10)
(11)
return 0;
(12) }
(13) else // report classification is not supported
In line 1, the DME is checked to see whether it supports the classification function.
An alternative to this approach is to use the Connection.getSupportedFunctions method,
which returns an array of the MiningFunction enums that are supported by the DME.
In lines 2 through 4, the classification function is checked for cost matrix support during
model build. In lines 5 and 6, the DME is checked again to see whether it supports a tree algorithm for
classification. An alternative to this approach is to use the Connection.getSupportedAlgorithms method, which returns an array of the MiningAlgorithm enums that are supported by the
DME, given a mining function.
In lines 7 and 8, we inquire whether the tree settings support maximum surrogates. Based on the
result of this inquiry, the code returns the maximum number of surrogates in lines 9 and
11.
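The check-then-use pattern of lines 7 through 11 can be tried with a small stub. The sketch below does not use the JDM interfaces: TreeCapability, TreeSettingsFactoryStub, and maxSurrogates are hypothetical names that only mirror the shape of a supportsCapability inquiry followed by a guarded call.

```java
import java.util.EnumSet;

// Hypothetical stub that mirrors a capability inquiry; not the JDM API.
class CapabilitySketch {

    enum TreeCapability { MAX_SURROGATES, MAX_DEPTH, MIN_NODE_SIZE }

    static final class TreeSettingsFactoryStub {
        private final EnumSet<TreeCapability> supported;
        private final int maxSurrogatesAllowed;

        TreeSettingsFactoryStub(EnumSet<TreeCapability> supported, int maxSurrogatesAllowed) {
            this.supported = EnumSet.copyOf(supported);
            this.maxSurrogatesAllowed = maxSurrogatesAllowed;
        }

        boolean supportsCapability(TreeCapability capability) {
            return supported.contains(capability);
        }

        int getMaxSurrogatesAllowed() {
            return maxSurrogatesAllowed;
        }
    }

    // Returns the allowed number of surrogates, or 0 when the capability is absent.
    static int maxSurrogates(TreeSettingsFactoryStub factory) {
        if (factory.supportsCapability(TreeCapability.MAX_SURROGATES)) {
            return factory.getMaxSurrogatesAllowed();
        }
        return 0;
    }

    public static void main(String[] args) {
        TreeSettingsFactoryStub factory =
                new TreeSettingsFactoryStub(EnumSet.of(TreeCapability.MAX_SURROGATES), 3);
        System.out.println("max surrogates = " + maxSurrogates(factory));
    }
}
```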
In lines 1 through 6, we create an InitialContext object used to access the mining server
connection factory. In line 7, we perform a lookup to obtain the connection factory. In
lines 8 through 11, we obtain a ConnectionSpec object and specify URI, user and password. In line 12, we create a Connection object using the connection spec obtained in
line 8.
The DME itself: its actual location may be specified by a URI in the connection specification.
Physical datasets, either input data (training or apply dataset) or output data created by
the DME are specified by a URI in the PhysicalDataSet object.
Imported objects and exported destinations are also specified using a URI in the
ImportTask and ExportTask.
URIs are defined by RFC 2396, and Java 1.4 includes an implementation of URI representation and parsing in the package java.net.
The general syntax of a URI is given by:
[scheme:]scheme-specific-part [#fragment]
There are two kinds of URIs: opaque and hierarchical. Opaque URIs have a
scheme-specific part that does not begin with a slash. Hierarchical URIs have a
scheme-specific part that begins with a slash. A hierarchical URI can be either absolute (if the scheme is
specified) or relative (no scheme specified).
Hierarchical URI syntax can be further refined:
[scheme:][//authority][path][?query][#fragment]
The authority is generally expressed as:
[user-info@]host[:port]
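The syntax above can be explored with the java.net.URI class. The following sketch classifies a few URIs and prints the components of a hierarchical one. The host name, user name, and paths are invented for illustration, and the class UriPartsSketch is not part of JDM.

```java
import java.net.URI;

// Illustrative use of java.net.URI; the URIs themselves are made-up examples.
class UriPartsSketch {

    // Reports whether a URI is opaque or hierarchical, and absolute or relative.
    static String classify(String text) {
        URI uri = URI.create(text);
        String kind = uri.isOpaque() ? "opaque" : "hierarchical";
        String form = uri.isAbsolute() ? "absolute" : "relative";
        return kind + ", " + form;
    }

    public static void main(String[] args) {
        URI uri = URI.create(
                "ftp://analyst@data.example.com:2121/sets/train.csv?format=csv#part1");
        System.out.println("scheme    = " + uri.getScheme());
        System.out.println("user-info = " + uri.getUserInfo());
        System.out.println("host      = " + uri.getHost());
        System.out.println("port      = " + uri.getPort());
        System.out.println("path      = " + uri.getPath());
        System.out.println("query     = " + uri.getQuery());
        System.out.println("fragment  = " + uri.getFragment());

        System.out.println(classify("mailto:jsr-73-comments@jcp.org"));
        System.out.println(classify("file:///data/train.csv"));
        System.out.println(classify("data/train.csv"));
    }
}
```

Which schemes a DME actually accepts remains vendor specific; the sketch only shows how the generic syntax is decomposed.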
While URI specification and processing are vendor specific, JDM recommends
some general guidelines. When accessing the DME, the ConnectionSpec object already
includes a user and password specification, hence no user specification should be included
in the URI. When accessing data sources requiring user authentication, if no user specification is included in the URI (either in the authority or in the query part), the Connection's
user specification may be used (single sign-on).
User authentication could be used in some cases in the URI to differentiate the DME user
from the data access user (for example, accessing a remote FTP location with a different
user than the DME authenticated user). In such cases, the user specification could be set in the
user-info part (using a "user:password" structure), or using custom query fields, such as in
scheme://host/path?user=uname&password=id.
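One way an implementation might apply these guidelines is sketched below. The lookup order (user-info first, then a query field named user, then the connection user) is our assumption rather than a JDM requirement, and the class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical resolution of the data access user; the precedence is an assumption.
class DataAccessUserSketch {

    static String resolveUser(URI dataUri, String connectionUser) {
        // 1. user-info part of the authority, for example "user:password"
        String userInfo = dataUri.getUserInfo();
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            return colon >= 0 ? userInfo.substring(0, colon) : userInfo;
        }
        // 2. custom query field, for example "?user=uname&password=id"
        String query = dataUri.getQuery();
        if (query != null) {
            for (String field : query.split("&")) {
                if (field.startsWith("user=")) {
                    return field.substring("user=".length());
                }
            }
        }
        // 3. fall back to the connection's user (single sign-on)
        return connectionUser;
    }

    public static void main(String[] args) {
        String dmeUser = "dmeUser";
        System.out.println(resolveUser(
                URI.create("ftp://remote:pw@ftp.example.com/data.csv"), dmeUser));
        System.out.println(resolveUser(
                URI.create("ftp://ftp.example.com/data.csv?user=uname&password=id"), dmeUser));
        System.out.println(resolveUser(
                URI.create("file:///data/train.csv"), dmeUser));
    }
}
```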
Vendors must clearly specify the schemes supported for the different URI usages (ConnectionSpec, PhysicalDataSet, ImportTask, and ExportTask). They should also indicate the
behaviour if relative URIs are specified: for example a relative URI may be used to specify a relative filename.
The expected behaviour of common schemes (file:, http:, ftp:, jdbc:, ...) must be respected
by the DME. Vendors are free to define and specify their own scheme.
See [URI], [URI-SCHEMES], and [Java-URI] for more information.
6. Conformance statement
Conformance to the JDM API standard is more flexible than conformance to most other standards. JDM is
conceived as an a la carte standard that allows vendors to implement only those functions and algorithms of the standard that their product supports. For example, a vendor providing only neural
network algorithms for supervised learning would have no need for the clustering or association rules portions of the JDM specification. Adding functionality not specified in the
standard is enabled through interface and class specialization.
1. Some interfaces in a package may be partially implemented, for example, for the model apply (scoring)
engine (see Section 6.4.3 on page 91).
Package organization - mining functions, algorithms, and model detail are provided
in separate packages. This readily allows a vendor to choose which packages to provide in a compliant product, or which additional packages to provide. Vendors may add
new packages supporting proprietary algorithms for standard or other mining functions.
Reflective capabilities - with the ability to add new functionality, or to limit which
packages are supported, comes the need to determine which capabilities are supported
by the vendor. JDM supports identifying which packages are present in an implementation, and which capabilities are supported at a class and enumeration level.
It is recommended that vendors who extend JDM have their new interfaces conform to the
JMI specification to ensure a consistent API for end users as well as the JDM framework.
Consider the following example. JDM defines the interface TreeSettings as non-abstract,
i.e., users can get a factory and create an instance implementing the TreeSettings interface.
However, tree is not a specific data mining algorithm. A vendor supporting multiple tree
algorithms may have to introduce implementations for specific tree algorithms, e.g.,
CART_TreeSettings and C45_TreeSettings. A vendor could just implement the generic
TreeSettings interface for either of these, or provide a specific CART_TreeSettings or
C45_TreeSettings interface which inherits from TreeSettings, or provide specific interfaces that inherit from Algorithm only. Vendors must provide corresponding factories for
extension interfaces. The Connection.getFactory (objectName) method must return these
factories. For example, if a vendor subclasses TreeSettings, the vendor would document
the name strings for users to specify in the getFactory method.
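The idea of returning extension factories by documented name string can be sketched with a simple registry. Everything below is hypothetical: the TreeSettings interface is a one-method stand-in for the JDM interface of the same name, and the name strings under com.example.mining are invented examples of what a vendor might document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical name-to-factory registry; not the JDM Connection implementation.
class FactoryRegistrySketch {

    // Stand-in for a settings interface that vendor extensions specialize.
    interface TreeSettings {
        String algorithmName();
    }

    private final Map<String, Supplier<TreeSettings>> factories = new HashMap<>();

    void register(String objectName, Supplier<TreeSettings> factory) {
        factories.put(objectName, factory);
    }

    // Mirrors the shape of getFactory(objectName): look up a factory by name string.
    Supplier<TreeSettings> getFactory(String objectName) {
        Supplier<TreeSettings> factory = factories.get(objectName);
        if (factory == null) {
            throw new IllegalArgumentException("Unsupported object name: " + objectName);
        }
        return factory;
    }

    // Registers two invented vendor extensions under documented name strings.
    static FactoryRegistrySketch withVendorExtensions() {
        FactoryRegistrySketch registry = new FactoryRegistrySketch();
        registry.register("com.example.mining.CART_TreeSettings", () -> () -> "CART");
        registry.register("com.example.mining.C45_TreeSettings", () -> () -> "C4.5");
        return registry;
    }

    public static void main(String[] args) {
        FactoryRegistrySketch registry = withVendorExtensions();
        TreeSettings settings =
                registry.getFactory("com.example.mining.CART_TreeSettings").get();
        System.out.println("created settings for " + settings.algorithmName());
    }
}
```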
7. For all user-provided strings, vendor implementations must support the minimum
string length of 1 character. Each vendor must allow a minimum of 8 characters for all
named objects. It is recommended that each implementation have a maximum string
length defined, however, JDM does not specify this maximum.
8. Vendors cannot add methods to standard interfaces in a JDM package. Vendors may
subclass JDM classes as necessary in a separate package.
9. Vendors need only support the definition of certain interfaces as input metadata, but do
not have to use that metadata when performing mining operations. For example, the
use of a weight attribute is possible for all mining functions and algorithms. However,
it is the vendor's option to leverage a weight attribute in any of these areas. An application can use the reflection/introspection interface to determine whether the use of a weight
attribute is supported.
10. Vendors may subclass the JDM exception class to provide more specific error or warning feedback. However, no other top level exceptions should be introduced. Vendors
have the option to wrap internally raised exceptions as JDM exceptions, e.g., class cast
exceptions can be wrapped inside a JDM exception.
11. Vendors may subclass the VerificationReport to provide more specific verification
feedback.
12. Synchronous execution of tasks must be supported, however, asynchronous execution
is optional. If not implemented, the asynchronous execute method must throw the
unsupported feature exception.
13. Named objects are defined to enable referencing objects by name in methods, as well
as for applications to reuse the objects within or across sessions. However, a vendor
must specify the degree to which persistence is supported, using transient and persistent options.
14. Vendors must minimally support the import and export of models in some format, perhaps a native, proprietary format. Import and export of all other objects and formats are
optional and subject to introspection.
javax.datamining
javax.datamining.base
javax.datamining.data
javax.datamining.resource
javax.datamining.task
javax.datamining.statistics
javax.datamining.supervised
javax.datamining.supervised.classification
Vendors who support Regression, also support the packages:
supervised
supervised.regression
Vendors who support Association Rules, also support the packages:
associationrules
Vendors who support Clustering, also support the packages:
clustering
rule (optional)
Vendors who support Attribute Importance, also support the packages:
attributeimportance
6.4.2 Algorithm level conformance
The packages listed below are optional for the specific algorithms. To support a given
algorithm, the vendor may choose to support packages from the list provided.
Vendors who support Tree Models may also support the packages:
Vendors who support Feedforward Neural Network Models may also support the packages:
Vendors who support Naive Bayes Models may also support the packages:
Vendors who support Clustering Models may also support the packages:
Vendors who support Association Rules Models may also support the packages:
Resource package - all interfaces must be supported to enable establishing a connection to the DME. However, a vendor may support only the synchronous interface Connection.execute (applyTask) and therefore need not implement ExecutionHandle.
Objects supporting Model, BuildSettings, AlgorithmSettings, ModelSignature - for all mining functions f and algorithms a where Connection.supportsCapability(f, a,
null) returns TRUE, the implementation must be able to retrieve models and manipulate component objects for that function and algorithm.
Task package, Apply subpackage - the implementation must support one or both of
RecordApplyTask and DataSetApplyTask. The implementation must also support the
ApplySettings interface and any function-specific subclasses.
Partially Enabled - A vendor claims that their product is partially enabled on a per package basis if the product returns FALSE for any supportsCapability method in a given package.
As such, a vendor may claim Full Implementation, Fully Enabled if their product implements the entire standard. However, it is much more common for a vendor to claim a
Qualified Implementation, Partially Enabled.
An example of a vendor's claim statement may appear as:
Product: MyMiningSystem
JDM: Qualified Implementation
Classification: Fully Enabled
Tree: Fully Enabled
NaiveBayes: Partially Enabled
Regression: Partially Enabled
7. Summary
JSR-73 originated in July of 2000. As with many JSRs, the expert group was optimistic that it would
complete the specification in roughly a year's time. Now, sixteen face-to-face meetings
and over 150 conference calls later, JDM is seeing the light of day!
The expert group was encouraged by interest from the data mining community in our
progress and desire for public drafts as well as the reference implementation. The definition of Web services for data mining was a late addition as the expert group recognized the
importance of an XML-based representation and interface for the Java standard.
As noted earlier, there were several features that did not make it into this specification. At
the top of the list for version 2 enhancements are:
Sequential Patterns / Time Series - mining functions to address forecasting and modeling seasonal or periodic fluctuations in data.
Transformations interface - data preparation is a key aspect of any data mining solution.
A separate JSR for transformations is likely warranted. Having a close integration with
such a JSR and addressing transformations in the next version has high priority.
Ensemble models - define composite models structured with logic, e.g., boosting and
bagging approaches.
Apply for Association - augment specification to enable prediction based on association
rules.
Text Mining - enable mining of unstructured text data both by explicit feature extraction
and by accepting text attributes as model predictors.
Model Comparison - introduce ability to compare multiple models according to various
quality metrics, e.g., accuracy and lift for classification.
Multi-record real-time scoring - enable scoring of multiple records in the record apply
task as a performance optimization for applications.
Multi-target models - enable the specification of multiple targets for supervised models
as a model performance and representation optimization.
Other possible features under discussion include: multivariate statistics, mining stream
data, advanced statistical functions, algorithms for PCA and NMF in feature extraction,
integration with workflow, deviation detection, scoring multiple models in parallel with a
single pass over the data.
Appendix A. Glossary
algorithm
A specific technique or procedure for producing a data mining model. An algorithm uses a
specific model representation and may support one or more functional areas. Examples
include CART and CHAID for decision trees, backpropagation neural networks, Naive
Bayes, and Apriori association.
algorithm settings
A collection of settings detailing algorithm-specific behavior to be used during model
building.
apply
The data mining operation that scores data, i.e., applies a model to data to produce the output
specified by the apply settings.
apply data
The data used as input when applying a model. Also referred to as score data, i.e., the data
to be scored.
apply settings
A user specification detailing the output desired from applying a model to data. This output may include predicted values, associated probabilities, key values, and other supplementary data.
association
association rules
Association rules capture co-occurrence of items among transactions. A typical rule is an
implication of the form A -> B, which means that the presence of itemset A implies the
presence of itemset B with certain support and confidence. The support of the rule is the
ratio of the number of transactions where the itemsets A and B are present to the total
number of transactions. The confidence of the rule is the ratio of the number of transactions where the itemsets A and B are present to the number of transactions where itemset
A is present.
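The support and confidence ratios defined above can be illustrated with a small worked example. This is a sketch only: the class name, helper methods, and the four sample transactions are hypothetical and are not part of the JDM API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of support and confidence for a rule A -> B. */
final class RuleMetricsSketch {

    /** Counts the transactions that contain every item of the given itemset. */
    static long countContaining(List<Set<String>> transactions, Set<String> itemset) {
        long count = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    /** Support: transactions containing both A and B, divided by all transactions. */
    static double support(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return (double) countContaining(transactions, both) / transactions.size();
    }

    /** Confidence: transactions containing both A and B, divided by those containing A. */
    static double confidence(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return (double) countContaining(transactions, both) / countContaining(transactions, a);
    }

    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = Arrays.asList(
            items("bread", "milk"),
            items("bread", "butter"),
            items("bread", "milk", "butter"),
            items("milk"));
        Set<String> a = items("bread");
        Set<String> b = items("milk");
        // bread and milk occur together in 2 of 4 transactions; bread occurs in 3.
        System.out.println("support = " + support(transactions, a, b));
        System.out.println("confidence = " + confidence(transactions, a, b));
    }
}
```

For the rule bread -> milk in this sample, the support is 2/4 and the confidence is 2/3.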
attribute
A generic column of data, minimally with a name and datatype. There are several specializations of attribute, see logical attribute, physical attribute, and signature attribute.
Attributes are used in statistics, machine learning, data mining, and other disciplines to
describe observations, objects, data records, and other entities. Sometimes attributes are
also referred to as variables, fields, dimensions, features, and properties. Attributes are
often categorized with regard to their mathematical properties, that is, in terms of the
intrinsic organization or structure of the associated values (or value range or scale).
Generally speaking, there are continuous or numerical attributes, and discrete or symbolic
attributes.
attribute assignment
The mapping of one attribute to another used to associate input data with a model's
attributes, or a model's output with an output table.
attribute importance
A measure of the importance of an attribute to a mining model. The measures of different
attributes in build data enable users to select the attributes that are found to be most relevant to a mining model.
attribute type
true measures). JDM restricts itself to three types: categorical, numerical, and ordinal.
attribute usage
Specifies how a logical attribute is to be used when building a model, e.g., active vs. supplementary, suppressing automatic data preprocessing, and assigning a weight to a particular attribute.
build
build data
The data used as input to building a model. Also referred to as the training data.
build settings
A collection of parameters specifying the high level input for building a data mining
model, consisting of mining function and algorithm specifications. Mining functions consist of key areas including: classification, regression, association, sequences, attribute
importance, and clustering.
case
categorical attribute
An attribute where the values correspond to discrete categories. For example, state is a
categorical attribute with discrete values (CA, NY, MA, etc.). Categorical attributes are
either non-ordered (nominal) like state, gender etc. or ordered (ordinal) such as high,
medium or low temperatures.
Categorical attributes tell us which of several unordered categories a thing belongs to. For
example, we can say that a beverage is BEER, LIQUOR, LEMONADE, or WINE. Categorical attributes exhibit the lowest degree of organization, since the set of values such an
attribute or variable may assume possesses no systematic intrinsic organization or order. The
only relation between the values of such attributes is the identity relation. Because of the
lack of an order relation, it is not possible to tell if one attribute value is greater than
another, nor that one value is closer to a certain value than another. However, we can tell if
two values are equal or not equal.
For example, the categorical attribute beverage may be associated with the set, V, of possible attribute values, where V = {BEER, LIQUOR, LEMONADE, WINE}. Given this
variable, it is not possible to tell that LIQUOR is smaller than WINE, or that LIQUOR is
closer to BEER than WINE. However, we can tell that two values a and b are equal (identical) if, for example, a := BEER and b := BEER, then a = b.
category
category set
centroid
A cluster centroid is a vector that encodes, for each logical attribute, either the mean
(numerical attributes) or the mode (categorical attributes) of the cases in the build data
assigned to a cluster.
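As an illustration of the centroid definition, the sketch below computes one centroid coordinate per attribute: the mean for a numerical attribute and the mode for a categorical one. The class and method names are hypothetical, and breaking ties in favor of the first value encountered is an assumption of this sketch, not something JDM specifies.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of centroid coordinates for the cases assigned to one cluster. */
final class CentroidSketch {

    /** Centroid coordinate for a numerical attribute: the mean of its values. */
    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.size();
    }

    /** Centroid coordinate for a categorical attribute: the most frequent value. */
    static String mode(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        // Insertion order is preserved, so ties go to the value seen first (an assumption).
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("age centroid = " + mean(Arrays.asList(20.0, 30.0, 40.0)));
        System.out.println("state centroid = " + mode(Arrays.asList("CA", "NY", "CA")));
    }
}
```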
classification
The process of predicting the unknown value of the target attribute for new records using a
model built from records with known target values.
cluster
A collection of data objects that are similar to one another. Typically produced from a
clustering algorithm and stored with a clustering model.
clustering
Given a set of data points, each having a set of attributes, and a similarity measure among
them, clustering is the process of grouping the data points into different clusters such that
data points in the same cluster are more similar to one another and data points in different
clusters are less similar to one another.
cost matrix
A two-dimensional, N x N table that defines the cost associated with a prediction versus
the actual value. A cost matrix is typically used in classification models, where N is the
number of distinct values in the target, and the columns and rows are labeled with target
values.
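A cost matrix can be applied to a set of predictions as sketched below. The class name and sample numbers are hypothetical. The sketch also assumes that rows are indexed by the actual target value and columns by the predicted value; the definition above states only that both are labeled with target values.

```java
/** Hypothetical sketch of applying an N x N cost matrix to predictions. */
final class CostMatrixSketch {

    /**
     * Sums cost[actual][predicted] over all cases.
     * Assumption: rows correspond to actual values, columns to predicted values.
     */
    static double totalCost(double[][] cost, int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += cost[actual[i]][predicted[i]];
        }
        return total;
    }

    public static void main(String[] args) {
        // Two target values (N = 2): predicting 0 when the actual value is 1 costs 5,
        // predicting 1 when the actual value is 0 costs 1, correct predictions cost 0.
        double[][] cost = {
            {0.0, 1.0},
            {5.0, 0.0}
        };
        int[] actual = {0, 1, 1, 0};
        int[] predicted = {0, 0, 1, 1};
        System.out.println("total cost = " + totalCost(cost, actual, predicted));
    }
}
```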
cross validation
A method of evaluating the accuracy of a classification or regression model. The build
data is divided into several parts, with each part in turn being used to evaluate a model
built using the remaining parts.
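The partitioning described for cross validation can be sketched as follows. The names are hypothetical, and assigning case i to part i mod k is only one possible way to divide the build data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of dividing build data into k parts for cross validation. */
final class CrossValidationSketch {

    /** Assigns case index i to part (i mod k); each part is held out once for evaluation. */
    static List<List<Integer>> folds(int caseCount, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < caseCount; i++) {
            folds.get(i % k).add(i);
        }
        return folds;
    }

    /** Returns the case indices used to build the model when one part is held out. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int heldOut) {
        List<Integer> training = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != heldOut) {
                training.addAll(folds.get(f));
            }
        }
        return training;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 3);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("evaluate on " + folds.get(f)
                + ", build on " + trainingIndices(folds, f));
        }
    }
}
```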
data mining
The process of discovering hidden, previously unknown and usable information from a
large amount of data. This information is represented in a compact form, often referred to
as a model.
DMS
export
The operation that supports taking mining objects from within the DME and exporting
them to an external system such as a file or database table cell.
extension
A feature that is not covered by any of the relevant specifications or a non-standard implementation of a feature that is covered.
functional area
A subset of the data mining API that corresponds to a particular class of algorithm.
feature selection
Given a data set with lots of attributes, feature selection is the process of selecting the features (attributes) that are more important to the data mining model. Feature selection is
done based on the importance computed using attribute importance algorithms. See also
Attribute Importance.
import
The operation that supports taking mining objects from an external system such as a file or
database table cell and importing them to the DME and MOR.
item
An element that can be compared against another to determine if they are different. Typi-
JDM implementation
A JDM technology-enabled client API, resource adapter, and supporting data mining
engine. The resource adapter may provide support for features not implemented by the
supporting engine. It may also provide the mapping between standard syntax/semantics
and the native API implemented by the engine.
JMI
JMS
JMX
JOLAP
JSR
lift
A measure of how much better prediction results are using a model than could be obtained
by chance. For example, consider that 2% of the customers mailed a catalog without using
the model would make a purchase. However, using the model to select catalog recipients,
10% would make a purchase. Then the lift is 10/2 or 5. Lift may also be used as a measure
to compare different data mining models. Since lift is computed using a dataset with actual
outcomes, lift compares how well a model performs with respect to this dataset on predicted outcomes. Lift indicates how well the model improved the predictions over a random selection given actual results. Lift allows a user to infer how a model will perform on
new data.
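The arithmetic in the catalog example can be written out as a short sketch. The class and parameter names are hypothetical, and the counts are chosen only to reproduce the 2% and 10% response rates mentioned above.

```java
/** Hypothetical sketch of the lift calculation from the catalog example. */
final class LiftSketch {

    /** Lift: response rate among model-selected recipients divided by the baseline rate. */
    static double lift(long respondersWithModel, long contactedWithModel,
                       long respondersBaseline, long contactedBaseline) {
        double rateWithModel = (double) respondersWithModel / contactedWithModel;
        double baselineRate = (double) respondersBaseline / contactedBaseline;
        return rateWithModel / baselineRate;
    }

    public static void main(String[] args) {
        // 10% of 1000 model-selected recipients purchase, versus 2% of 1000 without the model.
        System.out.println("lift = " + lift(100, 1000, 20, 1000));
    }
}
```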
logical attribute
A description of a domain of data used as input to mining operations. Logical attributes
may be categorical, ordinal, or numerical.
logical data
mining function
A major subdomain of data mining that shares common high level characteristics. Functions include: classification, regression, attribute importance, association, and clustering.
mining model
The result of building a model from mining build settings. The representation of the
model is specific to the algorithm specified by the user or selected by the underlying DMS
and defined by a ModelDetail object. A model can be used for direct inspection, e.g., to
examine the rules produced from a decision tree or association rules, or to score data.
The end product(s) of a mining operation. For example, a build task produces a mining
Data value that is missing because it was not measured, not answered, was unknown or
was lost. Data mining methods vary in the way they treat missing values. Typically, they
ignore the missing values, or omit any records containing missing values, or replace missing values with the mode or mean, or infer missing values from existing values.
model
model detail
The specific representation of a model that may be algorithm dependent. For example, a
classification model has some common Model object state; however, a decision tree is a
specific model detail that may have resulted from using the tree algorithm settings.
model signature
A collection of signature attributes, derived from the logical data used to build a model.
The input data to a model must be compatible with the model signature.
MOF
MOR
multi-record case
A representation of physical data that uses multiple records to store a single case. The data
typically has three columns with roles of sequence id, attribute name, and value.
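To illustrate the multi-record layout, the sketch below groups rows of (sequence id, attribute name, value) into one attribute map per case, which corresponds to a single-record view of the same data. The class name, the use of String arrays for rows, and the sample values are assumptions made for this sketch only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: pivoting multi-record case data into one record per case. */
final class MultiRecordPivotSketch {

    /** Each row holds {sequence id, attribute name, value}; rows sharing an id form one case. */
    static Map<String, Map<String, String>> pivot(List<String[]> rows) {
        Map<String, Map<String, String>> cases = new LinkedHashMap<>();
        for (String[] row : rows) {
            cases.computeIfAbsent(row[0], id -> new LinkedHashMap<>()).put(row[1], row[2]);
        }
        return cases;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "age", "34"},
            new String[] {"1", "state", "CA"},
            new String[] {"2", "age", "51"});
        System.out.println(pivot(rows));
    }
}
```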
numerical attribute
An attribute whose values are numbers. The numeric value can be either an integer or a
real number. Numerical attribute values are continuous as opposed to discrete or categorical values. See also Categorical Attribute and Ordinal Attribute.
OLAP
ordinal attribute
An ordinal attribute is similar to a categorical attribute except that there is an order defined
on the discrete categorical values. For example, temperature where the discrete values are
high, medium and low. There is an order defined on the values; i.e., high > medium > low.
Ordinal attributes allow us to put things in order, because the set of values associated with
an ordinal attribute possesses an intrinsic organization, which is defined by a total order
relation. Therefore we can tell if one value is bigger or smaller than another, but we can
normally not tell or measure the difference or distance between two values (unlike with
interval attributes or variables). For example, if x, y, and z are ranked 5, 6, and 7, we can
tell x < y < z, but not if (z - y) < (y - x).
The ordinal attribute speed may take any of the following ranked values: STATIONARY,
SLOW, FAST, VERY FAST, where rank(STATIONARY) = 1, rank(SLOW) = 2,
rank(FAST) = 3, and rank(VERY FAST) = 4. This organization of the ordinal attribute
values allows us, for example, to tell that SLOW represents a smaller speed value than
FAST. However, it is not possible to tell if, for example, the difference between two adjacent values is the same or not. For example, we cannot tell if the difference between
SLOW and FAST is equal to, smaller or greater than the difference between the values
FAST and VERY FAST.
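The speed example maps naturally onto a Java enum, whose declaration order supplies the total order relation. This is an illustrative sketch only; the class and method names are hypothetical, and VERY FAST is written VERY_FAST because Java identifiers cannot contain spaces.

```java
/** Hypothetical sketch of the ordinal attribute "speed" as a Java enum. */
final class OrdinalAttributeSketch {

    /** Declaration order defines the total order over the values. */
    enum Speed {
        STATIONARY, SLOW, FAST, VERY_FAST;

        /** Rank as used in the text: 1 for STATIONARY through 4 for VERY_FAST. */
        int rank() {
            return ordinal() + 1;
        }
    }

    /** Order comparisons are meaningful; differences between ranks are not. */
    static boolean isSlower(Speed a, Speed b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        System.out.println("rank(FAST) = " + Speed.FAST.rank());
        System.out.println("SLOW slower than FAST: " + isSlower(Speed.SLOW, Speed.FAST));
    }
}
```

The enum deliberately offers no subtraction or distance operation, reflecting that the difference between adjacent ordinal values is undefined.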
outlier
A data item that does not come (or is thought not to have come) from the typical population of
data; in other words, a data item that falls outside the boundaries that enclose most other
data items in the data.
percentage
A value between 0 and 100 that represents a part of a whole. For example, 75% indicates
three quarters of a whole.
physical attribute
An object that corresponds to a field in a formatted file, or column in a database table.
Using tasks, physical attributes can be mapped to logical attributes of a model's signature
or logical data of a build settings object.
physical data set
Identifies data as a set of cases to be used as input to data mining. Through the use of
attribute assignment, attributes of the physical data are mapped to logical attributes of a
model's logical data. The data referenced by a physical data set object can be used in
model building, model application (scoring), lift computation, statistical analysis, etc.
physical data record
A collection of named attribute values used as input and output for single record scoring.
predictor
probability
A value between zero and one (0..1) that indicates the likelihood of an event. Zero indicates there is no chance of the event occurring. One indicates it is probabilistically certain
the event will occur.
quality of fit
In clustering, a value between zero and one that is a measure of how well a given case fits
in the predicted cluster. Values closer to zero indicate a poor fit, values closer to one indicate a good fit.
regression
A mining function and class of supervised algorithms that predicts continuous targets.
ROC
ROI
Return On Investment.
rule
An expression of the general form if X, then Y. An output of certain models, e.g., association rules models or decision tree models. The X may be a compound predicate.
score data
session
settings
signature attribute
A type of attribute used to define one of the inputs to a model for test and apply. See model
signature.
single-record case
A representation of physical data that uses a single record to store each case. Each column contains data to be mined that can correspond to a logical attribute.
specified feature
A feature of JDM that must meet the specification as detailed in JDM.
supervised learning
The process of building data mining models using a known dependent variable, also
referred to as the target. All classification and regression techniques are supervised.
supported feature
A feature for which the JDM implementation supports the standard syntax and semantics
(intentions, informal semantics, intended meaning) for that feature as defined in the relevant specifications.
system default
For an enumeration class, a vendor-defined default value that corresponds to one of the
allowed values for the enumeration class. This default value may be different according to
the context. Vendors must document the system default for each context.
system determined
For an enumeration class, a user may request the vendor implementation to determine
what is the best value for this enumeration. The implementation-selected value may take
into account, e.g., other settings or data to determine an enumeration value. Vendors must
document the behavior users can expect.
target
taxonomy
A hierarchical grouping of the categorical values. For example, a geography taxonomy
groups cities into states, states into regions, regions into countries and so on.
task
TCK
test
The data mining operation that determines the accuracy of a model. This is typically performed by using held-aside data identical in form to the build data, scoring that test data,
and comparing the actual target value with the predicted target value. Testing is only applicable for supervised models.
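The comparison of actual and predicted target values can be sketched as a simple accuracy calculation. The class name and sample values are hypothetical; JDM test tasks may report other metrics, and this sketch covers only the fraction of correct predictions.

```java
/** Hypothetical sketch of computing accuracy from held-aside test data. */
final class TestAccuracySketch {

    /** Fraction of cases whose predicted target value equals the actual target value. */
    static double accuracy(String[] actual, String[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].equals(predicted[i])) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    public static void main(String[] args) {
        String[] actual = {"yes", "no", "yes", "no"};
        String[] predicted = {"yes", "no", "no", "no"};
        System.out.println("accuracy = " + accuracy(actual, predicted));
    }
}
```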
test data
training
The step in the model building process that produces a possibly non-optimized form of
the model. For example, a tree algorithm may produce a full tree during training, but may
require an evaluation phase to effectively select the best subtree. See build.
training data
transformation A function applied to data resulting in a new form or representation of the data. For exam-
URI
unsupervised learning
The process of building data mining models without the guidance (supervision) of a
known, correct result. In supervised learning, this correct result is provided in the target
attribute. Unsupervised learning has no such target attribute. Clustering and association
are examples of unsupervised mining.
web service
A software application identified by a URI, whose interfaces and bindings are capable of
being defined, described, and discovered as XML artifacts. A Web service supports direct
interactions with other software agents using XML based messages exchanged via Internet-based protocols. [W3]
weight
A numeric value associated with an attribute or row. Weights associated with attributes
instruct the DME to consider the contribution of attributes with higher weights more
important than those with lower weights. Weights associated with rows, specified by identifying an
attribute as containing weight values, instruct the DME to consider the contribution of
rows with higher weights more important than those with lower weights.
wrapper
A type of algorithm that wraps other models to achieve better accuracy. Examples
include bagging and boosting.
Appendix B. Requirements
This section discusses the major requirements for the data mining API. It focuses on data
mining domain requirements, use of foundation technologies and related data mining standards, and system behavior requirements. The detailed requirements are expressed in the
UML model and corresponding Javadoc documentation.
The last section discusses specific features excluded from this version of the standard.
These include both domain and system exclusions.
Requirement 2:
Requirement 3:
Requirement 4:
Support a representative set of data mining functionality for common usage of generally agreed upon algorithm interfaces.
Requirement 4.1:
Requirement 4.2:
Requirement 4.3:
Requirement 4.4:
Specify the algorithms Decision Trees, Feed Forward Neural Networks, SVM, and Naive Bayes for Classification and Regression;
and K-Means for Clustering.
Requirement 4.5:
Test
Apply
Classification
Regression
Association
Clustering
Attribute Importance
Requirement 5:
Requirement 6:
Requirement 7:
Requirement 8:
Requirement 9:
Requirement 11:
Requirement 12:
Requirement 14:
Requirement 15:
Map the JDM API closely to the SQL/MM Data Mining standard to
facilitate a JDM implementation on top of SQL/MM.
Requirement 16:
Application software should be portable without requiring significant application code modifications.
Requirement 17:
Requirement 18:
Requirement 19:
Requirement 20:
Requirement 20.1:
Requirement 20.2:
Uniquely name objects within a major object category, e.g., BuildSettings, Models, Results, etc.
Requirement 21:
Requirement 22:
Requirement 23:
Requirement 24:
B.5.1.2. Transformations
Data transformations are applicable beyond the realm of data mining, even though transformations are an important part of it. The expert group concluded that transformations are
beyond the scope of JDM version 1 and may deserve a separate JSR. As there are many
tools that support transformations, e.g., standalone applications and database management
systems, reproducing a small subset of commonly used mining transformations within
JDM seemed ill-advised. First, not all transformations could be covered. Second, users
would likely go outside JDM to include unsupported transformations.
Transformations are considered preprocessing. An algorithm may automatically transform
data internally, e.g., binning numerical data for Naive Bayes; however, the standard interface does not allow the specification of the number of bins or other binning options. Users
who want this level of control must preprocess the data before submitting it to JDM algorithms. Vendors who prefer to support some degree of preprocessing will find a natural
place within the specification to place such preprocessing.
Missing value treatment and outlier treatment are also viewed as a form of transformation.
Vendors who wish to include such transformations because their algorithms already provide such an option are free to provide vendor-specific algorithm settings.
Transactions - A transactional interface is not specified within JDM. Defining transaction boundaries around long running data mining operations would overly complicate the standard and the ability for vendors to support this standard. As such, we
suggest that individual operations provide atomicity whenever possible to ensure correct execution across multiple concurrent invocations from a single or multiple users
and in the presence of failures. Transactions are an area where we expect vendors to
differentiate themselves.
Thread Safety - The level of thread safety is not specified within JDM. The extent to
which multiple threads operate correctly is up to each vendor's implementation as
multi-threaded applications may not be required in many domains.
Scheduling - The ability to perform sophisticated task scheduling is not defined within
JDM. The execution of multiple tasks, related tasks, or dependencies among tasks are
better handled by existing mechanisms, e.g., workflow systems, operating system support, etc. The ability to store and reference tasks for later execution, however, directly
supports applications.
Security - JDM does not address security issues except for specifying that some form
of login validation occur for access to the DME. Similar login information is provided
for accessing data such as files and database tables. Beyond this, vendors may address
security as part of their respective implementations.
Remote Method Invocation (RMI) - JDM does not specify the architecture, e.g., client-server, or implementation technique supporting client-server communication.
Serializable Objects - JDM does not specify techniques for transferring Java objects
inter- or intra-system. The Java serialized object feature, while commonly used and
well integrated into the Java framework, has alternatives such as XML representations.
Enterprise Java Beans (EJBs) - JDM strives to provide a straightforward Java API
that may be used in many contexts. Leveraging a technology such as EJBs places certain demands on an implementation that may not be necessary for a particular use. The
JDM API does not preclude being exposed through EJBs, but this is specific to the
vendor's implementation.
Interface               Optional Method
Model                   getAttributeStatistics
                        getBuildDuration
                        getEffectiveBuildSettings
                        getModelDetail
                        getUniqueIdentifier
AssociationModel        getAverageTransactionSize
                        getItems
                        getItemsets(int)
                        getMaxAbsoluteSupport
                        getMaxTransactionSize
                        getMinAbsoluteSupport
                        getNumberOfItems
                        getNumberOfTransactions
                        getRules(RulesFilter)
ClusteringModel         getRules
                        getSimilarity
Cluster                 getCentroidCoordinate(String)
                        getCentroidCoordinate(String, Object)
                        getName
                        getRule
                        getSplitPredicate
                        getStatistics
ClassificationModel     getClassificationError
RegressionModel         getRSquared
NaiveBayesModelDetail   getCount(String, Object)
                        getPairCount(String, Object, Object)
                        getPairProbability(String, Object, Object)
                        getTargetCount(Object)
                        getTargetProbability(Object)
SVMModelDetail          getNumberOfBoundedVectors
                        getNumberOfUnboundedVectors
TreeNode                getNodeStatsitics
                        getSurrogates
Appendix D. Exceptions
Exceptions can be either checked or unchecked (runtime). For checked exceptions, where
the application can take appropriate actions for anticipated errors, JDM provides the
JDMException class which inherits from the standard Java Exception. All JDM methods
accepting parameters automatically include JDMException in their signature; others are as
specified in the interface documentation. JDM provides subclasses of JDMException to
allow specialized exception handling in applications.
Unchecked exceptions result from unanticipated application execution failure and may
require stopping the application. For unchecked exceptions, vendors may choose to throw
standard Java RuntimeException instances, wrap these as appropriate in JDMException or
JDMRuntimeException instances, or throw the JDM subclass of a Java runtime exception.
To keep the number of JDMException and JDM runtime exception subclasses relatively
small, yet still provide meaningful feedback to applications and developers, JDM defines
standard exception messages and error codes to support code portability. Vendors can
embed their specific error codes within the JDM exception-related classes, as well as wrap
other Java exceptions as appropriate. The table below lists standard JDM exception error
codes and their mapping to specific JDM exception subclasses.
Note that JDMException error codes are defined in the range 1000-1499, and JDMRuntimeException error codes are defined in the range 1500-1999. Error codes in the range 2000-9999 are reserved for vendor-specific error codes.
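An application that logs or routes errors by code could classify a code by range, as in the sketch below. The class name, method name, and returned labels are hypothetical. Treating 0-999 as reserved follows the error code table in this appendix, and the "unassigned" label for anything else is an assumption of this sketch.

```java
/** Hypothetical sketch of classifying an error code by its numeric range. */
final class ErrorCodeRangeSketch {

    static String category(int code) {
        if (code >= 0 && code <= 999) {
            return "reserved";
        }
        if (code >= 1000 && code <= 1499) {
            return "JDMException";
        }
        if (code >= 1500 && code <= 1999) {
            return "JDMRuntimeException";
        }
        if (code >= 2000 && code <= 9999) {
            return "vendor-specific";
        }
        // Assumption: codes outside 0-9999 are not covered by the ranges above.
        return "unassigned";
    }

    public static void main(String[] args) {
        System.out.println(1004 + " -> " + category(1004));
        System.out.println(1501 + " -> " + category(1501));
        System.out.println(2500 + " -> " + category(2500));
    }
}
```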
Standard exception messages and codes are necessary for code portability. Vendors can
embed their specific error codes within the JDMException, as well as wrap other exceptions as appropriate. The tables below list standard JDM exception error codes and corresponding Exception classes.
JDMException has the following subclasses:
ConnectionFailureException
IncompatibleSpecificationException
InvalidURIException
TaskException
InvalidObjectException
EntryNotFoundException
DuplicateEntryException
ObjectNotFoundException
ObjectExistsException
JDM defines the following runtime exceptions:
JDMUnsupportedFeatureException
inherits from java.lang.UnsupportedOperationException
JDMIllegalArgumentException
inherits from java.lang.IllegalArgumentException
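The wrapping of internally raised exceptions and the embedding of vendor error codes, both described earlier in this appendix, might look like the following sketch. SketchMiningException is a hypothetical stand-in and not a JDM class; its constructor, accessors, and the vendor code 2042 are invented for illustration. Code 1000 corresponds to GenericError in the table of standard codes.

```java
/** Hypothetical sketch of wrapping an internal exception with standard and vendor codes. */
final class ExceptionWrappingSketch {

    /** Stand-in for a checked, vendor-defined exception; not part of the JDM API. */
    static class SketchMiningException extends Exception {
        private static final long serialVersionUID = 1L;
        private final int errorCode;
        private final int vendorCode;

        SketchMiningException(int errorCode, int vendorCode, String message, Throwable cause) {
            super(message, cause);
            this.errorCode = errorCode;
            this.vendorCode = vendorCode;
        }

        int getErrorCode() {
            return errorCode;
        }

        int getVendorCode() {
            return vendorCode;
        }
    }

    /** Converts an internal ClassCastException into the checked stand-in exception. */
    static String asName(Object value) throws SketchMiningException {
        try {
            return (String) value;
        } catch (ClassCastException e) {
            // 1000 is the standard generic error code; 2042 is an invented vendor code
            // chosen from the vendor-specific range.
            throw new SketchMiningException(1000, 2042, "Unexpected object type", e);
        }
    }

    public static void main(String[] args) {
        try {
            asName(Integer.valueOf(7));
        } catch (SketchMiningException e) {
            System.out.println(e.getErrorCode() + "/" + e.getVendorCode() + ": " + e.getMessage());
        }
    }
}
```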
Code       JDM Exception                       Title                              Message         Remarks
0-999                                                                             Reserved.
1000                                           GenericError                       Generic Error.
1001       ConnectionFailureException          ConnectionFailure
1002       ConnectionFailureException          ConnectionOpenFailed
1003       ConnectionFailureException          ConnectionClosedFailed
1004       EntryNotFoundException              EntryNotFound
1005
1006       DuplicateEntryException             DuplicateEntry
1007       InvalidURIException                 InvalidURI
1008       InvalidURIException                 InaccessibleURI
1009       IncompatibleSpecificationException  IncompatibleArgumentSpecification
1010       IncompatibleSpecificationException  IncompatibleSpecification
1011       IncompatibleSpecificationException  InvalidUsage
1012       IncompatibleSpecificationException  InvalidSettings
1013       ObjectNotFoundException             ObjectNotFound
1014       ObjectExistsException               ObjectExists
1015       TaskException                       TaskExecuting
1016       TaskException                       TaskNotExecuting
1017       TaskException                       TaskFailed
1017-1499
Code
Title
Message
Remarks
JDMRuntimeException
1500
GenericError
Generic Error.
JDMUnsupportedFeatureException
1501
UnsupportedFeature
JDMIllegalArgumentException
1502
NullArgument
JDMIllegalArgumentException
1503
ArrayMismatch
JDMIllegalArgumentException
1504
InvalidArgument
JDMIllegalArgumentException
1505
InvalidStringArgument
JDMIllegalArgumentException
1506
StringTooLong
JDMIllegalArgumentException
1507
InvalidClassName
JDMIllegalArgumentException
1508
InvalidDataType
JDMIllegalArgumentException
1509
ArraySizeExceeded
JDMIllegalArgumentException
1510
InvalidObjectType
JDMIllegalArgumentException
1511
InvalidObject
15121999
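As an illustration of how a client might use the code ranges listed above, the following is a minimal Java sketch that maps a standard error code to the name of the exception class shown for it. The class name JdmErrorCodeSketch and the method exceptionFor are hypothetical and not part of the JDM API. Mapping code 1000 to the JDMException base class is an assumption, and codes that the tables do not assign are reported as "unlisted".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup from standard JDM error codes to exception class names. */
final class JdmErrorCodeSketch {
    private static final Map<Integer, String> EXCEPTION_BY_CODE = new HashMap<>();

    /** Registers every code in [firstCode, lastCode] under one exception class name. */
    private static void register(String exceptionClass, int firstCode, int lastCode) {
        for (int code = firstCode; code <= lastCode; code++) {
            EXCEPTION_BY_CODE.put(code, exceptionClass);
        }
    }

    static {
        // Assumption: the generic error 1000 belongs to the JDMException base class.
        register("JDMException", 1000, 1000);
        register("ConnectionFailureException", 1001, 1003);
        register("EntryNotFoundException", 1004, 1004);
        register("DuplicateEntryException", 1006, 1006);
        register("InvalidURIException", 1007, 1008);
        register("IncompatibleSpecificationException", 1009, 1012);
        register("ObjectNotFoundException", 1013, 1013);
        register("ObjectExistsException", 1014, 1014);
        register("TaskException", 1015, 1017);
        // Runtime exceptions.
        register("JDMRuntimeException", 1500, 1500);
        register("JDMUnsupportedFeatureException", 1501, 1501);
        register("JDMIllegalArgumentException", 1502, 1511);
    }

    /** Returns the exception class name for a code, or "unlisted" if the tables assign none. */
    static String exceptionFor(int code) {
        return EXCEPTION_BY_CODE.getOrDefault(code, "unlisted");
    }

    public static void main(String[] args) {
        System.out.println(exceptionFor(1004));
        System.out.println(exceptionFor(1510));
    }
}
```

A vendor implementation would more likely expose such codes as constants; the map form is used here only to keep the sketch short.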
E.2. Methods
JDM defines the following SOAP methods to communicate with a DME. Note that JDM
Web services follow the document/literal style for better interoperability.
listContents
getCapabilities
getObject
saveObject
removeObject
renameObject
getSubObjects
verifyObject
executeTask
getExecutionStatus
terminateTask
For JDM SOAP methods, http://www.jsr-73.org/2004/webservices/ is used as the
namespace. Each of these methods is detailed in the examples below.
<message name="IDataMining_saveObject">
<part name="parameters" element="ns2:saveObjectElement"/>
</message>
<message name="IDataMining_saveObjectResponse">
<part name="result" element="ns2:saveObjectResponseElement"/>
</message>
<portType name="IDataMining">
<operation name="saveObject">
<input message="tns:IDataMining_saveObject"/>
<output message="tns:IDataMining_saveObjectResponse"/>
<fault message="tns:IDataMining_exception"/>
</operation>
...messages definitions...
</portType>
<binding name="IDataMiningBinding" type="tns:IDataMining">
<soap:binding transport="http://schemas.xmlsoap.org/soap/http"
style="document"/>
<operation name="saveObject">
<input>
<soap:body use="literal"/>
</input>
<output>
<soap:body use="literal"/>
</output>
<soap:operation soapAction=""/>
</operation>
...method bindings...
</binding>
<service name="DataMiningService">
<port name="IDataMiningPort" binding="tns:IDataMiningBinding">
<soap:address location="http://www.jsr-73.org/2004/webservices/DataMiningService"/>
</port>
</service>
</definitions>
WSDL Type
<complexType name="listContents">
<sequence>
<element name="objectFilter" type="ObjectFilter"/>
</sequence>
</complexType>
<complexType name="listContentsResponse">
<sequence>
<element name="object" type="MiningObjectHeader"
maxOccurs="unbounded"/>
</sequence>
</complexType>
<complexType name="ObjectFilter">
<attribute name="name" type="xsd:string" use="optional"/>
<attribute name="type" type="xsd:string" use="optional"/>
<attribute name="function" type="xsd:string" use="optional"/>
<attribute name="algorithm" type="xsd:string" use="optional"/>
<attribute name="creatorInfo" type="xsd:string" use="optional"/>
<attribute name="createdBefore" type="xsd:date" use="optional"/>
<attribute name="createdAfter" type="xsd:date" use="optional"/>
<attribute name="objectIdentifier" type="xsd:string" use="optional"/>
<attribute name="requestedContent" type="ObjectContentType"
use="optional"/>
</complexType>
<simpleType name="ObjectContentType">
<restriction base="string">
<enumeration value="modelSignature"/>
<enumeration value="buildSettings"/>
<enumeration value="effectiveBuildSettings"/>
<enumeration value="statistics"/>
<enumeration value="modelDetail"/>
<enumeration value="logicalData"/>
<enumeration value="physicalData"/>
<enumeration value="costMatrix"/>
<enumeration value="applySettings"/>
</restriction>
</simpleType>
Example
SOAP Request:
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<SOAP-ENV:Header>
<connectionSpec xmlns="http://www.jsr-73.org/2004/JDMSchema">
<userName>miningGuru</userName>
<password>mine</password>
<uri>www.jsr-73.org</uri>
</connectionSpec>
</SOAP-ENV:Header>
<SOAP-ENV:Body>
<listContents xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<objectFilter type="CostMatrix"/>
</listContents>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<listContentsResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<object xsi:type="CostMatrix" name="myCostMatrix" creatorInfo="jdmExpert">
</object>
</listContentsResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
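The listContents request shown above can also be assembled programmatically. The following is a minimal Java sketch, using only the JDK's DOM and transformer APIs, that builds the envelope body with an objectFilter restricted to one object type. The class name ListContentsRequestSketch and its methods are hypothetical; the connectionSpec header is omitted for brevity, and transport to the DME is outside the scope of the sketch.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical builder for a document/literal listContents request. */
final class ListContentsRequestSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String WS_NS = "http://www.jsr-73.org/2004/webservices/";

    /** Builds Envelope/Body/listContents/objectFilter with the given type filter. */
    static Document build(String objectType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element envelope = doc.createElementNS(SOAP_NS, "SOAP-ENV:Envelope");
            doc.appendChild(envelope);
            Element body = doc.createElementNS(SOAP_NS, "SOAP-ENV:Body");
            envelope.appendChild(body);

            // listContents and objectFilter live in the JDM web services namespace.
            Element listContents = doc.createElementNS(WS_NS, "listContents");
            body.appendChild(listContents);
            Element filter = doc.createElementNS(WS_NS, "objectFilter");
            filter.setAttribute("type", objectType);
            listContents.appendChild(filter);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    /** Serializes the document to a string without an XML declaration. */
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(build("CostMatrix")));
    }
}
```

Other ObjectFilter attributes (name, function, algorithm, and so on) could be set on the same element in the same way.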
WSDL Type
<complexType name="getCapabilities"/>
<complexType name="getCapabilitiesResponse">
<sequence>
<element name="report" type="CapabilitiesReport"/>
</sequence>
</complexType>
<xsd:complexType name="CapabilitiesReport">
<xsd:sequence>
<xsd:element name="capability" type="Capability" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="Capability">
<xsd:attribute name="task" type="MiningTask" use="optional"/>
<xsd:attribute name="function" type="MiningFunction" use="optional"/>
<xsd:attribute name="algorithm" type="MiningAlgorithm" use="optional"/>
<xsd:attribute name="enumName" type="xsd:string" use="optional"/>
<xsd:attribute name="enumValue" type="xsd:string" use="optional"/>
<xsd:attribute name="isSupported" type="xsd:boolean" use="required"/>
</xsd:complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ...>
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<getCapabilities xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
</getCapabilities>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ...>
<SOAP-ENV:Body>
<getCapabilitiesResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<report enumeration="ActivationFunction">
<capability task="Build" function="Regression" isSupported="true"/>
...
</report>
</getCapabilitiesResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
WSDL Type
<complexType name="saveObject">
<sequence>
<element name="object" type="MiningObject"/>
</sequence>
<attribute name="objectName" type="xsd:string" use="required"/>
<attribute name="overwrite" type="xsd:boolean" use="optional"/>
<attribute name="verify" type="xsd:boolean" use="optional"/>
</complexType>
<complexType name="saveObjectResponse">
<sequence>
<element name="report" type="VerificationReport" minOccurs="0"/>
</sequence>
</complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<saveObject xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
name="myClassificationSettings-1" overwrite="true" verify="true">
<object xsi:type="ClassificationSettings" miningFunction="classification">
<algorithmSettings algorithm="naiveBayes" pairwiseThreshold="0.1" singletonThreshold="0.1"/>
<buildAttribute attributeName="income" usage="active"
outlierTreatment="asMissing"/>
<buildAttribute attributeName="age" usage="active" outlierTreatment="asIs"/>
<buildAttribute attributeName="numChildren" usage="active"
outlierTreatment="asIs"/>
<buildAttribute attributeName="ss#" usage="inactive"/>
</object>
</saveObject>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<saveObjectResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<verificationReport reportType="warning">
<reportText>Details of report...</reportText>
</verificationReport>
</saveObjectResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
WSDL Type
<complexType name="getObject">
<attribute name="objectName" type="xsd:string" use="required"/>
<attribute name="objectType" type="NamedObjectType" use="required"/>
</complexType>
<complexType name="getObjectResponse">
<sequence>
<element name="object" type="NamedObject"/>
</sequence>
</complexType>
<xsd:complexType name="NamedObject">
<xsd:sequence>
<xsd:choice>
<xsd:element name="task" type="Task"/>
<xsd:element name="buildSettings" type="BuildSettings"/>
<xsd:element name="model" type="Model"/>
<xsd:element name="logicalData" type="LogicalData"/>
<xsd:element name="physicalDataSet" type="PhysicalDataSet"/>
<xsd:element name="testMetrics" type="TestMetrics"/>
<xsd:element name="taxonomy" type="Taxonomy"/>
<xsd:element name="costMatrix" type="CostMatrix"/>
<xsd:element name="applySettings" type="ApplySettings"/>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<getObject xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
name="Census_A_ClassificationSettings"
type="BuildSettings"/>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
WSDL Types
<complexType name="removeObject">
<attribute name="objectName" type="xsd:string" use="required"/>
<attribute name="objectType" type="NamedObjectType" use="required"/>
</complexType>
<complexType name="removeObjectResponse">
<attribute name="objectName" type="xsd:string" use="required"/>
<attribute name="objectType" type="NamedObjectType" use="required"/>
</complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<removeObject xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
objectName="myClassificationSettings"
objectType="BuildSettings">
</removeObject>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<removeObjectResponse xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
objectName="myClassificationSettings"
objectType="BuildSettings">
</removeObjectResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
WSDL Types
<complexType name="renameObject">
<attribute name="fromName" type="xsd:string" use="required"/>
<attribute name="toName" type="xsd:string" use="required"/>
<attribute name="objectType" type="NamedObjectType" use="required"/>
</complexType>
<complexType name="renameObjectResponse">
<attribute name="fromName" type="xsd:string" use="required"/>
<attribute name="toName" type="xsd:string" use="required"/>
<attribute name="objectType" type="NamedObjectType" use="required"/>
</complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<renameObject xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
fromName="myClassificationSettings" toName="settings1">
</renameObject>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<renameObjectResponse xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
fromName="myClassificationSettings" toName="settings1">
</renameObjectResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
WSDL Type
<complexType name="getSubObjects">
<sequence>
<element name="contentType" type="ObjectContentType"
maxOccurs="unbounded"/>
</sequence>
<attribute name="objectName" type="xsd:string" use="required"/>
<attribute name="objectType" type="NamedObjectType" use="required"/>
</complexType>
<complexType name="getSubObjectsResponse">
<sequence>
<element name="object" type="SubObjectResult" maxOccurs="unbounded"/>
</sequence>
</complexType>
<xsd:complexType name="SubObjectResult">
<xsd:sequence>
<xsd:element name="header" type="MiningObject"/>
<xsd:choice>
<xsd:element name="modelSignature" type="ModelSignature"/>
<xsd:element name="buildSettings" type="BuildSettings"/>
<xsd:element name="effectiveBuildSettings" type="BuildSettings"/>
<xsd:element name="statistics" type="AttributeStatisticsSet"/>
<xsd:element name="modelDetail" type="ModelDetail"/>
<xsd:element name="logicalData" type="LogicalData"/>
<xsd:element name="physicalDataSet" type="PhysicalDataSet"/>
<xsd:element name="taxonomy" type="Taxonomy"/>
<xsd:element name="costMatrix" type="CostMatrix"/>
<xsd:element name="applySettings" type="ApplySettings"/>
</xsd:choice>
</xsd:sequence>
<xsd:attribute name="objectCount" type="xsd:int"/>
</xsd:complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<getSubObjects xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
contentType="modelSignature" objectName="myFavoriteModel"
objectType="model">
</getSubObjects>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<getSubObjectsResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<object>
<header name="myFavoriteModel" creatorInfo="jdmExpert"/>
<modelSignature>
<attribute name="caseID" attributeType="notSpecified"
datatype="string"/>
<attribute name="age" attributeType="categorical"
datatype="integer"/>
<attribute name="income" attributeType="numerical"
datatype="double"/>
<attribute name="numChildren" attributeType="numerical"
datatype="integer"/>
</modelSignature>
</object>
</getSubObjectsResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
WSDL Type
<complexType name="verifyObject">
<sequence>
<choice>
<element name="objectName" type="xsd:string"/>
<element name="object" type="MiningObject"/>
</choice>
</sequence>
<attribute name="objectType" type="xsd:string" use="optional"/>
</complexType>
<complexType name="verifyObjectResponse">
<sequence>
<element name="report" type="VerificationReport"/>
</sequence>
</complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<verifyObject xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<objectName>mySettings</objectName>
<objectType>buildSettings</objectType>
</verifyObject>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<verifyObjectResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<verificationReport reportType="warning">
<reportText>Details of report...</reportText>
</verificationReport>
</verifyObjectResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
WSDL Type
<complexType name="executeTask">
<sequence>
<choice>
<element name="taskName" type="xsd:string"/>
<element name="task" type="Task"/>
</choice>
</sequence>
</complexType>
<complexType name="executeTaskResponse">
<sequence>
<choice>
<element name="status" type="ExecutionStatus"/>
<element name="recordValue" type="jdm:RecordElement"
maxOccurs="unbounded"/>
</choice>
</sequence>
</complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<executeTask xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<task xsi:type="BuildTask" name="myBuildTask-1">
<objectName>CensusBuildTask_A</objectName>
<modelName>Census_A</modelName>
<buildDataName>CensusBuild</buildDataName>
<buildSettingsName>Census_A_ClassificationSettings</buildSettingsName>
</task>
</executeTask>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<executeTaskResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<executionStatus state="queued" timestamp="April 16, 2004 13:21:33"/>
</executeTaskResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
The example above shows a model build task. The following example provides a task
specification for a single-record apply that involves two predictors for a churn
classification model.
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<executeTask xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<task xsi:type="RecordApplyTask"
modelName="ChurnClassification32">
<recordValue name="CustomerAge" value="23"/>
<recordValue name="CustomerIncome" value="50000"/>
<recordValue name="CustomerID" value="1003-2203-120"/>
<applySettingsName xsi:type="ClassificationApplySettings">
<sourceDestinationMap sourceAttrName="CustomerID"
destinationAttrName="CustId"/>
<applyMap content="predictedCategory" destPhysAttrName="churn"
rank="1"/>
<applyMap content="probability" destPhysAttrName="churnProb"
rank="1"/>
</applySettingsName>
</task>
</executeTask>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<executeTaskResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<recordValue name="CustID" value="1003-2203-120"/>
<recordValue name="churn" value="1"/>
<recordValue name="churnProb" value=".87"/>
</executeTaskResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
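On the client side, the record apply response above reduces to a set of name/value pairs. The following is a minimal Java sketch, assuming the response is available as a string and that each recordValue carries its content in name and value attributes as in the example (the RecordElement schema type instead defines a child value element, which this sketch does not handle). The class name RecordApplyResponseSketch and the method recordValues are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for the recordValue entries of an executeTaskResponse. */
final class RecordApplyResponseSketch {
    static final String WS_NS = "http://www.jsr-73.org/2004/webservices/";

    /** Returns the recordValue name/value pairs in document order. */
    static Map<String, String> recordValues(String soapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapXml)));

            // recordValue elements inherit the default namespace of executeTaskResponse.
            NodeList nodes = doc.getElementsByTagNameNS(WS_NS, "recordValue");
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element record = (Element) nodes.item(i);
                values.put(record.getAttribute("name"), record.getAttribute("value"));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("response could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String response =
            "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<SOAP-ENV:Body>"
            + "<executeTaskResponse xmlns=\"" + WS_NS + "\">"
            + "<recordValue name=\"churn\" value=\"1\"/>"
            + "<recordValue name=\"churnProb\" value=\".87\"/>"
            + "</executeTaskResponse>"
            + "</SOAP-ENV:Body></SOAP-ENV:Envelope>";
        System.out.println(recordValues(response));
    }
}
```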
WSDL Types
<complexType name="getExecutionStatus">
<attribute name="taskName" type="xsd:string" use="required"/>
</complexType>
<complexType name="getExecutionStatusResponse">
<sequence>
<element name="status" type="ExecutionStatus"/>
</sequence>
</complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<getExecutionStatus xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
taskName="myBuildTask">
</getExecutionStatus>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<getExecutionStatusResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<executionStatus state="queued" timestamp="April 16, 2004 13:21:33"/>
</getExecutionStatusResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
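A client that submits a task asynchronously would typically call getExecutionStatus repeatedly until the task leaves its pending states. The following is a minimal Java sketch of such a loop. It assumes the caller supplies a function that performs the SOAP call and returns the state attribute as a string, and a set of states to treat as pending; only "queued" and "terminating" appear in the examples in this appendix, so any other state name used with the sketch is an assumption. The class and method names are hypothetical, and the sketch does not wait between attempts, which a real client would.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical polling loop around getExecutionStatus. */
final class ExecutionStatusPollerSketch {

    /**
     * Calls fetchState until it returns a state outside pendingStates or until
     * maxAttempts calls have been made, and returns the last state seen.
     */
    static String awaitState(Supplier<String> fetchState, Set<String> pendingStates, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String state = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            state = fetchState.get();
            if (!pendingStates.contains(state)) {
                return state; // the task has left the pending states
            }
        }
        return state; // still pending after the last attempt
    }

    public static void main(String[] args) {
        // Canned states stand in for real getExecutionStatus calls; "finished" is a placeholder.
        Iterator<String> canned = Arrays.asList("queued", "queued", "finished").iterator();
        Set<String> pending = new HashSet<>(Arrays.asList("queued"));
        System.out.println(awaitState(canned::next, pending, 10));
    }
}
```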
WSDL Types
<complexType name="terminateTask">
<attribute name="taskName" type="xsd:string" use="required"/>
</complexType>
<complexType name="terminateTaskResponse">
<sequence>
<element name="status" type="ExecutionStatus"/>
</sequence>
</complexType>
Example
SOAP Request:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Header ... />
<SOAP-ENV:Body>
<terminateTask xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema"
taskName="myBuildTask">
</terminateTask>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
SOAP Response:
<SOAP-ENV:Envelope ... >
<SOAP-ENV:Body>
<terminateTaskResponse
xmlns="http://www.jsr-73.org/2004/webservices/"
xmlns:jdm="http://www.jsr-73.org/2004/JDMSchema">
<executionStatus state="terminating" timestamp="April 16, 2004 13:21:33"/>
</terminateTaskResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
E.4.2. Task
<xsd:complexType name="Task">
<xsd:complexContent>
<xsd:extension base="MiningObject">
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="BuildTask">
<xsd:complexContent>
<xsd:extension base="Task">
<xsd:sequence>
<xsd:choice>
<xsd:element name="buildDataName" type="xsd:string"/>
<xsd:element name="buildData" type="PhysicalDataSet"/>
</xsd:choice>
<xsd:choice>
<xsd:element name="buildSettingsName" type="xsd:string"/>
<xsd:element name="buildSettings" type="BuildSettings"/>
</xsd:choice>
<xsd:choice>
<xsd:element name="validationDataName" type="xsd:string"
minOccurs="0"/>
<xsd:element name="validationData" type="PhysicalDataSet"
minOccurs="0"/>
</xsd:choice>
<xsd:element name="modelDescription" type="xsd:string" minOccurs="0"/>
<xsd:element name="buildDataMap" type="LogicalAttrNameMap"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="validationDataMap" type="AttributeNameMap"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:complexType name="RegressionTestTask">
<xsd:complexContent>
<xsd:extension base="TestTask"/>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="ImportTask">
<xsd:complexContent>
<xsd:extension base="Task">
<xsd:sequence>
<xsd:element name="objectName" type="NameMap" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="uri" type="xsd:anyURI" use="required"/>
<xsd:attribute name="includeModelSettings" type="xsd:boolean"
use="optional"/>
<xsd:attribute name="useOriginalCreationDates" type="xsd:boolean"
use="optional"/>
<xsd:attribute name="populateSummary" type="xsd:boolean"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="ImportSummary">
<xsd:sequence>
<xsd:element name="objectName" type="xsd:string" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="objectCount" type="xsd:int" use="required"/>
<xsd:attribute name="creationDate" type="xsd:string" use="required"/>
<xsd:attribute name="format" type="ImportExportFormat" use="required"/>
</xsd:complexType>
<xsd:complexType name="ExportTask">
<xsd:complexContent>
<xsd:extension base="Task">
<xsd:sequence>
<xsd:element name="objectName" type="xsd:string" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="uri" type="xsd:anyURI" use="required"/>
<xsd:attribute name="format" type="xsd:string" use="required"/>
<xsd:attribute name="includeModelSettings" type="xsd:boolean"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:simpleType name="ImportExportFormatStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="JDM1_0"/>
<xsd:enumeration value="PMML1_0"/>
<xsd:enumeration value="PMML2_0"/>
<xsd:enumeration value="PMML2_1"/>
<xsd:enumeration value="PMML3_0"/>
<xsd:enumeration value="CWM1_0"/>
<xsd:enumeration value="CWM1_1"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="SettingsInclusionOption">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="systemDefault"/>
<xsd:enumeration value="none"/>
<xsd:enumeration value="settings"/>
<xsd:enumeration value="effectiveSettings"/>
<xsd:enumeration value="settingsOnly"/>
<xsd:enumeration value="effectiveSettingsOnly"/>
<xsd:enumeration value="allSettingsOnly"/>
<xsd:enumeration value="all"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="ComputeStatisticsTask">
<xsd:complexContent>
<xsd:extension base="Task">
<xsd:sequence>
<xsd:choice>
<xsd:element name="physicalDataName" type="xsd:string"/>
<xsd:element name="physicalData" type="PhysicalDataSet"/>
</xsd:choice>
<xsd:choice>
<xsd:element name="logicalDataName" type="xsd:string" minOccurs="0"/>
<xsd:element name="logicalData" type="LogicalData" minOccurs="0"/>
</xsd:choice>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
E.4.3. Task.Apply
<xsd:complexType name="DataSetApplyTask">
<xsd:complexContent>
<xsd:extension base="Task">
<xsd:sequence>
<xsd:choice>
<xsd:element name="applyDataName" type="xsd:string"/>
<xsd:element name="applyData" type="PhysicalDataSet"/>
</xsd:choice>
<xsd:choice>
<xsd:element name="applySettingsName" type="xsd:string"/>
<xsd:element name="applySettings" type="ApplySettings"/>
</xsd:choice>
<xsd:element name="applyDataMap" type="SignatureAttrNameMap"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="modelName" type="xsd:string" use="required"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="RecordApplyTask">
<xsd:complexContent>
<xsd:extension base="Task">
<xsd:sequence>
<xsd:element name="recordValue" type="RecordElement"
maxOccurs="unbounded"/>
<xsd:choice>
<xsd:element name="applySettingsName" type="xsd:string"/>
<xsd:element name="applySettings" type="ApplySettings"/>
</xsd:choice>
</xsd:sequence>
<xsd:attribute name="modelName" type="xsd:string" use="required"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="RecordElement">
<xsd:sequence>
<xsd:element name="value" type="DataValueType"/>
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string" use="required"/>
</xsd:complexType>
<xsd:complexType name="ApplySettings">
<xsd:complexContent>
<xsd:extension base="MiningObject">
<xsd:sequence>
<xsd:element name="sourceDestinationMap" type="AttributeNameMap"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
E.4.4. Data
<xsd:complexType name="PhysicalDataSet">
<xsd:complexContent>
<xsd:extension base="MiningObject">
<xsd:sequence>
<xsd:element name="uri" type="xsd:anyURI"/>
<xsd:element name="physicalAttribute" type="PhysicalAttribute"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="attributeStatistics" type="AttributeStatisticsSet"
minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="attributeCount" type="xsd:int" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="PhysicalDataRecord">
<xsd:sequence>
<xsd:element name="entry" type="PhysicalAttributeValue" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="attributeCount" type="xsd:int" use="optional"/>
</xsd:complexType>
<xsd:complexType name="PhysicalAttributeValue">
<xsd:sequence>
<xsd:element name="attribute" type="PhysicalAttribute"/>
<xsd:element name="value" type="DataValueType"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="DataValueType" abstract="true">
<xsd:sequence>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="DecimalValue">
<xsd:complexContent>
<xsd:extension base="DataValueType">
<xsd:attribute name="decimal" type="xsd:double" use="required"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="StringValue">
<xsd:complexContent>
<xsd:extension base="DataValueType">
<xsd:sequence>
<xsd:element name="string" type="xsd:string"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="PhysicalAttribute">
<xsd:complexContent>
<xsd:extension base="Attribute">
<xsd:attribute name="dataType" type="AttributeDataType"
use="required"/>
<xsd:attribute name="role" type="PhysicalAttributeRole"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="Attribute">
<xsd:attribute name="name" type="xsd:string" use="required"/>
<xsd:attribute name="description" type="xsd:string" use="optional"/>
</xsd:complexType>
<xsd:simpleType name="AttributeDataTypeStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="unknownType"/>
<xsd:enumeration value="stringType"/>
<xsd:enumeration value="doubleType"/>
<xsd:enumeration value="integerType"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="PhysicalAttributeRoleStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="taxonomyParentId"/>
<xsd:enumeration value="taxonomyChildId"/>
<xsd:enumeration value="attributeValue"/>
<xsd:enumeration value="attributeName"/>
<xsd:enumeration value="caseId"/>
<xsd:enumeration value="data"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="LogicalData">
<xsd:complexContent>
<xsd:extension base="MiningObject">
<xsd:sequence>
<xsd:element name="logicalAttribute" type="LogicalAttribute"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="attributeCount" type="xsd:int" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="LogicalAttribute">
<xsd:complexContent>
<xsd:extension base="Attribute">
<xsd:sequence>
<xsd:element name="categorySet" type="CategorySet" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="attributeType" type="AttributeType"
use="optional"/>
<xsd:attribute name="dataPreparationStatus" type="DataPreparationStatus"
use="optional"/>
<xsd:attribute name="isDiscrete" type="xsd:boolean" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:simpleType name="AttributeTypeStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="notSpecified"/>
<xsd:enumeration value="numerical"/>
<xsd:enumeration value="ordinal"/>
<xsd:enumeration value="categorical"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="DataPreparationStatusStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="prepared"/>
<xsd:enumeration value="unprepared"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="CategorySet">
<xsd:sequence>
<xsd:element name="categoryValue" type="CategoryValue" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="dataType" type="AttributeDataType" use="required"/>
<xsd:attribute name="size" type="xsd:int" use="optional"/>
<xsd:attribute name="name" type="xsd:string" use="required"/>
</xsd:complexType>
<xsd:simpleType name="CategoryPropertyStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="missing"/>
<xsd:enumeration value="unknown"/>
<xsd:enumeration value="error"/>
<xsd:enumeration value="valid"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="CategoryValue">
<xsd:sequence>
<xsd:element name="categoryValue" type="DataValueType"/>
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string" use="optional"/>
<xsd:attribute name="index" type="xsd:integer" use="optional"/>
<xsd:attribute name="property" type="CategoryProperty" use="optional"/>
</xsd:complexType>
<xsd:complexType name="Interval">
<xsd:sequence>
<xsd:element name="startPoint" type="xsd:double"/>
<xsd:element name="endPoint" type="xsd:double"/>
<xsd:element name="intervalClosure" type="IntervalClosure"/>
</xsd:sequence>
</xsd:complexType>
<xsd:simpleType name="IntervalClosure">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="openOpen"/>
<xsd:enumeration value="openClosed"/>
<xsd:enumeration value="closedOpen"/>
<xsd:enumeration value="closedClosed"/>
</xsd:restriction>
</xsd:simpleType>
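The four IntervalClosure values determine whether each end point belongs to the interval. The following is a minimal Java sketch of a containment test, assuming that the first word of each value describes the start point and the second the end point (for example, openClosed excludes startPoint and includes endPoint). The class name IntervalSketch, the Closure enum, and the method contains are hypothetical.

```java
/** Hypothetical containment test for the Interval and IntervalClosure types. */
final class IntervalSketch {

    /** Mirrors the enumeration values openOpen, openClosed, closedOpen, closedClosed. */
    enum Closure { OPEN_OPEN, OPEN_CLOSED, CLOSED_OPEN, CLOSED_CLOSED }

    /** Returns true if x lies in the interval from start to end under the given closure. */
    static boolean contains(double start, double end, Closure closure, double x) {
        boolean startClosed = closure == Closure.CLOSED_OPEN || closure == Closure.CLOSED_CLOSED;
        boolean endClosed = closure == Closure.OPEN_CLOSED || closure == Closure.CLOSED_CLOSED;
        boolean aboveStart = startClosed ? x >= start : x > start;
        boolean belowEnd = endClosed ? x <= end : x < end;
        return aboveStart && belowEnd;
    }

    public static void main(String[] args) {
        System.out.println(contains(0.0, 10.0, Closure.OPEN_CLOSED, 10.0));
        System.out.println(contains(0.0, 10.0, Closure.OPEN_CLOSED, 0.0));
    }
}
```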
<xsd:complexType name="Taxonomy">
<xsd:complexContent>
<xsd:extension base="MiningObject">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:choice>
<xsd:choice>
<xsd:element name="dataReference" type="PhysicalDataSet"/>
<xsd:element name="dataReferenceName" type="xsd:string"/>
</xsd:choice>
<xsd:element name="elements" type="TaxonomyElement" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:choice>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="TaxonomyElement">
<xsd:sequence>
<xsd:element name="parent" type="DataValueType"/>
<xsd:element name="child" type="DataValueType" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ModelSignature">
<xsd:sequence>
<xsd:element name="attribute" type="SignatureAttribute" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="SignatureAttribute">
<xsd:complexContent>
<xsd:extension base="Attribute">
<xsd:attribute name="attributeType" type="AttributeType"
use="required"/>
<xsd:attribute name="dataType" type="AttributeDataType"
use="required"/>
<xsd:attribute name="rank" type="xsd:int" use="optional"/>
<xsd:attribute name="importanceValue" type="xsd:double"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="CategoryMatrixElement">
<xsd:sequence>
<xsd:element name="predictedCategory" type="DataValueType"/>
<xsd:element name="actualCategory" type="DataValueType"/>
</xsd:sequence>
<xsd:attribute name="value" type="xsd:double" use="required"/>
</xsd:complexType>
E.4.5. Supervised
<xsd:complexType name="SupervisedSettings" abstract="true">
<xsd:complexContent>
<xsd:extension base="BuildSettings">
<xsd:attribute name="targetAttributeName" type="xsd:string"
use="required"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="TestMetrics">
<xsd:complexContent>
<xsd:extension base="MiningObject">
<xsd:sequence>
<xsd:choice>
<xsd:element name="testDataName" type="xsd:string"/>
<xsd:element name="testData" type="PhysicalDataSet"/>
</xsd:choice>
</xsd:sequence>
<xsd:attribute name="taskIdentifier" type="xsd:string"
use="optional"/>
<xsd:attribute name="modelName" type="xsd:string" use="required"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="SupervisedAlgorithmSettings">
<xsd:complexContent>
<xsd:extension base="AlgorithmSettings"/>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="SupervisedModel" abstract="true">
<xsd:complexContent>
<xsd:extension base="Model">
<xsd:attribute name="targetAttributeName" type="xsd:string"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
E.4.6. Supervised.Classification
<xsd:complexType name="ClassificationTestMetrics">
<xsd:complexContent>
<xsd:extension base="TestMetrics">
<xsd:sequence>
<xsd:element name="confusionMatrix" type="ConfusionMatrix"
minOccurs="0"/>
<xsd:element name="lift" type="Lift" minOccurs="0"/>
<xsd:element name="ROC" type="ReceiverOperatingCharacteristics"
minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="accuracy" type="xsd:double" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:simpleType name="ClassificationTestMetricOptionStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="confusionMatrix"/>
<xsd:enumeration value="lift"/>
<xsd:enumeration value="receiverOperatingCharacteristics"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="ConfusionMatrix">
<xsd:sequence>
<xsd:element name="category" type="DataValueType" minOccurs="2"
maxOccurs="unbounded"/>
<xsd:element name="countElement" type="CategoryMatrixElement"
minOccurs="4" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="accuracy" type="xsd:decimal" use="optional"/>
<xsd:attribute name="error" type="xsd:decimal" use="optional"/>
<xsd:attribute name="numberOfPredictions" type="xsd:int"
use="optional"/>
</xsd:complexType>
<xsd:complexType name="CostMatrix">
<xsd:complexContent>
<xsd:extension base="MiningObject">
<xsd:sequence>
<xsd:element name="category" type="DataValueType" minOccurs="2"
maxOccurs="unbounded"/>
<xsd:element name="costElement" type="CategoryMatrixElement"
minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="ReceiverOperatingCharacteristics">
<xsd:sequence>
<xsd:element name="elements" type="ROCElement" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="numberOfThresholdCandidates" type="xsd:int"/>
</xsd:complexType>
<xsd:complexType name="ROCElement">
<xsd:attribute name="index" type="xsd:int"/>
<xsd:attribute name="probabilityThreshold" type="xsd:double"/>
<xsd:attribute name="hitRate" type="xsd:double"/>
<xsd:attribute name="falseAlarmRate" type="xsd:double"/>
<xsd:attribute name="truePositiveCount" type="xsd:int"/>
<xsd:attribute name="trueNegativeCount" type="xsd:int"/>
<xsd:attribute name="falsePositiveCount" type="xsd:int"/>
<xsd:attribute name="falseNegativeCount" type="xsd:int"/>
</xsd:complexType>
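The fragment below is an illustrative sketch of an instance of the ReceiverOperatingCharacteristics type above; it is not part of the normative schema. It assumes the element name ROC used in ClassificationTestMetrics, omits namespace declarations, and uses invented values. Every attribute shown is optional in the schema.

```xml
<!-- Hypothetical instance: two threshold candidates, values invented -->
<ROC numberOfThresholdCandidates="2">
  <elements index="1" probabilityThreshold="0.5"
            hitRate="0.8" falseAlarmRate="0.1"/>
  <elements index="2" probabilityThreshold="0.7"
            hitRate="0.6" falseAlarmRate="0.05"/>
</ROC>
```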
<xsd:complexType name="Lift">
<xsd:sequence>
<xsd:enumeration value="probability"/>
<xsd:enumeration value="cost"/>
<xsd:enumeration value="nodeId"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="ClassificationSettings">
<xsd:complexContent>
<xsd:extension base="SupervisedSettings">
<xsd:sequence>
<xsd:choice>
<xsd:element name="costMatrixName" type="xsd:string" minOccurs="0"/>
<xsd:element name="costMatrix" type="CostMatrix" minOccurs="0"/>
</xsd:choice>
<xsd:element name="priorProbabilities" type="PriorProbabilities"
minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="usePriors" type="xsd:boolean" default="false"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="PriorProbabilities">
<xsd:sequence>
<xsd:element name="entry" type="PriorsEntry" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="attributeName" type="xsd:string"
use="required"/>
</xsd:complexType>
<xsd:complexType name="PriorsEntry">
<xsd:sequence>
<xsd:element name="attributeValue" type="DataValueType"/>
<xsd:element name="priorProbability" type="xsd:double"/>
</xsd:sequence>
</xsd:complexType>
E.4.7. Supervised.Regression
<xsd:complexType name="RegressionSettings">
<xsd:complexContent>
<xsd:extension base="SupervisedSettings"/>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="RegressionTestMetrics">
<xsd:complexContent>
<xsd:extension base="TestMetrics">
<xsd:attribute name="meanPredictedValue" type="xsd:decimal"
use="optional"/>
<xsd:attribute name="meanActualValue" type="xsd:decimal"
use="optional"/>
<xsd:attribute name="meanAbsoluteError" type="xsd:decimal"
use="optional"/>
<xsd:attribute name="rmsError" type="xsd:decimal" use="optional"/>
<xsd:attribute name="rSquared" type="xsd:decimal" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="RegressionApplySettings">
<xsd:complexContent>
<xsd:extension base="ApplySettings">
<xsd:sequence>
<xsd:element name="applyMap" type="RegressionApplyMap"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="RegressionApplyMap">
<xsd:attribute name="content" type="RegressionApplyContent"
use="required"/>
<xsd:attribute name="destPhysAttrName" type="xsd:string"
use="required"/>
</xsd:complexType>
<xsd:simpleType name="RegressionApplyContentStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="predictedValue"/>
<xsd:enumeration value="confidence"/>
</xsd:restriction>
</xsd:simpleType>
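As an illustrative sketch (not part of the normative schema), the applyMap elements of a RegressionApplySettings instance might look as follows. The destination attribute names are invented, and the content inherited from ApplySettings, which is defined elsewhere in the schema, is omitted.

```xml
<!-- Hypothetical fragment: one map per standard content value -->
<applyMap content="predictedValue" destPhysAttrName="PREDICTED_PRICE"/>
<applyMap content="confidence" destPhysAttrName="PRICE_CONFIDENCE"/>
```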
E.4.8. Clustering
<xsd:complexType name="ClusteringApplySettings">
<xsd:complexContent>
<xsd:extension base="ApplySettings">
<xsd:sequence>
<xsd:element name="rankMap" type="ClusteringApplyMap" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="clusterIdentifierMap" type="ClusterIdentifierMap"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="ClusterMap" type="ClusteringApplyMap" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="ClusteringApplyMap">
<xsd:sequence>
<xsd:element name="destPhysicalAttrName" type="xsd:string"
maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="content" type="ClusteringApplyContent"
use="required"/>
<xsd:attribute name="fromTop" type="xsd:boolean" use="required"/>
</xsd:complexType>
<xsd:complexType name="ClusterIdentifierMap">
<xsd:attribute name="clusterID" type="xsd:int" use="required"/>
<xsd:attribute name="content" type="ClusteringApplyContent"
use="required"/>
<xsd:attribute name="destPhysicalAttrName" type="xsd:string"
use="required"/>
</xsd:complexType>
<xsd:complexType name="ClusterMap">
<xsd:attribute name="content" type="ClusteringApplyContent"
use="required"/>
<xsd:attribute name="baseDestPhysicalAttrName" type="xsd:string"
use="required"/>
</xsd:complexType>
<xsd:simpleType name="ClusteringApplyContentStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="clusterIdentifier"/>
<xsd:enumeration value="probability"/>
<xsd:enumeration value="qualityOfFit"/>
<xsd:enumeration value="distance"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="ClusteringSettings">
<xsd:complexContent>
<xsd:extension base="BuildSettings">
<xsd:sequence>
<xsd:element name="aggregationFunction" type="AggregationFunction"
minOccurs="0"/>
<xsd:element name="maxClusterCaseCount" type="xsd:int" minOccurs="0"/>
<xsd:element name="maxLevels" type="xsd:int" minOccurs="0"/>
<xsd:element name="maxNumberOfClusters" type="xsd:int" minOccurs="0"/>
<xsd:element name="minClusterCaseCount" type="xsd:int" minOccurs="0"/>
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
<xsd:element name="attrCompLogicalAttr" type="xsd:string"/>
<xsd:element name="attributeComparisonFunction"
type="AttributeComparisonFunction"/>
</xsd:sequence>
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
<xsd:element name="similarityMatrixLogicalAttr"
type="xsd:string"/>
<xsd:element name="similarityMatrix" type="SimilarityMatrix"/>
</xsd:sequence>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:simpleType name="AggregationFunctionStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="binarySimilarity"/>
<xsd:enumeration value="tanimoto"/>
<xsd:enumeration value="jaccard"/>
<xsd:enumeration value="simpleMatching"/>
<xsd:enumeration value="minkowski"/>
<xsd:enumeration value="cityBlock"/>
<xsd:enumeration value="chebychev"/>
<xsd:enumeration value="squaredEuclidean"/>
<xsd:enumeration value="euclidean"/>
<xsd:enumeration value="systemDetermined"/>
<xsd:enumeration value="systemDefault"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="AttributeComparisonFunctionStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="similarityMatrix"/>
<xsd:enumeration value="equal"/>
<xsd:enumeration value="delta"/>
<xsd:enumeration value="gaussSim"/>
<xsd:enumeration value="absDiff"/>
<xsd:enumeration value="systemDetermined"/>
<xsd:enumeration value="systemDefault"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="SimilarityMatrix">
<xsd:sequence>
<xsd:element name="category" type="DataValueType" minOccurs="2"
maxOccurs="unbounded"/>
<xsd:element name="similarityElement" type="CategoryMatrixElement"
minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ClusteringSignatureAttribute">
<xsd:complexContent>
<xsd:extension base="SignatureAttribute">
<xsd:sequence>
<xsd:element name="attributeComparisonFunction"
type="AttributeComparisonFunction" minOccurs="0"/>
<xsd:element name="similarityMatrix" type="SimilarityMatrix"
minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="similarityScale" type="xsd:double"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
E.4.9. Association
<xsd:complexType name="AssociationSettings">
<xsd:complexContent>
<xsd:extension base="BuildSettings">
<xsd:sequence>
<xsd:element name="includedItem" type="DataValueType"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="excludedItem" type="DataValueType"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="attributeTaxonomy" type="AttributeTaxonomy"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="maxNumberOfRules" type="xsd:int"
use="optional"/>
<xsd:attribute name="maxRuleLength" type="xsd:int" use="optional"/>
<xsd:attribute name="maxAntecedentComponentLength" type="xsd:int"
use="optional"/>
<xsd:attribute name="maxConsequentComponentLength" type="xsd:int"
use="optional"/>
<xsd:attribute name="minConfidence" type="xsd:double"
use="optional"/>
<xsd:attribute name="minSupport" type="xsd:double" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="AttributeTaxonomy">
<xsd:attribute name="attributeName" type="xsd:string"/>
<xsd:attribute name="taxonomyName" type="xsd:string"/>
</xsd:complexType>
E.4.10. AttributeImportance
<xsd:complexType name="AttributeImportanceSettings">
<xsd:complexContent>
<xsd:extension base="BuildSettings">
<xsd:attribute name="maxAttributeCount" type="xsd:int"
use="optional"/>
<xsd:attribute name="targetAttributeName" type="xsd:string"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="AttributeImportanceModel">
<xsd:complexContent>
<xsd:extension base="Model">
<xsd:sequence>
<xsd:element name="attribute" type="AttributeImportance"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="AttributeImportance">
E.4.11. Statistics
<xsd:complexType name="AttributeStatisticsSet">
<xsd:sequence>
<xsd:element name="attrStatistics" type="UnivariateStatistics"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="statisticsTimestamp" type="xsd:time"
use="optional"/>
<xsd:attribute name="numberOfCases" type="xsd:int" use="optional"/>
</xsd:complexType>
<xsd:complexType name="UnivariateStatistics">
<xsd:sequence>
<xsd:element name="continuousStatistics" type="ContinuousStatistics"
minOccurs="0"/>
<xsd:element name="discreteStatistics" type="DiscreteStatistics"
minOccurs="0"/>
<xsd:element name="numericalStatistics" type="NumericalStatistics"
minOccurs="0"/>
<xsd:element name="frequencies" type="xsd:int" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="probabilities" type="xsd:double" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="values" type="DataValueType" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="attributeName" type="xsd:string"/>
</xsd:complexType>
<xsd:complexType name="ContinuousStatistics">
<xsd:sequence>
<xsd:element name="intervals" type="Interval" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="frequencies" type="xsd:int" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="sum" type="xsd:double" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="sumOfSquares" type="xsd:double" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="numberOfIntervals" type="xsd:int"/>
</xsd:complexType>
<xsd:complexType name="DiscreteStatistics">
<xsd:sequence>
<xsd:element name="modalValue" type="DataValueType" minOccurs="0"/>
<xsd:element name="discreteValues" type="DataValueType" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="frequencies" type="xsd:int" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="NumericalStatistics">
<xsd:sequence>
<xsd:element name="minimumValue" type="xsd:double" minOccurs="0"/>
<xsd:element name="maximumValue" type="xsd:double" minOccurs="0"/>
<xsd:element name="meanValue" type="xsd:double" minOccurs="0"/>
<xsd:element name="medianValue" type="xsd:double" minOccurs="0"/>
<xsd:element name="variance" type="xsd:double" minOccurs="0"/>
<xsd:element name="standardDeviation" type="xsd:double" minOccurs="0"/>
<xsd:element name="quantile" type="xsd:double" minOccurs="0"/>
<xsd:element name="quantileLimits" type="xsd:double" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="interQuartileRange" type="xsd:double" minOccurs="0"/>
</xsd:sequence>
</xsd:complexType>
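The fragment below is an illustrative sketch of a NumericalStatistics instance, using the element name numericalStatistics from UnivariateStatistics. It is not part of the normative schema. Namespace declarations are omitted and the values are invented. All child elements are optional, but those present must follow the sequence order declared above.

```xml
<!-- Hypothetical instance: child elements appear in schema sequence order -->
<numericalStatistics>
  <minimumValue>1.0</minimumValue>
  <maximumValue>9.0</maximumValue>
  <meanValue>5.0</meanValue>
  <medianValue>5.0</medianValue>
  <variance>4.0</variance>
  <standardDeviation>2.0</standardDeviation>
</numericalStatistics>
```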
E.4.12. Algorithm
<xsd:complexType name="NaiveBayesSettings">
<xsd:complexContent>
<xsd:extension base="SupervisedAlgorithmSettings">
<xsd:attribute name="pairwiseThreshold" type="xsd:double"
use="optional"/>
<xsd:attribute name="singletonThreshold" type="xsd:double"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="SVMClassificationSettings">
<xsd:complexContent>
<xsd:extension base="SupervisedAlgorithmSettings">
<xsd:attribute name="cStrategy" type="xsd:double" use="optional"/>
<xsd:attribute name="complexityFactor" type="xsd:double"
use="optional"/>
<xsd:attribute name="kernelCacheSize" type="xsd:int" use="optional"/>
<xsd:attribute name="kernelFunction" type="KernelFunction"
use="optional"/>
<xsd:attribute name="polynomialDegree" type="xsd:int"
use="optional"/>
<xsd:attribute name="standardDeviation" type="xsd:double"
use="optional"/>
<xsd:attribute name="tolerance" type="xsd:double" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="SVMRegressionSettings">
<xsd:complexContent>
<xsd:extension base="SupervisedAlgorithmSettings">
<xsd:attribute name="cStrategy" type="xsd:double" use="optional"/>
<xsd:attribute name="complexityFactor" type="xsd:double"
use="optional"/>
<xsd:attribute name="epsilon" type="xsd:double" use="optional"/>
<xsd:attribute name="kernelCacheSize" type="xsd:int" use="optional"/>
<xsd:attribute name="kernelFunction" type="KernelFunction"
use="optional"/>
<xsd:attribute name="polynomialDegree" type="xsd:int"
use="optional"/>
<xsd:attribute name="standardDeviation" type="xsd:double"
use="optional"/>
<xsd:attribute name="tolerance" type="xsd:double" use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:simpleType name="KernelFunctionStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="sigmoid"/>
<xsd:enumeration value="hypertangent"/>
<xsd:enumeration value="polynomial"/>
<xsd:enumeration value="kGaussian"/>
<xsd:enumeration value="kLinear"/>
<xsd:enumeration value="systemDetermined"/>
<xsd:enumeration value="systemDefault"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="TreeSettings">
<xsd:complexContent>
<xsd:extension base="SupervisedAlgorithmSettings">
<xsd:attribute name="buildHomogeneityMetric"
type="TreeHomogeneityMetric" use="optional"/>
<xsd:attribute name="computeNodeStatistics" type="xsd:boolean"
use="optional"/>
<xsd:attribute name="determineMaxDepth" type="xsd:boolean"
use="optional"/>
<xsd:attribute name="maxDepth" type="xsd:int" use="optional"/>
<xsd:attribute name="maxSplits" type="xsd:int" use="optional"/>
<xsd:attribute name="maxSurrogates" type="xsd:int" use="optional"/>
<xsd:attribute name="maximumPValue" type="xsd:double"
use="optional"/>
<xsd:attribute name="minDecreaseInImpurity" type="xsd:double"
use="optional"/>
<xsd:attribute name="minNodeSize" type="xsd:double" use="optional"/>
<xsd:attribute name="minNodeSizeUnit" type="SizeUnit"
use="optional"/>
<xsd:attribute name="pruningHomogeneityMetric"
type="TreeHomogeneityMetric" use="optional"/>
<xsd:attribute name="treeSelectionMethod" type="TreeSelectionMethod"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:simpleType name="TreeHomogeneityMetricStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="misclassificationRatio"/>
<xsd:enumeration value="entropy"/>
<xsd:enumeration value="gini"/>
<xsd:enumeration value="meanAbsoluteDeviation"/>
<xsd:enumeration value="meanSquaredError"/>
<xsd:enumeration value="systemDefault"/>
<xsd:enumeration value="systemDetermined"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="TreeSelectionMethodStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="oneStandardErrorTree"/>
<xsd:enumeration value="minimumErrorTree"/>
<xsd:enumeration value="systemDefault"/>
<xsd:enumeration value="systemDetermined"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="FeedForwardNeuralNetSettings">
<xsd:complexContent>
<xsd:extension base="SupervisedAlgorithmSettings">
<xsd:sequence>
<xsd:element name="neuralLayers" type="NeuralLayer" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element name="learningAlgorithm" type="LearningAlgorithm"
minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="determineNumberOfNodesPerLayer" type="xsd:boolean"
use="optional"/>
<xsd:attribute name="maxNumberOfIterations" type="xsd:int"
use="optional"/>
<xsd:attribute name="minErrorTolerance" type="xsd:double"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="LearningAlgorithm">
</xsd:complexType>
<xsd:complexType name="Backpropagation">
<xsd:complexContent>
<xsd:extension base="LearningAlgorithm">
<xsd:attribute name="learningRate" type="xsd:double"/>
<xsd:attribute name="momentum" type="xsd:double"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="NeuralLayer">
<xsd:attribute name="activationFunction" type="ActivationFunction"/>
<xsd:attribute name="numberOfNodes" type="xsd:decimal"/>
</xsd:complexType>
<xsd:simpleType name="ActivationFunctionStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="softMax"/>
<xsd:enumeration value="symmetricSign"/>
<xsd:enumeration value="sign"/>
<xsd:enumeration value="hyperbolicTangent"/>
<xsd:enumeration value="logistic"/>
<xsd:enumeration value="linearIdentity"/>
<xsd:enumeration value="systemDefault"/>
<xsd:enumeration value="systemDetermined"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="AssociationRulesAlgorithmSettings">
<xsd:complexContent>
<xsd:extension base="AlgorithmSettings"/>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="AttributeImportanceAlgorithmSettings">
<xsd:complexContent>
<xsd:extension base="AlgorithmSettings"/>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="ClusteringAlgorithmSettings">
<xsd:complexContent>
<xsd:extension base="AlgorithmSettings"/>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="KMeansSettings">
<xsd:complexContent>
<xsd:extension base="ClusteringAlgorithmSettings">
<xsd:attribute name="distanceFunction" type="ClusteringDistanceFunction"
use="optional"/>
<xsd:attribute name="maxNumberOfIterations" type="xsd:int"
use="optional"/>
<xsd:attribute name="minErrorTolerance" type="xsd:double"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:simpleType name="ClusteringDistanceFunctionStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="euclidean"/>
<xsd:enumeration value="systemDefault"/>
<xsd:enumeration value="systemDetermined"/>
</xsd:restriction>
</xsd:simpleType>
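As an illustrative sketch (not part of the normative schema), a KMeansSettings instance could be carried in the algorithmSettings element of BuildSettings through an xsi:type override. The sketch assumes that the xsi prefix is bound to the XML Schema instance namespace and that the AlgorithmSettings base type, which is defined elsewhere in the schema, requires no further content. The attribute values are invented.

```xml
<!-- Hypothetical instance: all three attributes are optional in the schema -->
<algorithmSettings
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="KMeansSettings"
    distanceFunction="euclidean"
    maxNumberOfIterations="20"
    minErrorTolerance="0.001"/>
```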
E.4.13. Base
<xsd:complexType name="BuildSettings" abstract="true">
<xsd:complexContent>
<xsd:extension base="MiningObject">
<xsd:sequence>
<xsd:element name="algorithmSettings" type="AlgorithmSettings"
minOccurs="0"/>
<xsd:element name="weightAttribute" type="xsd:string"/>
<xsd:element name="buildAttribute" type="BuildAttribute" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:choice>
<xsd:element name="logicalData" type="LogicalData" minOccurs="0"/>
<xsd:element name="logicalDataName" type="xsd:string" minOccurs="0"/>
</xsd:choice>
</xsd:sequence>
<xsd:attribute name="miningFunction" type="MiningFunction"
use="required"/>
<xsd:attribute name="desiredExecutionTimeInMinutes" type="xsd:int"
use="optional"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="BuildAttribute">
<xsd:attribute name="attributeName" type="xsd:string"/>
<xsd:attribute name="usage" type="LogicalAttributeUsage"
use="optional"/>
<xsd:attribute name="outlierTreatment" type="OutlierTreatment"
use="optional"/>
<xsd:attribute name="weight" type="xsd:double" use="optional"/>
</xsd:complexType>
<xsd:complexType name="Model">
<xsd:complexContent>
<xsd:extension base="MiningObject">
<xsd:sequence>
<xsd:element name="signature" type="ModelSignature" minOccurs="0"/>
<xsd:choice>
<xsd:element name="buildSettingsName" type="xsd:string"
minOccurs="0"/>
<xsd:element name="buildSettings" type="BuildSettings" minOccurs="0"/>
</xsd:choice>
<xsd:element name="effectiveBuildSettings" type="BuildSettings"
minOccurs="0"/>
<xsd:element name="attributeStatistics" type="AttributeStatisticsSet"
minOccurs="0"/>
<xsd:element name="modelDetail" type="ModelDetail" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="uniqueIdentifier" type="xsd:string"
use="optional"/>
<xsd:attribute name="version" type="xsd:string" use="optional"/>
<xsd:attribute name="majorVersion" type="xsd:string" use="optional"/>
<xsd:attribute name="minorVersion" type="xsd:string" use="optional"/>
<xsd:attribute name="providerName" type="xsd:string" use="optional"/>
<xsd:attribute name="providerVersion" type="xsd:string"
use="optional"/>
<xsd:attribute name="applicationName" type="xsd:string"
use="optional"/>
<xsd:simpleType name="MiningFunctionStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="classification"/>
<xsd:enumeration value="clustering"/>
<xsd:enumeration value="regression"/>
<xsd:enumeration value="attributeImportance"/>
<xsd:enumeration value="association"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="MiningAlgorithmStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="svmClassification"/>
<xsd:enumeration value="svmRegression"/>
<xsd:enumeration value="decisionTree"/>
<xsd:enumeration value="naiveBayes"/>
<xsd:enumeration value="kMeans"/>
<xsd:enumeration value="feedForwardNeuralNet"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="SizeUnit">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="percentage"/>
<xsd:enumeration value="count"/>
</xsd:restriction>
</xsd:simpleType>
E.4.14. Root
<xsd:complexType name="JDMException">
<xsd:sequence>
</xsd:sequence>
<xsd:attribute name="errorcode" type="xsd:int" use="required"/>
<xsd:attribute name="message" type="xsd:string" use="optional"/>
<xsd:attribute name="vendorErrorcode" type="xsd:int" use="optional"/>
<xsd:attribute name="vendorMessage" type="xsd:string" use="optional"/>
</xsd:complexType>
<xsd:complexType name="VerificationReport">
<xsd:sequence>
<xsd:element name="reportText" type="xsd:string"/>
</xsd:sequence>
<xsd:attribute name="reportType" type="ReportType" use="required"/>
</xsd:complexType>
<xsd:simpleType name="ReportType">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="error"/>
<xsd:enumeration value="warning"/>
</xsd:restriction>
</xsd:simpleType>
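The fragment below is an illustrative sketch of a VerificationReport instance; it is not part of the normative schema. The element name verificationReport and the report text are hypothetical, and namespace declarations are omitted. The reportType attribute is required and must be one of the ReportType values, error or warning.

```xml
<!-- Hypothetical instance: element name and text are invented -->
<verificationReport reportType="warning">
  <reportText>Attribute AGE contains missing values.</reportText>
</verificationReport>
```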
<xsd:simpleType name="SortOrder">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="ascending"/>
<xsd:enumeration value="descending"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="ExecutionStatus">
<xsd:sequence>
<xsd:element name="description" type="xsd:string" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="state" type="ExecutionState" use="required"/>
<xsd:attribute name="timestamp" type="xsd:string" use="required"/>
<xsd:attribute name="containsWarning" type="xsd:boolean"
use="optional"/>
</xsd:complexType>
<xsd:simpleType name="ExecutionState">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="submitted"/>
<xsd:enumeration value="executing"/>
<xsd:enumeration value="success"/>
<xsd:enumeration value="error"/>
<xsd:enumeration value="terminating"/>
<xsd:enumeration value="terminated"/>
</xsd:restriction>
</xsd:simpleType>
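As an illustrative sketch (not part of the normative schema), an ExecutionStatus instance might look as follows. The element name executionStatus, the timestamp format, and the description text are assumptions: the schema types timestamp as xsd:string and therefore does not constrain its format. Namespace declarations are omitted.

```xml
<!-- Hypothetical instance: state must be one of the ExecutionState values -->
<executionStatus state="executing"
                 timestamp="2005-06-22T10:15:00"
                 containsWarning="false">
  <description>Model build in progress.</description>
</executionStatus>
```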
<xsd:simpleType name="MiningTaskStd">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="buildTask"/>
<xsd:enumeration value="testTask"/>
<xsd:enumeration value="applyTask"/>
<xsd:enumeration value="computeStatisticsTask"/>
<xsd:enumeration value="exportTask"/>
<xsd:enumeration value="importTask"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:complexType name="ConnectionSpec">
<xsd:sequence>
<xsd:element name="userName" type="xsd:string"/>
<xsd:element name="password" type="xsd:string"/>
<xsd:element name="uri" type="xsd:anyURI"/>
<xsd:element name="locale" type="Locale"/>
</xsd:sequence>
</xsd:complexType>
<xsd:union memberTypes="RegressionApplyContentStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="ClusteringApplyContent">
<xsd:union memberTypes="ClusteringApplyContentStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="AggregationFunction">
<xsd:union memberTypes="AggregationFunctionStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="AttributeComparisonFunction">
<xsd:union memberTypes="AttributeComparisonFunctionStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="KernelFunction">
<xsd:union memberTypes="KernelFunctionStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="TreeHomogeneityMetric">
<xsd:union memberTypes="TreeHomogeneityMetricStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="TreeSelectionMethod">
<xsd:union memberTypes="TreeSelectionMethodStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="ActivationFunction">
<xsd:union memberTypes="ActivationFunctionStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="ClusteringDistanceFunction">
<xsd:union memberTypes="ClusteringDistanceFunctionStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="LogicalAttributeUsage">
<xsd:union memberTypes="LogicalAttributeUsageStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="OutlierTreatment">
<xsd:union memberTypes="OutlierTreatmentStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="MiningFunction">
<xsd:union memberTypes="MiningFunctionStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="MiningAlgorithm">
<xsd:union memberTypes="MiningAlgorithmStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="MiningTask">
<xsd:union memberTypes="MiningTaskStd
EnumerationExtension"/>
</xsd:simpleType>
<xsd:simpleType name="ObjectContentType">
<xsd:union memberTypes="ObjectContentTypeStd
EnumerationExtension"/>
</xsd:simpleType>
Appendix F. References
[Alur2001]
Deepak Alur, John Crupi, and Dan Malks, Core J2EE Patterns: Best Practices
and Design Strategies, Prentice Hall, 2001.
[BL1997]
Michael Berry and Gordon Linoff, Data Mining Techniques: For Marketing,
Sales, and Customer Support, 1997.
[CWM]
http://www.omg.org/technology/cwm
[CWM-DM]
http://cgi.omg.org/docs/ad/01-02-01.pdf
[Java-URI]
[JSR16]
http://jcp.org/jsr/detail/16.jsp
[JSR40]
http://jcp.org/jsr/detail/40.jsp
[Mitchell1997]
[PMML]
http://www.dmg.org
[Sharma2001]
Rahul Sharma, Beth Stearns, Tony Ng, J2EE Connector Architecture and Enterprise Application Integration, Addison Wesley, 2001.
[SQL/MM-DM]
http://www.sql-99.org/SC32/WG4/Progression_Documents/Informal_working_drafts/wd-datamining-2000-07.pdf
[SUN-Blueprints1]
http://java.sun.com/blueprints/guidelines/designing_enterprise_applications_2e/deployment/deployment4.html
[SUN-Blueprints2]
http://java.sun.com/blueprints/guidelines/designing_enterprise_applications_2e/web-tier/web-tier5.html
[URI]
http://dev.w3.org/cvsweb/~checkout~/2002/ws/arch/glossary/wsa-glossary.html
[WS-I]
http://www.ws-i.org/