
Laboratory exercise

SPSS made this software available free of charge for that lecture.

How to use the data mining system Clementine
Objectives
• familiarise yourself with the Clementine software package
• learn how to use Clementine for data pre-processing as well as training and testing neural nets
• compare Clementine's data mining system approach with SNNS

You will learn how to use Clementine in a 'learning by doing' approach. We will use the loan approval data file
discussed during the lecture¹. The same file has been used for learning how to work with DataSculptor (data pre-
processing) and SNNS (developing neural nets)². Now you will develop the same application, but this time within a
self-contained software environment, which allows the different tools available to be compared.
This handout is based on Clementine documentation. You will be referred to the respective chapters for further
reading and more details. You may also want to refer to the online help system available in Clementine.

Preparation of the working environment

Prepare your own working environment by copying files from the location given in the lab exercise or on the web
page into a subdirectory of your home directory.


Also, you might want to change the preferred language used in Clementine's GUI. Call Control Panel from
the Win9x start menu to do that.

Starting the software

Start Clementine from the program menu or, alternatively, from an MS-DOS window³. Do not forget that
Clementine requires a running X-server, so start any server (e.g. X-Win) before starting Clementine. You should
see the Clementine user interface.
For details see page 25 [Guide].

[Screenshot: the Clementine user interface, showing the menu bar, the stream pane (used to describe data
processing based on a visual programming approach) and the palettes representing processing operations.]

¹ see lecture notes Project Life Cycle for ANN Applications
² see laboratory assignment 1, How to use SNNS

Learn how to use the Graphical User Interface (GUI)

Before you attempt to develop a solution to the loan approval problem you should spend some time familiarising
yourself with the Clementine data mining system. A good starting point is a tutorial example ([Guide], pp. 51-77).
A copy will be available during the laboratory exercises.

Set Up a Working Environment

In Clementine, all files and settings comprising a project can be stored and organised in so-called project files.
Set up a project before you start to develop nodes dealing with particular processing steps.
Click FILE > NEW PROJECT to call a dialog box used to create a new project.

Accessing Data Files

As a first step you need to read in all records available from the raw data file. Clementine provides source nodes
for that purpose.
☞ page 2ff [Ref] for details

To read records from the loan data file use Fixed Files from Sources. You will find the details describing the data
file (to be found in the DataSculptor Tutorial, pages 3-8 and 3-9) useful.

Once you have managed to read in the data file you can connect the input node to successive nodes for
pre-processing.
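As a rough illustration only, the following Python sketch does the same kind of job as the fixed-width source node:
it reads fixed-width records into a table. The file name, column positions and field names here are made-up
placeholders; the real layout is the one described in the DataSculptor Tutorial.

    # Illustrative sketch, not a Clementine feature: read fixed-width records.
    # File name, column positions and field names are placeholders.
    import pandas as pd

    colspecs = [(0, 2), (2, 10), (10, 12)]      # assumed field positions
    names = ["occupation", "income", "risk"]    # assumed field names

    records = pd.read_fwf("loan_data.dat", colspecs=colspecs, names=names)
    print(records.head())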

³ DOS command: % kclem +clementine_current


Data Pre-Processing
In order to compare both approaches (DataSculptor + SNNS vs. Clementine), pre-processing of the raw data is done
along the lines suggested in the DataSculptor Tutorial⁴.
To save time, a stream for pre-processing the loan data file is available for this lab exercise. Copy all necessary
files into your project.

see lab exercise or web page for details

Have a look at the various nodes and compare them with the objects used in DataSculptor to achieve the same
results. To inspect a node, right-click it and select Edit… . Selecting Annotate… displays hints regarding the
settings used within the node. You may add comments of your own.
Executing a node (right-click and select Execute…) will start processing all nodes from the beginning of the
stream down to the selected node. You may try that by executing the check samples node.

[Figure: stream for pre-processing the raw data file. Star-shaped nodes represent a collection of single nodes
contributing to the encoding of a given field, e.g. occupation. Right-clicking and selecting Zoom in… gives access
to the details of a super node.]
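For comparison only, here is a minimal Python sketch of the kind of work such an encoding super node performs:
one-hot encoding a categorical field and flagging each record as training or validation data. The column names,
values and the 80/20 split are assumptions for illustration, not the settings used in the supplied stream.

    # Sketch of the pre-processing idea: one-hot encode a categorical field
    # and label records as training/validation. Data and split are assumed.
    import pandas as pd

    records = pd.DataFrame({
        "occupation": ["clerk", "manager", "clerk", "engineer"],
        "income": [21000, 48000, 19500, 36000],
        "risk": ["good", "good", "bad", "good"],
    })

    # One-hot encode the occupation field (cf. the star-shaped super node).
    encoded = pd.get_dummies(records, columns=["occupation"])

    # Mark roughly 80% of the records as training data, the rest as validation.
    encoded["usage"] = ["train" if i % 5 != 0 else "validate"
                        for i in range(len(encoded))]
    print(encoded)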

Setting up and Training of Neural Nets


Clementine provides various so-called modelling nodes that may be used to model data. Neural networks
(Backprop, Radial Basis Functions, Kohonen nets) are only one of several options available in Clementine;
regression analysis and k-means clustering models are examples of other methods included in the program.

We will only use supervised neural networks to model the loan approval data set. Therefore, a Train Net node
has been added to the pre-processing stream.

⁴ Just as a reminder: you received a copy of that tutorial during a previous lab exercise.


Right-clicking this node and selecting Edit… gives access to a menu used to set up various network parameters
(layout, learning rate, control over the training process, etc.).
For details refer to [Guide] p. 162f and [Ref] p. 65f.

Note the differences from SNNS, which takes a quite different approach to setting up and training neural nets. In
Clementine the focus is on modelling, whereas in SNNS the architectures and learning algorithms can be controlled
in more detail.

Select an appropriate training method, set up a suitable network structure, set sensible learning rates and decide
how the training process should be controlled. Clicking Execute will start the training process. As a result you
will get a generated model.

Set up and train several networks, trying out different layouts (one, two or three layers), learning methods
(quick, RBF, multiple, prune), etc. All models generated will be available for later use in the Generated Models
palette.
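The Train Net node is driven entirely from its dialog, but as a conceptual counterpart the following sketch trains
small backprop networks with scikit-learn and tries two different layouts. The data, layer sizes and learning rate
are placeholders, not the settings you should use for the loan data.

    # Conceptual counterpart to the Train Net node: backprop networks with
    # different layouts. Data and hyper-parameters are placeholders.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.random((200, 5))                    # placeholder input fields
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # placeholder target class

    for layout in [(4,), (8, 4)]:               # one- and two-hidden-layer nets
        net = MLPClassifier(hidden_layer_sizes=layout,
                            learning_rate_init=0.05,
                            max_iter=500,
                            random_state=0)
        net.fit(X, y)
        print(layout, "training accuracy:", net.score(X, y))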


Testing and Validation


As you will remember, checking the performance of a model is based on data not used during training. In the
stream you are using this has been taken into account by labelling records as belonging to the training or the
validation data set. Now that training has been completed, the records set apart for validation are used to evaluate
the model's performance and to calculate various error measures, one being a misclassification matrix.

In order to check out a model, it is taken from the Generated Models palette and included in the stream. Nodes
downstream of the model under investigation can be used to calculate performance indicators.

[Figure: part of the stream used to calculate a misclassification matrix, which serves as a performance indicator.]

Executing the prediction x risk node will calculate a misclassification matrix for a model plugged into the stream.
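Conceptually, the prediction x risk node cross-tabulates predicted against actual class labels on the validation
records, roughly as in this Python sketch; the label values shown are assumed for illustration only.

    # Sketch of a misclassification (confusion) matrix: predicted vs. actual
    # class labels. The "good"/"bad" values are assumptions for illustration.
    import pandas as pd

    actual    = ["good", "bad", "good", "bad", "good", "good"]
    predicted = ["good", "good", "good", "bad", "bad", "good"]

    matrix = pd.crosstab(pd.Series(actual, name="risk"),
                         pd.Series(predicted, name="prediction"))
    print(matrix)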

That concludes our survey of the functions Clementine provides to support the whole project life cycle of a
neural network application. You have seen that Clementine combines pre-processing, model building and
post-processing alike in a self-contained environment.

Clementine Documentation
Please note that you are using version 6.02; differences from the version 5 manuals should therefore be expected.
[Ref] Clementine Reference Manual Version 5
[Guide] Clementine User Guide Version 5
