Abstract In this paper, the open source Matlab toolbox Gait-CAD is presented. This toolbox is designed for the visualization and analysis of time series and single features with a special focus on classification problems. The aim is to provide an open platform for the development and improvement of data mining methods and their application to various medical and technical problems.

Keywords Data Mining, Tools, Neuroprostheses

Introduction
In many applications, large data sets of time series and single features are recorded. An at least semi-automatic search for unknown or partially known relations requires the use of data mining methods [1]. In recent years, a huge number of potentially useful methods and software tools have been proposed, including methods for feature extraction, classification, and regression.
Many existing software tools are very powerful, but each covers only a very limited subset of the available methods. Moreover, the coupling between the different necessary processing steps (e.g. feature extraction from time series and classification) is rather weak. This often leads to the reimplementation of existing methods or to a stepwise transfer of partial results between different tools.
Some tools focus on script-based processing, which causes problems when transferring them to other applications because the implemented algorithms require time-consuming manual adaptation. A generally accepted tool platform does not exist at the moment.
These facts make a fast comparison of newly developed methods against a broader set of existing methods very time-consuming. As a consequence, new methods are only compared with a small number of competing approaches; a broad comparison is not feasible.
In our opinion, an ideal data mining tool
• has to contain various data mining methods from feature extraction to classification and regression, using statistical approaches up to newer approaches from computational intelligence,
• has to be free and open source to guarantee a wide acceptance in the scientific community and the fast integration of new methods,
• needs to be modular with well-documented interfaces to integrate various methods useful for highly specialized application domains, and
• has to support a GUI-based exploration of the data set as well as a highly automated script-based processing of routine operations.
This paper presents the Matlab toolbox Gait-CAD as a first step in this direction. It is focused on the visualization and analysis of time series and features, especially for classification, but also for regression problems. Our intention is the design of an open platform as a framework for the development and improvement of data mining methods.

Methods
The toolbox Gait-CAD is based on Matlab (tested for the versions 5.3 and 2007b). A Matlab-based solution was chosen in order to use the wide mathematical functionality of this package provided by The MathWorks, Inc. A main disadvantage is the need for a Matlab license.
The toolbox is operated by a graphical user interface (GUI) with menu items and control elements like popup lists, checkboxes, and edit elements (Figure 1). This enables inexperienced users to work with the toolbox. However, the implemented algorithms work independently of the GUI. Thus, the Matlab-typical way of programming using a command prompt and variables is possible. Furthermore, analyses can be automated and standardized as batch runs by designing individual macros. More details on the handling are explained in a comprehensible PDF handbook.

Figure 1: Gait-CAD screenshot

Gait-CAD is open source software. The German version has been available since November 2006, the English one since January 2008. It is licensed under the conditions of the GNU General Public License (GNU-GPL) of The Free Software Foundation. It can be downloaded from the download section at http://www.iai.fzk.de/projekte/biosignal/index.html.
To use the toolbox for the design of a data mining algorithm, a training data set is required. This data set is normally given by a binary Matlab project file, containing matrices and vectors with predefined structures and names.

[Figure: flow chart with the elements "Database", "Problem formulation (verbalized)", "Collecting training data set", and "Problem formulation (formalized)"]
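How such a data set of matrices and vectors might feed a chain of feature extraction and classification can be illustrated with a minimal sketch. It is written in Python rather than Matlab and does not use Gait-CAD code. All variable names, the toy values, and the choice of features (mean and range) are assumptions made for illustration, and the nearest-class-mean classifier is a simple stand-in rather than a method taken from the toolbox.

```python
from statistics import mean

# Hypothetical training data set (names and values are illustrative assumptions).
# time_series[n][s] holds the samples of time series s for data point n.
time_series = [
    [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]],
    [[0.0, 2.0, 4.0], [2.0, 2.0, 2.0]],
    [[5.0, 5.0, 5.0], [0.0, 3.0, 6.0]],
    [[6.0, 6.0, 6.0], [0.0, 4.0, 8.0]],
]
class_labels = [1, 1, 2, 2]  # one known class label per data point


def extract_features(series_per_point):
    """Turn every time series into two single features: mean and range."""
    features = []
    for data_point in series_per_point:
        row = []
        for series in data_point:
            row.append(mean(series))
            row.append(max(series) - min(series))
        features.append(row)
    return features


def train_class_means(features, labels):
    """Average the feature vectors of each class."""
    means = {}
    for label in sorted(set(labels)):
        rows = [f for f, y in zip(features, labels) if y == label]
        means[label] = [mean(column) for column in zip(*rows)]
    return means


def classify(feature_row, class_means):
    """Assign the class whose mean has the smallest squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(class_means, key=lambda label: sq_dist(feature_row, class_means[label]))


# Feature extraction and classification operate on the same data structures,
# so no intermediate results have to be moved between separate tools.
single_features = extract_features(time_series)
class_means = train_class_means(single_features, class_labels)
predicted = [classify(row, class_means) for row in single_features]
```

In this sketch the single features form a matrix with one row per data point, so the classifier can consume the output of the feature extraction step directly.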