SUMMARY
The lack of platform-independent numerical toolsets presents a barrier to the development of
distributed scientific and engineering applications. Unlike self-contained applications, which
can utilize specialized interfaces to numerical algorithms, distributed applications require a
computing environment with cohesive data structures and method interfaces. These features
are essential in providing consistency between independently developed parts of distributed applications. We describe a Java-based framework that provides a set of consistent data structures
and standard interfaces for numerical methods which operate on these data structures. The
data structures we utilize are double precision real and complex matrices in Java. Our method
interfaces are designed to model those of MATLAB. Since many engineering toolsets rely heavily on core numerical linear algebra algorithms, our current work is focused on implementing
a computational foundation of fundamental numerical algorithms operating within our matrix
framework. The matrix framework and numerical algorithm libraries are extremely useful for
a wide range of applications and should prove to be easily extendable for developing various
applications and toolsets beyond their current implementations. © 1997 John Wiley & Sons, Ltd.
Concurrency: Pract. Exper., Vol. 9(11), 1127–1137 (1997)
No. of Figures: 3.
No. of Tables: 2.
No. of References: 5.
1. INTRODUCTION
Current trends in university research are moving toward larger projects involving collaboration
between several universities[1]. These changes are driving the development both of tools that aid
collaboration on research projects and of collaborative software for implementing them. These
applications are being developed in a distributed fashion, with several different groups at various
schools working separately on different components, which are intended to communicate with one another upon completion.
This framework is highlighted in Figure 1.
One motivation for our work is the need for a common mathematical framework for
these distributed applications. Although each application within a project consists of a
different functionality, they often share a common mathematical base. Therefore, in order
for the distributed applications to effectively share data in a distributed environment within
possibly different applications, common data structures are necessary. Similar to shared data
structures, shared methods (e.g. written within one application for use by other applications)
are also needed with well defined interfaces (e.g. well-defined types for inputs and outputs,
as well as the manner in which methods operate on objects).
Correspondence to: T. H. Smith, Massachusetts Institute of Technology, Microsystems Technology Laboratories, Bldg. 39, Rm. 328, Cambridge, MA 02139, USA. (e-mail: taber@mit.edu)
Contract grant sponsor: DARPA; Contract grant numbers: DABT63-94-C-0055; DABT63-95-C-0088.
CCC 1040-3108/97/111127-11$17.50
Figure 1.
A second motivation for the work we present here is derived from commonly used development methodologies and tools. In particular, many university research groups initially
develop scientific applications in MATLAB. The built-in graphics, simplified debugging
and large number of readily available toolboxes make this a useful framework for testing
new theories and ideas. Unfortunately, transitioning much of this research for use in industrial tests requires these initial applications to be converted into stand-alone applications
deployable on a large number of different platforms. In fact, experiments to be performed
in industrial settings can be greatly delayed by slow conversion from the MATLAB environment to a Fortran, C, or Java implementation. Increasing the efficiency of this transition
phase is imperative to the success of: (i) testing research ideas in a realistic setting provided
by industrial interactions with universities; and (ii) proliferating the implementation of new
research methodologies into widespread use.
The current effort to implement platform-independent Java applications presents the
final motivation for our work. For completely new or relatively simple applications, a
clean start with Java is the most efficient approach to creating 100% Java applications.
However, for a large portion of the scientific community this is not a reasonable solution.
Several applications depend heavily on the numerical methods which have been
researched and developed over many years[2–5].
For example, a current research project at the Massachusetts Institute of Technology
(MIT), Stanford, and several other universities involves the integration of tools and software for application in the semiconductor manufacturing industry. The toolsets are being
implemented in the Java framework and range from distributed design libraries for use in
remote design and simulation, to characterization utilities such as a remote microscope,
to process and equipment diagnostic routines, and to process control applications, which rely
heavily on well-established numerical methods (e.g. the singular value decomposition and
eigenvalue decomposition). The collaborative approach to research in the semiconductor
arena is shown in Figure 1.
This approach enables many researchers to combine their expertise and resources via
a wide range of Internet applications. Many of these independent groups have built applications utilizing numerical algorithms developed over many years which are widely
available in languages such as C and Fortran. We have found that the conversion of these
complex applications (or toolsets), which depend heavily on reliable numerical algorithms,
can also be greatly hindered when confronted with the lack of algorithms available in the
Java environment. While several other platforms offer excellent resources (e.g. NetLib) for
finding well-debugged, reliable and efficient routines, the Java platform offers little in this
regard.
These three needs have driven the development of: (i) a framework, upon which a large
class of numerical algorithms can be implemented in Java; (ii) the ongoing development
of a library of Java methods, similar to that provided by the MATLAB environment,
which cover the core numerical linear algebra algorithms; and (iii) a utility for converting
MATLAB user scripts and functions to Java equivalents. We have termed our matrix
framework MatrixCafe in the spirit of MATLAB and Java. The Java implementation has
three key goals: (i) adopt the software architectural principle from MATLAB that the
complex matrix is a fundamental data object for unifying the framework; (ii) achieve a
functional equivalence to the base MATLAB layer; and (iii) create a one-to-one mapping
from MATLAB functions to equivalents in the MatrixCafe package to facilitate rapid
conversion of MATLAB user-written scripts and functions to Java.
Section 2 outlines the MatrixCafe framework we have developed. The library we are currently developing is discussed in Section 3, and an example application is given. Section 4
briefly highlights our work on converting MATLAB files to Java. Finally, we summarize
our initial work and discuss directions for future work in Section 5.
2. THE MATRIX MATH FRAMEWORK
The goal of our work is to provide a wide range of numerical algorithms in a cohesive
framework within the Java environment upon which distributed scientific and engineering
applications can be built. Specifically, we aim to address the needs outlined above: the
need for a cohesive environment for distributed computing, the need for an advanced
mathematical library and the need to efficiently transfer MATLAB scripts and functions
into a stand-alone, platform-independent language such as Java.
Therefore, we have focused on developing a matrix framework upon which these algorithms may be built. Although the Java language provides the ability to generate two-dimensional arrays, very few of the java.lang.Math methods readily apply to such arrays.
Some basic matrix classes in Java have been distributed with varying degrees of applicability, reliability and functionality. Our effort here is to provide a framework upon which a
full set of matrix operations and methodologies may be implemented.
Several applications in the scientific community require complex matrices for a range of
problems, including image processing and linear system analysis. Therefore, we define a
Matrix class, composed of a real matrix and an imaginary matrix. In strictly real applications, the imaginary part is ignored and underlying methods invoke real versions of the
same operations which operate on only the real component of the data.
The Java language inherently provides the ability for multidimensional array processing
beyond two dimensions. However, many of the publicly available routines which currently
exist were developed with the specific application to two-dimensional matrices. Since
our focus here is to provide an environment in which to implement many of the existing
algorithms, we restrict our framework to cover only two-dimensional matrices. Although
this was one major limitation of MATLAB 4, the extension to multiple dimensions requires
a complex rule set for operations and functions. For a large percentage of applications,
1997 John Wiley & Sons, Ltd.
1130
T. H. SMITH ET AL.
this benefit is secondary to the need for an initial implementation of basic numerical linear
algebra routines.
Applications in the scientific community require varying degrees of precision. In addition,
many of the publicly available mathematical routines are written in various precisions to
address different levels of memory and efficiency requirements. However, in order to
provide a common interface to and within our framework, and to focus on providing a
reduced set of algorithms which address the largest number of applications in a reliable and
general fashion, we have chosen to standardize the Matrix class within our framework to be
composed of two two-dimensional arrays of double precision floating point numbers: one
representing the real component and the other representing the imaginary component.
With this, we restrict all the functions within our framework to accept and provide only such
matrices (with a few exceptions for importing and exporting data). In this manner, every
element within our framework is a double precision complex matrix, including scalars,
vectors, integer matrices and all floating point matrices (both real and imaginary). Where
necessary, matrices within a method may be reduced to integer precision for efficiency
or inherent integer requirements (e.g. integer matrices which contain index information in
reference to another matrix). Exceptions are made only for methods which are designed
to create matrices within the framework or to extract them from the framework. The class
definition for our framework is then of the form
public class Matrix
{
    private double[][] real, imag;
}
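As a sketch of how methods might operate on this paired-array representation, the following example implements element-wise complex multiplication directly on real/imaginary double[][] pairs. The class and method names here are illustrative assumptions, not the published MatrixCafe API; it applies the identity (a + bi)(c + di) = (ac - bd) + (ad + bc)i entry by entry.

```java
// Illustrative sketch (not the MatrixCafe API): element-wise complex
// multiplication on paired real/imaginary two-dimensional arrays, the
// representation used by the Matrix class above.
public class ComplexTimesSketch {
    // Returns { realPart, imagPart } of the element-wise product.
    public static double[][][] times(double[][] aRe, double[][] aIm,
                                     double[][] bRe, double[][] bIm) {
        int rows = aRe.length, cols = aRe[0].length;
        double[][] re = new double[rows][cols];
        double[][] im = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
                re[i][j] = aRe[i][j] * bRe[i][j] - aIm[i][j] * bIm[i][j];
                im[i][j] = aRe[i][j] * bIm[i][j] + aIm[i][j] * bRe[i][j];
            }
        }
        return new double[][][] { re, im };
    }
}
```

In strictly real applications, such a method could first test whether both imaginary arrays are all-zero and fall back to a real-only multiply, as the text describes.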
As in the case of MATLAB, these definitions have several implications for performance.
First, the use of double precision values for single precision computations will be slow on
single precision floating-point platforms. Integer calculations will be slowed by the double
precision operations or by conversion to integers for use in integer routines. In addition,
memory requirements are greater for representing integer or single precision numbers as
double precision. Scalars and vectors will suffer a slight performance loss due to being
placed in a two-dimensional array wrapper. Future work will be required to assess the full
implication of these performance issues.
In addition to the Matrix data structure outlined above, we provide exception handling
for a large class of commonly occurring exceptions. We define a MatrixException class
as follows
public class MatrixException extends Exception
This exception class is currently subclassed into two general exception classes. The first
class provides exception handling for matrix dimension exceptions, which are thrown
by operations that impose constraints on the dimensions of their operands. This class is
designed to catch severe problems within the matrix library methods, as well as in any
toolsets or applications being developed on top of the framework. A typical example
might be non-equivalent inner matrix dimensions in a matrix multiplication. The second
subclass of MatrixException for the Matrix class is ComputationException. The
ComputationException class itself contains several subclasses to handle: convergence
exceptions, to be thrown when iterative algorithms cannot achieve convergence; precision
exceptions, to be thrown when matrix elements fall beneath a precision limit in algorithms
...
try {
    Array = A.svd();
    S = Array[0];
    V = Array[1];
    D = Array[2];
} catch (ComputationException e) {
    System.out.println("Fatal Error computing SVD.");
    System.exit(0);
}
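The exception hierarchy described in Section 2 might be sketched as follows. MatrixException and ComputationException are named in the text; the name MatrixDimensionException for the dimension-exception subclass is an assumption, as is the mtimes dimension check shown raising it.

```java
// Sketch of the two-level exception hierarchy described in the text.
// MatrixDimensionException is an assumed name; the text does not give one.
class MatrixException extends Exception {
    MatrixException(String msg) { super(msg); }
}
class MatrixDimensionException extends MatrixException {
    MatrixDimensionException(String msg) { super(msg); }
}
class ComputationException extends MatrixException {
    ComputationException(String msg) { super(msg); }
}

public class MtimesSketch {
    // Real-valued matrix product; throws when the inner dimensions of the
    // operands do not agree, the typical example given in the text.
    public static double[][] mtimes(double[][] a, double[][] b)
            throws MatrixDimensionException {
        if (a[0].length != b.length) {
            throw new MatrixDimensionException("inner dimensions "
                + a[0].length + " and " + b.length + " do not agree");
        }
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < a[0].length; k++)
                for (int j = 0; j < b[0].length; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }
}
```

Because MatrixException extends the checked Exception class, callers are forced to handle these conditions, as in the svd example above.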
The purpose of this matrix math framework is to serve as a foundation upon which a
large number of applications and toolboxes may be built. The application hierarchy of our
particular example of collaborative tools for the semiconductor manufacturing project may
be visualized as in Figure 2. Here, widely accessible applications at the surface build upon
high-level toolsets, which in turn are built upon the MatrixCafe framework. We now turn
to describing the current state of such a library.
Figure 2.
- construct a matrix with elements all having a given value: zero, one or another specified value
- construct identity matrices
- construct a matrix with random entries
- construct column vectors with regularly, linearly and logarithmically spaced entries
- construct diagonal matrices from a one-dimensional array of type double.
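The matrix-building methods listed above might be sketched as static factories along the following lines, operating on plain double[][] for brevity (the actual MatrixCafe methods return Matrix objects, and these signatures are assumptions modeled on the names eye, init, linSpace, ones and zeros that appear in Table 2):

```java
// Hedged sketch of matrix-building static methods; signatures modeled on
// the names in Table 2, but the bodies are assumptions, not MatrixCafe code.
public class FactorySketch {
    // Matrix with all elements set to a given value.
    public static double[][] init(int rows, int cols, double v) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) java.util.Arrays.fill(row, v);
        return m;
    }
    public static double[][] zeros(int rows, int cols) { return init(rows, cols, 0.0); }
    public static double[][] ones(int rows, int cols)  { return init(rows, cols, 1.0); }

    // Identity matrix (ones on the main diagonal, zeros elsewhere).
    public static double[][] eye(int rows, int cols) {
        double[][] m = new double[rows][cols];
        for (int i = 0; i < Math.min(rows, cols); i++) m[i][i] = 1.0;
        return m;
    }

    // Column vector of n linearly spaced values from lo to hi.
    public static double[][] linSpace(double lo, double hi, int n) {
        double[][] v = new double[n][1];
        double step = (n > 1) ? (hi - lo) / (n - 1) : 0.0;
        for (int i = 0; i < n; i++) v[i][0] = lo + i * step;
        return v;
    }
}
```

Returning column vectors as n-by-1 arrays matches the framework's rule that every element, including scalars and vectors, is a two-dimensional matrix.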
Table 1.

Constructors
    From 1D or 2D array of doubles
    From integer size specifiers
    From data files
Static methods
    Zeros matrix
    Ones matrix
    Identity matrix
    Random matrix
    Evenly spaced vectors
Matrix manipulation methods
    Basic matrix math operations
    Basic data analysis methods
    Manipulations (e.g. submatrix)
Numerical linear algebra
    Matrix determinants, inverses, etc.
    Eigenvalues
    Matrix decompositions (LU, QR, SVD)
    Linear equation solving

Table 2.

Constructors and static methods
    DoubleMatrix(DataInputStream)
    DoubleMatrix(double[])
    DoubleMatrix(double[][])
    DoubleMatrix(int)
    DoubleMatrix(int, double)
    DoubleMatrix(int, int)
    DoubleMatrix(int, int, double)
    eye(int, int)
    init(int, int, double)
    linSpace(double, double, int)
    regSpace(double, double, int)
    logSpace(double, double, int)
    ones(int, int)
    zeros(int, int)
    rand(int, int)
    randn(int, int)
Methods
    abs, acos, acot, acsc, all, and, any, asec, asin, atan, ceil, copy,
    cos, cot, csc, cumprod, cumsum, diag, divide, elementsNonZero,
    equals, equalTo, exp, extract, find, fix, fliplr, flipud, floor,
    getElements, getNumberOfColumns, getNumberOfRows, greaterThan,
    greaterThanOrEqualTo, isEmpty, isFinite, isInf, isNan, isScalar,
    isVector, kron, lessThan, lessThanOrEqualTo, log, log10, lu, max,
    mean, median, min, minus, mpower, mtimes, not, notEqualTo, numCols,
    numRows, or, plus, power, prod, qr, readMatrix, rem, reshape, round,
    sec, setElements, sign, sin, size, sort, std, sum, svd, tan, times,
    toString, trace, transpose, tril, triu, writeMatrix, xor
2. methods for performing orthogonalization, row echelon reduction, and LU, QR and
Cholesky decompositions
3. methods for solving linear equations, non-negative least-squares problems, pseudoinverses and
least-squares solutions
4. methods for generating eigenvalues and eigenvectors, characteristic polynomials,
generalized eigenvalues and the singular value decomposition.
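As an illustration of the linear-equation solving in item 3, one standard solver, Gaussian elimination with partial pivoting, can be sketched in plain Java. This is a generic textbook method shown for concreteness, not necessarily the algorithm implemented in MatrixCafe, and it works on raw arrays rather than Matrix objects:

```java
// Textbook sketch of a dense linear solver: Gaussian elimination with
// partial pivoting followed by back-substitution. Solves Ax = b for
// square, nonsingular A. Not MatrixCafe code; shown for illustration.
public class SolveSketch {
    public static double[] solve(double[][] aIn, double[] bIn) {
        int n = bIn.length;
        double[][] a = new double[n][];
        double[] b = bIn.clone();
        for (int i = 0; i < n; i++) a[i] = aIn[i].clone(); // don't overwrite inputs

        // Forward elimination with partial pivoting.
        for (int k = 0; k < n; k++) {
            int p = k; // choose the largest-magnitude pivot in column k
            for (int i = k + 1; i < n; i++)
                if (Math.abs(a[i][k]) > Math.abs(a[p][k])) p = i;
            double[] tr = a[k]; a[k] = a[p]; a[p] = tr;
            double tb = b[k]; b[k] = b[p]; b[p] = tb;
            for (int i = k + 1; i < n; i++) {
                double f = a[i][k] / a[k][k];
                for (int j = k; j < n; j++) a[i][j] -= f * a[k][j];
                b[i] -= f * b[k];
            }
        }

        // Back-substitution on the resulting upper-triangular system.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * x[j];
            x[i] = s / a[i][i];
        }
        return x;
    }
}
```

Note that the solver copies its inputs rather than overwriting them, matching the non-overwriting convention discussed in Section 3.2.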
As a rough measure of the magnitude of the project, we have approximately 4800 lines of
Java code as of the beginning of April 1997. We estimate that the base libraries are roughly
25% done. We hope to complete 50% of the project by the end of August 1997.
3.2. Performance
Our focus at this initial stage is not on achieving optimal efficiency, but rather
on providing robust and reliable routines. Aside from the computational speed limitations
mentioned above, the interpreted nature of Java will limit the performance of the MatrixCafe
package. However, with the addition of just-in-time (JIT) compilers and Java compiler
optimizations, this performance gap is likely to decrease. Alternatives to using 100% Java
applications include using native methods written in other languages within a wrapper or
remote method invocation with application servers. Native methods are fast and decrease
the effort necessary to convert existing algorithms for use in scientific applications, but
platform dependency restricts their usefulness in distributed computing applications, as
well as for more general audiences in cases where the platform of the end user may not
be known. The use of application servers restricts the availability of libraries to those users
who have dedicated servers which may be complicated to set up. If application servers are
provided for use via the Internet, then potential users are reliant on network lag and the
number of servers provided for their performance specifications. Neither of these options
are appealing for our interests.
We have not yet reached a point where we can properly assess the performance of
our framework or the routines within it. After several numerical linear algebra routines
are added to the package, we will benchmark the performance of these methods with
other implementations. We can then compare and contrast these routines with respect
to the performance issues described above. Based on these tests, we may be forced to
make changes or offer alternatives to our ideal framework outlined above. In particular,
the methods may need to be implemented for all Java types (integers, single precision
floating-point numbers, as well as double precision floating-point numbers) if computation
performance is low, if the numeric accuracy of the double precision representation causes
accuracy problems (particularly for integer computations), or if runtime type-checking causes
significant performance problems. The requirement of instantiating new outputs (the non-overwriting of inputs) may also be removed if the performance loss is large.
We foresee good numeric stability within specific routines by virtue of the methods
which are being converted for use in the library. However, the stability of the software
implementation within our framework will be heavily tested through the use of the package
in several applications within the distributed semiconductor manufacturing project outlined
in Figure 1. It is hoped that these applications will also test the extensibility of the classes
for use in specific applications.
Figure 3. Java code for an image processing example using the MatrixCafe libraries