
Map India 2005 Geomatics 2005

RELATIONAL WALL AND OBJECT DATA BASE


A GIS PERSPECTIVE

NIRMALENDU KUMAR
SURVEY OF INDIA, CHANDIGARH

1. INTRODUCTION

Relational database management systems (RDBMSs) have been very successful, but
their success is limited to certain types of applications. As business users move to newer types of
applications, and grow older ones, their attempts to use an RDBMS run into the "relational wall",
where RDBMS technology no longer provides the performance and functionality needed. This
wall is encountered when extending information models to support relationships, new data types,
extensible data types and direct support of objects. It also appears when deploying in
distributed environments with complex operations. Attempts to scale the wall with relational
technology lead to an explosion of tables, many joins, poor performance, poor scalability and loss
of integrity. An object database management system (ODBMS) offers a path beyond the wall. This
paper explains the wall and how it can be avoided using an object database, from a GIS angle.

2. THE LAYMAN'S DIFFERENCE

The two concepts can be understood at a high level with a very simple
example. Suppose we wish to store our car in the garage at the end of the day. In an object
DBMS (ODBMS), this is modeled with an object for the car and another for the garage; one
operation, "store", and we are finished. In a relational system, on the other hand, all data needs to be
normalized, i.e. it must be flattened to primitives and sorted by type. In this example, it means that
the car must be disassembled down to its primitive elements, each of which must be stored in
a separate table. So the screws go in one table, the nuts in another, and likewise the wheels, pistons, etc.
In the morning, when we wish to drive to work, we must first reassemble the automobile, and only
then can we drive off.
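
The contrast can be sketched in a few lines of Python. This is only an illustration under stated
assumptions: the names Car, Garage, store_normalized and fetch_normalized are hypothetical, and
plain in-memory structures stand in for both kinds of DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Car:
    plate: str
    wheels: list
    screws: list


@dataclass
class Garage:
    cars: dict = field(default_factory=dict)

    def store(self, car):
        # Object style: one operation keeps the car whole.
        self.cars[car.plate] = car

    def fetch(self, plate):
        return self.cars[plate]


# Relational style: one table per primitive part type, linked by the plate.
wheel_table = []  # rows of (plate, wheel)
screw_table = []  # rows of (plate, screw)


def store_normalized(car):
    # Disassemble the car into rows sorted by part type.
    wheel_table.extend((car.plate, w) for w in car.wheels)
    screw_table.extend((car.plate, s) for s in car.screws)


def fetch_normalized(plate):
    # Reassemble the car by scanning every part table.
    wheels = [w for p, w in wheel_table if p == plate]
    screws = [s for p, s in screw_table if p == plate]
    return Car(plate, wheels, screws)


car = Car("CH01", ["FL", "FR", "RL", "RR"], ["s1", "s2"])
garage = Garage()
garage.store(car)
store_normalized(car)
```

Here garage.fetch("CH01") hands back the stored car directly, while fetch_normalized("CH01") has
to rebuild an equivalent car from the part tables each time.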

3. MODELLING THE REAL WORLD

The real world is the physical world, a certain perception of which is to be modeled. A
thing in the real world is called an entity, which is a unique physical element. The conceptual model is
made up of the relevant real-world entities that should be modeled and the relations between them.
Conceptual models are created by users and developers. Users understand and work with their
own perception of the real world and are sometimes able to specify their needs, expressed
through their conceptual model, whereas developers communicate their solutions through these
models.
The data model is how the information is actually stored in a GIS database. Data models
are derived from the conceptual model by the developers.
The real world and the conceptual model stay the same, but the data model can follow either
a relational approach or an object-oriented approach. The relational data model is a geometry-centric
approach and represents real-world entities as geometric primitives such as points, lines and polygons.
It is therefore distant from the world it tries to model, and from the conceptual model. In the
object-oriented approach, real-world entities are represented by features belonging to predefined object
classes. This data model is therefore close to the conceptual model and thereby to the real world it
is meant to model. This results in better solutions and makes it easier to put questions to, and get
useful answers from, the system.

4. COMPARATIVE ANALYSIS OF RDBMS AND ODBMS

In practice, relational theory is useful only for applications with simple, tabular data
structures and simple operations, where the SQL query mechanism applies directly. In
most GIS cases, however, the application's data structures are not simple tables and its operations are not
limited to simple, disconnected SELECT, PROJECT and JOIN. In these cases the application
programmer must first translate the problem into primitive tables, so the complexity of the problem
is in fact managed entirely by the application programmer, and the RDBMS works only on
the resulting primitive decomposition. This translation results in increased programming time and
cost, integrity loss, and poor performance. With an ODBMS, the DBMS itself helps manage
the complexity.
The sources of the different behavior of RDBMS and ODBMS can be summarized under
two main heads: information model differences and architectural differences.

4.1 INFORMATION MODEL DIFFERENCE


Here we analyze how the relational wall can appear not only in performance but also
as a barrier to modeling complexity, extensibility and flexibility.
ODBMS and RDBMS vary in the models they allow the application programmer to use
to represent information and its processing. An RDBMS supports tables as the data structure and
select, project and join as operations, and all application information and processing must be
translated to these. An ODBMS, by contrast, supports any user-defined data structures, operations and
relationships.
This information model difference is studied under the following points.

4.1.1 RELATIONSHIP

In a relational system, to represent a relationship between two pieces of information (tables,
rows), the user must create a secondary data structure (a foreign key) and store the same
foreign key value in each structure. Then at run time, to determine which item is connected to which,
the RDBMS searches and compares foreign keys, an operation called a join, through all items in the
table until it discovers two that match. This join is slow, and gets slower as the tables grow in size.

In an ODBMS, to create a relationship the user simply declares it; the ODBMS then
automatically generates the relationship and all that it requires, including operations to
dynamically add and remove instances of many-to-many relationships. Referential integrity,
such a difficult proposition in RDBMSs, where it usually requires users to write their own stored
procedures, falls out transparently and automatically. Further, the traversal of a relationship
from one object to the other is direct, without any need for a join or search-and-compare.
This can be literally orders of magnitude faster, and it scales with size.
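
The two ways of following a relationship can be sketched as follows. This is a minimal
illustration, not the behavior of any particular product: the parcel and owner data and all names
are hypothetical, the join is a naive search-and-compare without the indexes a real RDBMS would
normally use, and ordinary Python references stand in for ODBMS relationships.

```python
# Relational style: rows carry a foreign key, and a join searches and compares.
parcels = [{"parcel_id": 1, "owner_id": 10}, {"parcel_id": 2, "owner_id": 11}]
owners = [{"owner_id": 10, "name": "A"}, {"owner_id": 11, "name": "B"}]


def owner_of_parcel_join(parcel_id):
    for p in parcels:
        if p["parcel_id"] == parcel_id:
            # Scan the owner table until the foreign key matches.
            for o in owners:
                if o["owner_id"] == p["owner_id"]:
                    return o["name"]
    return None


# Object style: the relationship is a direct reference held by the object.
class Owner:
    def __init__(self, name):
        self.name = name
        self.parcels = []  # inverse side of the relationship


class Parcel:
    def __init__(self, parcel_id, owner):
        self.parcel_id = parcel_id
        self.owner = owner          # direct reference, no key lookup
        owner.parcels.append(self)  # both sides are maintained together


owner_b = Owner("B")
parcel_2 = Parcel(2, owner_b)
```

With the objects, parcel_2.owner.name is a single traversal, whereas owner_of_parcel_join(2)
does work that grows with the size of both tables.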

4.1.2 VARYING-SIZED DATA STRUCTURE
The RDBMS supports only fixed-size tables; if extra structures need to be added, the
result is extra complexity and lower performance. To represent varying-sized
structures the user must break them into some combination of fixed structures and manually
manage the links between them. This requires extra work, creates extra opportunity for error or
consistency loss, and makes access to these structures much slower.
In an ODBMS, there is a primitive to support varying-sized structures. This provides an easy,
direct interface for the user, who no longer needs to break such structures down. It is also
understood all the way down to the storage manager level, which helps efficient access, allocation,
recovery and concurrency.
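
A polyline with an arbitrary number of vertices is a typical varying-sized structure in GIS. The
sketch below contrasts the two representations; the table layouts and the Polyline class are
hypothetical, and in-memory lists stand in for database storage.

```python
# Relational style: rows are fixed-width, so vertices go into a child table
# and the link (line_id) and ordering (seq) are managed by hand.
line_rows = [(1, "road")]                 # (line_id, name)
vertex_rows = [                           # (line_id, seq, x, y)
    (1, 1, 1.0, 0.5),
    (1, 0, 0.0, 0.0),
    (1, 2, 2.0, 0.0),
]


def vertices_of(line_id):
    rows = [r for r in vertex_rows if r[0] == line_id]
    rows.sort(key=lambda r: r[1])  # restore vertex order from seq
    return [(x, y) for _, _, x, y in rows]


# Object style: the object simply holds a variable-length list.
class Polyline:
    def __init__(self, name, vertices):
        self.name = name
        self.vertices = list(vertices)


road = Polyline("road", [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)])
road.vertices.append((3.0, 0.5))  # grows without any schema change
```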

4.1.3 USER EXTENSIBLE STRUCTURE

Suppose a user wishes to alter a few rows in a table, e.g. two new fields must be added
to three rows. The RDBMS user is left with two choices: enlarge all rows, wasting space (and
hence time, because of increased disk I/O), or create a separate structure for the new
columns and add foreign keys to all rows to indicate which rows have the extra columns. This still adds
some overhead to all rows of the table, or it adds a new table and a slow join between the two.
The ODBMS user, on the other hand, simply declares the change as a subtype (certain
instances of the original type are different), and the ODBMS manages the storage directly from that,
allocating extra space just for those instances that need it, with no extra foreign key or join
overhead.
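
Declaring the change as a subtype can be sketched with ordinary class inheritance. The Building
and HeritageBuilding classes and their fields are hypothetical examples, not taken from any
particular ODBMS.

```python
from dataclasses import dataclass


@dataclass
class Building:
    building_id: int
    height_m: float


@dataclass
class HeritageBuilding(Building):
    # Only the few instances that need them carry the two extra fields.
    listed_year: int
    protection_grade: str


buildings = [
    Building(1, 12.0),
    Building(2, 9.5),
    HeritageBuilding(3, 15.0, 1954, "A"),
]

# Ordinary instances are untouched; the subtype is found by its type.
heritage = [b for b in buildings if isinstance(b, HeritageBuilding)]
```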

4.1.4 FLEXIBILITY AND EFFICIENCY OF STRUCTURE

Flexibility can be critical to many applications, in order to vary structures for one use or
another, or to vary them over time as the system is extended. RDBMS structures are static, fixed
with hard rectangular boundaries, and provide little flexibility.
With an ODBMS, the user may freely define any data structure of any shape. Moreover, at
any time, such structures may be modified into any other shape, including automatic migration of
pre-existing instances of the old shape. New structures can always be added; they can be
simple, with just a few fields, or very complex, including composite objects, i.e. objects that
are composed of multiple other (component) objects.

EFFORTS TO MAKE RDBMS FLEXIBLE

To allow storage of complex structures, RDBMSs have begun to add BLOBs, or binary
large objects. The DBMS does not know what is inside a BLOB, as an ODBMS does for objects.
Storing information in a BLOB is like storing it in a flat file, which can be useful, but the
DBMS cannot support any of its functionality inside the BLOB, including concurrency,
recovery, versioning, etc.; these are left to the application.
To support user-defined operations, RDBMSs now support stored procedures, which
are invoked on certain events and executed on the server. This is much like executing methods
in an ODBMS, except that ODBMS methods may apply to any event (not just certain events, as in an
RDBMS) and may execute anywhere (not just on the server). Similar to stored procedures are Data
Blades or Data Cartridges, which are pre-built procedures to go with BLOBs. They are similar to
the class libraries in an ODBMS, except that ODBMS class libraries can be written by users
(Data Blades typically require writing code that is inserted into the RDBMS engine) and may have any
associated structures and operations.
All these are very limited steps towards satisfying user needs. If an RDBMS were modified
enough to generalize BLOBs to any object structure, stored procedures to any object method and
Data Blades to any object class, the core DBMS engine would have to be rebuilt to support all
of these instead of just tables, and the result would be an ODBMS.

DATA TRACKING

Data often changes over time. To track those changes, the RDBMS user must create
secondary structures and manually copy data, label it and track it. The ODBMS user may simply
turn on versioning, and the system will automatically keep the history.
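
What "turning on versioning" buys the user can be sketched with a small class that keeps every
earlier state of an object. The Versioned class and its methods are hypothetical and only mimic
the idea in memory; an actual ODBMS would manage the history in its storage layer.

```python
class Versioned:
    """Keeps every state of an object so earlier versions stay readable."""

    def __init__(self, **attrs):
        self._history = [dict(attrs)]

    def update(self, **changes):
        # Copy the latest state, apply the changes, and append a new version.
        new_state = dict(self._history[-1])
        new_state.update(changes)
        self._history.append(new_state)

    @property
    def current(self):
        return self._history[-1]

    def version(self, n):
        return self._history[n]


road = Versioned(name="Road A", lanes=2)
road.update(lanes=4)  # the two-lane state is kept automatically
```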

4.1.5 ENCAPSULATION OF OPERATION FOR INTEGRITY, QUALITY AND COST REDUCTION

Encapsulated operations prevent integrity violations, as they allow the user to embed any
desired rules into the operations at each level. RDBMSs allow (or rather force) users to work at
the primitive level, so they may violate or break higher-level structures, or change primitive values
without making the corresponding changes to other, related primitives.
The RDBMS approach to structuring software creates a set of shared data structures, which are
operated upon by multiple algorithms (code, programs). If one program requires a change to some
data structure, all other algorithms must be examined and changed for the new structure. Similarly,
to reuse an algorithm, one has to copy it and edit it for the new project's
data structures, which means new bugs to eradicate and related maintenance problems.
In ODBMS objects, however, the code and the data it uses are combined, so changes can be
made to them together without breaking or affecting other objects. If we need to reuse an object from
an old project, we can simply use it as is, without copying and changing it. It is already tested,
debugged and working, so we gain higher quality by building on the work of the past. Even if the
old object is not exactly what we want, we can define a new subtype of that object (inheritance),
specifying only the difference (delta) between the new and the old. This reduces the new
programming required (along with debugging and potential quality problems) to the much smaller delta,
and reduces maintenance by keeping the same main object for both the new and the old system.
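
Both points, rules embedded in operations and reuse through a small delta, can be sketched as
follows. The Parcel and LeasedParcel classes and the positive-area rule are hypothetical examples.

```python
class Parcel:
    def __init__(self, area_m2):
        self._area_m2 = 0.0
        self.set_area(area_m2)

    def set_area(self, area_m2):
        # The integrity rule lives inside the operation, so no caller can skip it.
        if area_m2 <= 0:
            raise ValueError("area must be positive")
        self._area_m2 = area_m2

    def area(self):
        return self._area_m2


class LeasedParcel(Parcel):
    # Only the delta is written; the validated area handling is inherited.
    def __init__(self, area_m2, lease_years):
        super().__init__(area_m2)
        self.lease_years = lease_years


leased = LeasedParcel(250.0, 30)
```

Calling leased.set_area(-1) raises ValueError, even though LeasedParcel contains no validation
code of its own.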

4.2 ARCHITECTURAL ISSUES

Technically, the terms ODBMS and RDBMS say nothing about architecture but refer only to
the information model. In practice, however, ODBMS and RDBMS products differ
significantly in architecture, and these differences have a major impact on users. We will analyze the
differences in clustering, caching, distribution, scalability and replication.

4.2.1 CLUSTERING

The RDBMS model is based on tables, and virtually all implementations are based on tables. For
example, if there is a customer table, then all applications that need to access customers will collide
on that table, which forces all such applications to wait for each other. The more users and the more
applications there are, the longer the wait.
In an ODBMS, the customer objects may be separated and stored as desired; e.g. customers in the
USA may be placed in one database on one server, while those in Asia may be placed in another.
Access remains unchanged, but the applications no longer conflict and can run in parallel.

4.2.2 CACHING

In virtually all RDBMSs, all operations are executed on the server, requiring inter-process
communication (IPC) to invoke them. The time for such IPC is measured in
milliseconds. With an ODBMS, the object can be brought from the server to wherever the
application is executing, or cached directly in the application's address space. Operations
within an address space occur at native hardware speed, which is five orders of magnitude faster than IPC.
Although the first operation, which caches the object, requires IPC, every later operation on that
object occurs directly, about 10^5 times faster. Such overwhelming performance advantages are a big
part of why an ODBMS can be much faster. Combining caching and clustering produces even greater
benefits.
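
The effect of client-side caching on the number of server round trips can be sketched as follows.
The Server and CachingClient classes are hypothetical, and a counter stands in for IPC calls; no
real timing is measured.

```python
class Server:
    def __init__(self, objects):
        self.objects = objects
        self.round_trips = 0

    def fetch(self, key):
        self.round_trips += 1  # stands in for one IPC call to the server
        return self.objects[key]


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # objects held in the application's address space

    def get(self, key):
        if key not in self.cache:
            # Only the first access pays for a round trip.
            self.cache[key] = self.server.fetch(key)
        return self.cache[key]


server = Server({"parcel-1": {"area": 250.0}})
client = CachingClient(server)
for _ in range(1000):
    client.get("parcel-1")
```

After the loop, server.round_trips is 1: the remaining 999 accesses were served from the
client's own memory.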

4.2.3 CENTRAL VERSUS DISTRIBUTED ARCHITECTURE

RDBMSs are built around central servers, with all data, buffering, indexing, joining,
projecting and selecting occurring on the server. The user simply sends in a request and gets back
an answer, which is the same as the mainframe architecture. Meanwhile the world of computing has
changed, with powerful workstations, PCs and high-speed networks, and in many cases the
corporate environment has far more computing power spread around on desktops than in the
central server. The mainframe architecture is unable to take advantage of these computing
resources.
In the central-server architecture of virtually all RDBMSs, users send requests into the server queue,
where they must first be serialized to avoid conflicts. This means that, as new users are added, they
wait longer and longer in the central server queue, which limits multi-user scalability. When the
RDBMS becomes too slow, all the user can do is buy a bigger, higher-performance server, which
is bound to be very expensive, like a mainframe machine.
In contrast to this is the distributed ODBMS architecture. There are two main differences here.
First, the DBMS functionality is split between the client and the server, allowing the computing
resources of both to be used. Second, the DBMS automatically establishes direct, independent, parallel
communication paths between each client and each server. This allows clients to be added
without slowing down others, and servers to be added to increase performance incrementally
without limit. When a server's capacity is reached, the user may simply add another server and
move some of the objects from the overloaded server to the new one. Because of the transparent
single logical view, clients and users see no difference at all, and the entire system continues to
run and to scale.

4.2.4 REPLICATION

Replication can be a major boost to the performance, not to mention the reliability and availability,
of large multi-user systems. RDBMSs can provide some replication, but they are hampered in two
ways. First, it is difficult and slow, as any replication request must first go to the central server, which
then might be able to pass the request on to another server managing a replica. Such multiple-server
interactions, required by the lack of a single logical view, rely on very slow IPC. Second, the RDBMS
lacks any knowledge of the appropriate units for replication. All that is available to the RDBMS
server is tables of flat data, so that is all it can replicate: either full tables or full databases.
Distributed ODBMS architectures solve both of these problems. The distributed single
logical view makes it transparent where the objects are located, so the system can directly access
whatever replicas are available. Also, the definition of objects at multiple levels allows the system
to replicate just those objects the user wishes to replicate, rather than all of the databases or all
objects of a given type.

5. GIS PERSPECTIVE

Decision makers in physical planning use GIS more and more, but the gap between
physical reality, conceptual models and data models has been a barrier for decision makers and
analysts. Until now, in RDBMS-based GIS the data models have been based on the primitives (points,
lines and polygons), and persistence has been obtained by storing all data in tables. Querying and
editing operations are performed on these data stored in tables. A network infrastructure has needed
much custom code to be managed and queried in a way that is useful to the day-to-day operations of a
utility.
In a geo-object database, the possibility of encapsulating data and implementing specified
behavior in the shape of geo-objects mirrors reality much better and thereby adds
intelligence to the basis of the analysis. In this design, geo-objects can be made transient
(data used only for the duration of a function or application session) or persistent (data that is
created and may be available throughout the organization) when instantiated. Abstraction can provide
the capability to change from one to the other in some systems, and it provides a mechanism of
scalability for data model expansion.

5.1 THE GEODATABASE DATA MODEL OF ARCINFO 8

The defining purpose of this new object-oriented data model is to make the features in
GIS data smarter by endowing them with natural behaviors, and to allow any sort of relationship to
be defined among features. The data objects in a geodatabase are mostly the same objects we would
define in a conceptual/logical data model, such as owners, buildings, parcels and roads. This data
model lets us implement the majority of custom behaviors without writing any code. Most
behaviors are implemented through domains, validation rules and other functions of the
framework provided in ArcInfo.
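
The idea of attaching domains and validation rules to a feature class can be sketched in plain
Python. This is not the ArcInfo API: the CodedValueDomain, RangeDomain and Road classes and their
attribute names are hypothetical and only illustrate rules declared once on the class and applied
to every feature.

```python
class CodedValueDomain:
    def __init__(self, allowed):
        self.allowed = set(allowed)

    def is_valid(self, value):
        return value in self.allowed


class RangeDomain:
    def __init__(self, low, high):
        self.low = low
        self.high = high

    def is_valid(self, value):
        return value is not None and self.low <= value <= self.high


class Road:
    # Domains are declared once on the class and apply to every road feature.
    domains = {
        "surface": CodedValueDomain({"paved", "gravel"}),
        "lanes": RangeDomain(1, 8),
    }

    def __init__(self, **attrs):
        self.attrs = attrs

    def validate(self):
        # Return the names of attributes that violate their domain.
        return [
            name
            for name, domain in self.domains.items()
            if not domain.is_valid(self.attrs.get(name))
        ]


good_road = Road(surface="paved", lanes=2)
bad_road = Road(surface="mud", lanes=12)
```

Here good_road.validate() returns an empty list, while bad_road.validate() reports both
"surface" and "lanes" as invalid.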

5.2 BENEFITS OF THE GEODATABASE DATA MODEL

Object-oriented data modeling can characterize features more naturally by letting us define our
own objects, define topological, spatial and general relationships, and capture how these
objects interact with other objects. Some of the benefits of an object geodatabase are:

• A uniform repository of geographic data: All of the geographic data can be stored and
centrally managed in one database.

• Data entry and editing is more accurate: Fewer mistakes are made in data entry and
editing, because most of them can be prevented by intelligent validation behavior.

• Users work with more intuitive data objects: A geodatabase contains data objects that
correspond to the user's model of the data. Instead of generic points, lines and polygons, the user works
with objects of interest, such as transformers, roads and lakes.
In an RDBMS, by contrast, where a GIS application has complex data structures, the application
programmer must write code to translate these data structures down to primitive tables, which
results in increased programming time and cost, integrity loss and poor performance, as a lot
of time is wasted in search-and-compare operations.

• Features have a richer context: With topological associations, spatial representation and
general relationships, we define not only a feature's qualities but also its context with other features.
This lets us specify what happens to a feature when a related feature is moved, changed or deleted.
This context also lets us locate and inspect a feature that is related to another.

• Many users can edit geographic data simultaneously: The geodatabase data model
permits workflows where many people can edit features in a local area and then reconcile any
conflicts that emerge.

• Shapes of features are better defined: Shapes in a geodatabase can be better defined,
using straight lines, circular curves, elliptical curves and splines. Sets of features are continuous:
by design, a geodatabase can accommodate very large sets of features without tiles or other spatial
partitioning.
