Taxonomic reasoning is a typical task performed by many AI knowledge representation systems. In this paper, the effectiveness of taxonomic reasoning techniques as an active support to knowledge acquisition and conceptual schema design is shown. The idea developed is that by extending conceptual models with defined concepts and giving them rigorous logic semantics, it is possible to infer isa relationships between concepts on the basis of their descriptions. From a theoretical point of view, this approach makes it possible to give a formal definition for consistency and minimality of a conceptual schema. From a pragmatic point of view, it is possible to develop an active environment that allows automatic classification of a new concept in the right position of a given taxonomy, ensuring the consistency and minimality of a conceptual schema. A formalism that includes the data semantics of models giving prominence to type constructors (E/R, TAXIS, GALILEO) and algorithms for taxonomic inferences are presented: their soundness, completeness, and tractability properties are proved. Finally, an extended formalism and taxonomic inference algorithms for models giving prominence to attributes (FDM, IFO) are given.

Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design - data models; schema and subschema; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods - representation languages; frames and scripts

General Terms: Design, Language, Theory, Verification

Additional Key Words and Phrases: Schema consistency, schema minimality, semantic models, taxonomic reasoning
1. INTRODUCTION

Modeling an application domain in terms of conceptual models is at present an important phase of many database design methodologies. This phase, called conceptual design, assumes that an accurate collection of user requirements has been performed a priori; as a result, it gives a conceptual data schema, i.e., a formalization of both the data and the operations on data of the requirements [4, 19, 21, 22, 33, 40, 50]. Formalization is performed by means of conceptual models that decrease the distance between the reality to be modeled and the primitives available, compared to traditional DBMS data models. A conceptual data schema describes the intensional knowledge, i.e., structures of concepts and interrelationships between concepts (relationships, part-of, isa hierarchies) [52, 53]. These ingredients are also found in work on programming languages with type inheritance structures [20] and in the most recent models for complex object representation [1, 2, 35-38].

This work was partially supported by the Italian project Sistemi Informatici e Calcolo Parallelo, subproject 5, objective LOGIDATA+ of the National Research Council (CNR). Authors' address: CIOC-CNR, Viale Risorgimento 2, 40136 Bologna, Italy; Tel: + 3951644.35503548; email: {sonia, claudio}@deis64.cineca.it. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1992 ACM 0362-5915/92/0900-0385 $01.50. ACM Transactions on Database Systems, Vol. 17, No. 3, September 1992, Pages 385-422.

S. Bergamaschi and C. Sartori

Because of the difficulties facing a designer in conceptual design, various graphic schema tools and expert systems (GALILEO [4], TAXIS [40], EASYER [31], ER Designer [23], and others [14, 24, 26]) have been developed to assist in conceptual schema construction. Despite the usefulness and efficiency of these tools, the decision on how to organize concepts in a taxonomy and, above all, the systematic verification of the schema's correctness are still guided only by the designer's experience. When we deal with complex application domains, the modeling task becomes more difficult, since we meet not only more complex concepts but also larger taxonomies [10, 18, 25]. In this case it becomes necessary to think of the construction of a schema not only as a top-down design task but as a true knowledge acquisition task, often carried out among various designers, in which the intensional knowledge of each application domain, corresponding to a different schema, is integrated into a larger one. For the construction of large schemata it becomes almost mandatory to have working environments that actively support the construction and check the correctness of the schema. The problem has to be faced with suitable concepts and techniques from both the database and the artificial intelligence areas, i.e., with working environments that provide, alongside conceptual model semantic capabilities, reasoning techniques.

The organization of concepts in a taxonomy constitutes a basic modeling principle in both the database and the artificial intelligence areas. Thus, many efforts have been devoted to isa relationships in both areas, as summarized in Section 8 [2, 3, 5, 6, 10, 27, 28, 32, 36]. In the database area, the taxonomy is built by an explicit declaration by the designer, since a concept is described by means of its parent concepts (isa links) and its differential properties. In the artificial intelligence area, the taxonomy is computed, as it is assumed that a concept description can be given as a composition of ancestor concepts (not necessarily parents) and differential properties. Automatic classification (i.e., determination of the right place for a concept in a taxonomy)1 is the most important reasoning task, called taxonomic reasoning, of hybrid knowledge representation systems based on the KL-ONE paradigm [17] (KL-TWO [54], KANDOR [44, 45], KRYPTON [15], BACK [39], CLASSIC [13]). A taxonomic reasoner finds all isa relationships between a new concept and those already given, by discovering these implicit isa links from the concept descriptions. To make these inferences, it is necessary to adopt a definitional language, as was first proposed by Schmolze and Israel [49], that is, a language in which each expression (i.e., concept) exactly identifies a set of items of a given application domain. The correspondence between syntactic expressions and sets is thus obtained by assigning extensional semantics to each language component, i.e., the rules for the construction of the associated set. In this way a subsumption relationship between two concepts can be computed by syntactically comparing the language expressions describing them.2

1 ... the classification abstraction mechanism of conceptual models, which ...

On Taxonomic Reasoning in Conceptual Design

Definitional language semantics is almost unheard of in the conceptual model tradition, where a concept description is intended to represent only necessary conditions for the extension of a class (i.e., individuals of a class must satisfy these conditions) and isa links must be explicitly declared. Frame Description Languages (called FDLs) express both the semantics of conceptual models, by means of the so-called primitive concepts, and the semantics of definitional languages, by means of the so-called defined concepts. A defined concept gives a class definition; thus the description of the class represents necessary and sufficient conditions, and individuals which fill the description can be automatically inserted in that class. From the conceptual model perspective, a defined concept can be viewed as a derived class [34], where the structural description captures the usual specification and also defines the derivation rule to fill the class. For a primitive concept, by contrast, the sufficient conditions are unknown: this prevents the automatic recognition of items, and classes must be populated explicitly. Furthermore, we observe that the semantics of both primitive and defined concepts is useful, because the first levels of a conceptual schema are usually constituted by primitive concepts (no full set of necessary and sufficient conditions is available), while deeper levels are constituted by defined concepts. It is the authors' strong opinion that taxonomic reasoning, enhanced by defined concepts, is a powerful technique for supporting conceptual schema design. Furthermore, the relevance of this technique for other main database research topics, such as recognition of instances and schema validation, as well as query optimization [8, 13], is outstanding, as briefly explained in Section 9.

The question now is: Do we have to adopt FDL formalisms to describe conceptual data schemata, or can we still use the conceptual models which are so popular in the database community? There are very good reasons to favor conceptual models: the E/R model constitutes a standard for the conceptual design of database applications and makes automatic tools available for converting the conceptual schema to the logical schema of commercial DBMSs (ReIGen [9], SchemaGen [23]). Thus, knowledge bases describing nontrivial application domains and possessing very large extensional knowledge can be easily managed and stored on secondary memory by database technology. Again, E/R constitutes a data description standard for software engineering CASE tools, too. Other very popular conceptual models and languages, such as TAXIS [40], DAPLEX [50], and GALILEO [4], are very important, as they constitute the basis for the development of DBMSs; in particular, the DAPLEX functional model constitutes the basis of ongoing research on object-oriented databases [25].

2 The idea of computing subtyping relationships by means of syntactic comparison was presented by Cardelli [20] as the basis of a sound type-checking algorithm for a programming language.
The answer to the above question is that we can use conceptual models for knowledge representation and to perform taxonomic reasoning if we extend them with defined concepts, give them a rigorous extensional semantics, and then develop a complete (i.e., every subsumption relationship is detected) algorithm for the computation of subsumption. The aim here is to propose a formal framework for schema acquisition and organization, which is based on taxonomic reasoning and strict taxonomies. In particular, we introduce a formalism for taxonomic reasoning and prove its feasibility. It is a compositional formalism that expresses the semantics of defined and primitive concepts and makes it possible to detect contradictory concepts (i.e., concepts with an empty extension) and to compute subsumption between concepts by means of syntactical type-checking rules. In this way it is possible to perform the consistency check of a schema including primitive and defined concepts. Furthermore, the subsumption algorithm allows the computation of the minimal schema with respect to the specialization ordering. The two formalisms extend, respectively, the E/R and DAPLEX data semantics, allowing multiple inheritance from other entities as a part of the entity definition, generalization as in the well-known GALILEO model, and defined entities. The extended formalism includes most of the data semantics of the models that give prominence to attributes, FDM and IFO [2], together with constructs for representing inverse roles; the representation of inverse roles is a common feature of these models and of the extended formalism.3

3 The object-centered approach of the formalism makes it suitable for the representation of the semantics of object-oriented data models such as O2, if cyclic definitions are excluded.

The paper is organized as follows. Background on FDLs, as they have developed in the AI tradition, and on the complexity of subsumption computation is given in Section 2. In Section 3, the Frame Description Language used in this paper is introduced, its semantics is formally presented according to the model-theoretic approach, and a subsumption algorithm is given. A formal definition, based on subsumption, of the consistency and minimality of a schema, and a classification algorithm are presented in Section 4. In Section 5, the E/R model is described by means of the language, and the effectiveness of the techniques of Section 4 is shown by some examples of conceptual schema design. In Section 6, the extended language is presented, and the extended subsumption algorithm is given. In Section 7, the DAPLEX model is described by means of the extended language, and some examples of conceptual schema design are given. Section 8 examines related works on conceptual schema consistency checks. Section 9 examines related works on the application of taxonomic reasoning to database querying and instantiation, and gives some hints on the applicability of the results of this paper to these topics.
2. FDLs AND SUBSUMPTION COMPUTATION

A brief description of the FDLs derived from KL-ONE, of their epistemological types, and of the complexity of subsumption computation is given in the following. The basic syntactic types of KL-ONE are concept and role. The name of the language family derives historically from the (typically less well-defined) correspondence between concept and role and the frame and slot of AI, respectively. In KL-ONE, the structure of a concept is described on the basis of more general concepts (genus) and of a local structure (differential) of additional or differentiated roles. Roles describe relationships between instances of that concept and instances of others (role fillers). In KL-ONE, a concept is either primitive or defined. For instance, a world of students, teachings, courses, and student enrollment can be described by the graphical schema of Figure 1. Ellipses represent concepts, and a primitive concept is marked by an asterisk; arrows represent roles, which point to a role filler, i.e., the value restriction of the role, and are labeled with the role name and a number restriction (min, max), which gives the minimum and maximum number of instances of the role filler; double arrows represent isa links. Person, string, and teaching are primitive concepts. Person has the role name, with a unique value in the role filler string. Student is defined as a person enrolled in a teaching (i.e., a person having the role enrolled-in whose role filler is teaching); the number restriction (1, n) means that the student must be enrolled in at least one teaching, while no upper limit is given. Course is a teaching with the additional role enroll and with at least one student.

Fig. 1. KL-ONE schema example.

Since student is a defined concept, any person enrolled in a teaching can be recognized as a student. Furthermore, if we give a description of a concept that is a person with the roles name and enrolled-in, with the same (or restricted) role fillers as Person and Student, respectively, we can detect by subsumption computation an isa link between this concept and Student, even if Student is not mentioned explicitly in its description. The descriptive capabilities of KL-ONE go far beyond the features mentioned above, including role hierarchies and other constructs which make it impossible to find a complete subsumption algorithm [47].

The problems of the KL-ONE classifier were first found by Schmolze and Israel. In specifying a formal semantics for a subset of KL-ONE, they discovered that the classifier developed for KL-ONE was sound (i.e., any subsumption relation discovered is legitimate) but incomplete (some subsumption relations are not detected) [49]. Further investigations to develop complete algorithms showed that the subsumption problem for a large class of FDLs derived from the ideas of KL-ONE is intractable, i.e., no algorithm is known that solves it in time polynomial in the size of the problem description [16]. Only for a less expressive FDL, called FL⁻, did Brachman and Levesque find a classifier algorithm that is sound, complete, and tractable [16]. The research in this field is now devoted to finding the boundary between tractable and intractable FDLs in order to couple expressiveness with tractability [29, 30, 41-43].
FL⁻ describes defined concepts; it has a value restriction and an unqualified existential quantification, it is a common ancestor of the FDLs considered, and it has been proved to be complete and tractable. Its syntax is the following:

<concept> ::= <concept-atom> | (AND <concept1> ... <conceptn>) | (ALL <role> <concept>) | (SOME <role>)
<role> ::= <role-atom>

The syntactic categories <concept-atom> and <role-atom> indicate concepts and roles for which no description is given. The AND construct is a conjunction of concepts: (AND male person) is the description of someone who is, at the same time, male and a person; in conceptual model terminology, AND expresses multiple inheritance. The ALL construct defines a concept on the basis of a value restriction: x is an instance of (ALL r c) iff each role filler of role r for x is an instance of c. For instance, (ALL enrolled-in teaching) describes the items whose role enrolled-in, if filled, is a teaching. In conceptual model terminology, ALL corresponds to the partial specification of the type of a property (an attribute, or an entity component of a relationship in E/R terminology). The SOME construct guarantees that for the named role there will be at least one role filler (without specifying its type). The combination of ALL and SOME for a given role defines a mandatory property of a given type.

3. THE LANGUAGE AND SUBSUMPTION COMPUTATION

3.1 Syntax

The language extends FL⁻ with the concept-forming constructs NR, ALL_a, NR_a, and NOT. In addition, it allows the naming of concepts (=), primitive concepts, and atom negation. The number restriction, NR, corresponds to the mapping constraints on multivalued properties and on relationships. ALL_a and NR_a allow the description of attributes, i.e., roles mapping into value sets. The construct NOT specifies a restricted form of negation, useful to state disjointedness constraints between concepts.
The grammar of the language is the following:

<terminology> ::= {<term-declaration>}
<term-declaration> ::= <concept-declaration> | <value-domain-declaration>
<concept-declaration> ::= <concept-name> = <concept>
<concept> ::= <concept-atom> | NOTHING | (NOT <concept-atom>) | <concept-name>
    | (AND <concept1> ... <conceptn>)
    | (ALL <role> <concept>) | (NR <role> <min>, <max>)
    | (ALL_a <attribute-atom> <value-set>) | (NR_a <attribute-atom> <min>, <max>)
<role> ::= <role-atom>
<min> ::= 0 | <natural-number>
<max> ::= 0 | n | <natural-number>
<value-set> ::= integer | real | string | <value-set-specification>
<value-set-specification> ::= <integer-range> | <real-range> | <value-domain-range> | <value-domain-name>
<value-domain-declaration> ::= <value-domain-name> = (<value-atom1> ... <value-atomk>)

Any concept name and value domain name may occur only once on the left-hand side of a term introduction in a terminology. Any terminology must be acyclic, i.e., there may not be any term introduction C = C′ in a terminology if C occurs (directly or indirectly) in C′. The value atoms must be specified in only one value domain declaration; thus, different value domain names identify disjoint value sets. A total order is defined on the atoms of a value domain. The categories <integer-range>, <real-range>, and <value-domain-range> denote ranges, as in programming languages.5

The = (equivalence) statements assign a name to a concept. For instance, the following are the descriptions of a person (primitive) and of a student (defined):

person = (AND p (ALL_a name string) (NR_a name 1,1))
student = (AND person (ALL enrolled-in teaching) (NR enrolled-in 1,n))
Note that the fictitious atom p has been introduced in the description of person in order to represent the primitive nature of the concept; that is, p stands for the unknown conditions that would make the description of the concept a definition. For each primitive concept, a new atom must be introduced, thus allowing us to distinguish between two primitive concepts with the same type structure.6 The NR construct extends the SOME construct of FL⁻: x is in (NR r min, max) iff x has at least min and at most max fillers for role r. Note that, in the above description of student, the property of being enrolled in at least one teaching is expressed by two constructors: one for the qualification of the fillers (ALL) and one for their quantification (NR). The ALL_a and NR_a constructs are similar to ALL and NR, but allow the specification of attributes mapping into value sets. NOTHING denotes the empty concept. The (NOT <concept-atom>) construct allows the specification of disjointedness between concepts: in other words, classes described by such descriptions cannot have a common specialization. For instance, the following descriptions make person and teaching disjoint:

person = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1))
teaching = (AND t (NOT p) (ALL_a name string) (NR_a name 1,1))
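The declarations above can be made concrete with a small sketch. It is not taken from the paper: it assumes a hypothetical Python encoding in which an atom or a concept name is a string and every construct is a tuple headed by its keyword ("AND", "ALL", "NR", "ALL_a", "NR_a", "NOT"), with INF standing for the unbounded maximum n. The function unfold (a name chosen here for illustration) performs the name substitution that CANFORM carries out in Section 3.3, and it rejects cyclic terminologies, which the syntax forbids.

```python
INF = float("inf")  # stands for the unbounded maximum "n"

# Hypothetical encoding of the terminology of this section.
TERMINOLOGY = {
    "person": ("AND", "p", ("NOT", "t"), ("ALL_a", "name", "string"), ("NR_a", "name", 1, 1)),
    "teaching": ("AND", "t", ("NOT", "p"), ("ALL_a", "name", "string"), ("NR_a", "name", 1, 1)),
    "student": ("AND", "person", ("ALL", "enrolled-in", "teaching"), ("NR", "enrolled-in", 1, INF)),
}


def unfold(c, terminology, visiting=frozenset()):
    """Replace every concept name in c by its declaration.

    `visiting` holds the names on the current expansion path: meeting one of
    them again means the terminology is cyclic. Reusing a name on different
    branches (a DAG) is allowed.
    """
    if isinstance(c, str):
        if c not in terminology:
            return c  # an atom: nothing to substitute
        if c in visiting:
            raise ValueError(f"cyclic terminology: {c!r} occurs in its own definition")
        return unfold(terminology[c], terminology, visiting | {c})
    if c[0] == "AND":
        return ("AND",) + tuple(unfold(x, terminology, visiting) for x in c[1:])
    if c[0] == "ALL":
        return ("ALL", c[1], unfold(c[2], terminology, visiting))
    return c  # NOT, NR, ALL_a, NR_a contain atoms, numbers, or value sets only
```

For example, unfold("student", TERMINOLOGY) replaces person and teaching by their descriptions, while a terminology in which a refers to b and b, through a role, refers back to a raises a ValueError: without acyclicity, name substitution would not terminate.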
5 Arbitrary enumerated subsets are not admitted.
6 An ad hoc high-level language construct such as prim could easily be added to the language, and thus an atom p could be automatically introduced by the system.
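3.2 Semantics

A terminology can be associated with an application domain: concept descriptions are modeled by sets of individuals of the domain, value sets by sets of values, and roles and attributes by sets of couples, i.e., by elements of the powerset of the corresponding Cartesian products. As in model theory, it is possible to map the terminology to different domains and to the same domain in different ways; every mapping has the characteristics outlined below. Following this approach, the semantics is defined by means of a family of functions, called extension functions, assigned to the language.7 The extension functions family defines the language semantics by assigning an extension to each concept, role, and attribute. This explicit formalization of the semantics of each language expression allows the sets associated with the concept descriptions to be compared, in order to define a subsumption procedure and prove its correctness and completeness. This technique is known as the model-theoretic approach.

Definition 1 (extension function). Let T be a terminology, that is, a set of concept specifications; let T_C be the set of concept specifications, and T_Cn and T_Ca the sets of concept names and concept atoms mentioned in T; let T_r and T_a be the sets of role and attribute atoms mentioned in T, respectively;8 let T_v be the set of value sets mentioned in T, including the sets T_vn and T_va of value domain names and value atoms, respectively. Let C and R ⊆ C × C be the sets of concepts and roles of the domain; let V and A ⊆ C × V be the sets of values and attributes of the domain; let 𝒟 = C ∪ R ∪ V ∪ A. Let ℰ be a function such that

$$\forall c \in T_{Ca}:\ \mathcal{E}[c] \in 2^{C},\qquad \forall r \in T_r:\ \mathcal{E}[r] \in 2^{R},\qquad \forall a \in T_a:\ \mathcal{E}[a] \in 2^{A},\qquad \forall v \in T_{va}:\ \mathcal{E}[v] \in 2^{V}.$$

ℰ is an extension function over 𝒟 iff the following equations hold (max = n denotes the absence of an upper bound):

$$
\begin{aligned}
&\mathcal{E}[\mathrm{NOTHING}] = \emptyset && (1)\\
&\mathcal{E}[(\mathrm{NOT}\ c)] = C \setminus \mathcal{E}[c] && (2)\\
&\mathcal{E}[(\mathrm{AND}\ c_1 \ldots c_n)] = \bigcap_{i=1}^{n} \mathcal{E}[c_i] && (3)\\
&\mathcal{E}[(\mathrm{ALL}\ r\ c)] = \{x \in C \mid \forall y\ ((x,y) \in \mathcal{E}[r] \Rightarrow y \in \mathcal{E}[c])\} && (4)\\
&\mathcal{E}[(\mathrm{NR}\ r\ min, max)] = \{x \in C \mid min \le |\{y \mid (x,y) \in \mathcal{E}[r]\}| \le max\} && (5)\\
&\mathcal{E}[(\mathrm{ALL}_a\ a\ v)] = \{x \in C \mid \forall y\ ((x,y) \in \mathcal{E}[a] \Rightarrow y \in \mathcal{E}[v])\} && (6)\\
&\mathcal{E}[(\mathrm{NR}_a\ a\ min, max)] = \{x \in C \mid min \le |\{y \mid (x,y) \in \mathcal{E}[a]\}| \le max\} && (7)\\
&\mathcal{E}[c_n] = \mathcal{E}[c] \ \text{ for each concept declaration } c_n = c && (8)\\
&\mathcal{E}[v_n] = \bigcup_{i=1}^{k} \mathcal{E}[v_{a_i}] \ \text{ for each value domain declaration } v_n = (v_{a_1} \ldots v_{a_k}) && (9)
\end{aligned}
$$

7 This definition of the semantics of a language is similar to the formal set-theoretic semantics for types presented by Cardelli [20] and adopted in the data model O2.
8 Obviously, T_C ⊇ T_Cn ∪ T_Ca.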
The above equations univocally determine the extension of any complex term, i.e., of a concept or of a value set, from the extensions of concept atoms, roles, attributes, and value atoms, according to the rules of standard set theory.9 Therefore, given a terminology T, a domain 𝒟, and a function ℰ satisfying the above conditions, the interpretation I = (𝒟, ℰ) is valid, i.e., it is a model, in the model-theoretic sense [46], of the knowledge schema defined by T.
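As an illustration of Definition 1, the following sketch evaluates the extension function ℰ on one small, finite interpretation. It uses a hypothetical Python encoding (tuples headed by construct keywords, INF for the unbounded maximum n), works on descriptions whose names have already been substituted, and is not code from the paper. Evaluating a single interpretation can exhibit a counterexample to a subsumption, but cannot prove one, since Definition 2 quantifies over all interpretations.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

INF = float("inf")  # the unbounded maximum "n"


@dataclass
class Interpretation:
    individuals: Set[str]                      # the set C
    atoms: Dict[str, Set[str]]                 # E[c] for concept atoms
    roles: Dict[str, Set[Tuple[str, str]]]     # E[r], a subset of C x C
    attrs: Dict[str, Set[Tuple[str, object]]]  # E[a], a subset of C x V
    value_sets: Dict[str, Set[object]]         # E[v] for value sets

    @staticmethod
    def fillers(pairs, x):
        """All y such that (x, y) belongs to the given relation."""
        return {y for (s, y) in pairs if s == x}

    def ext(self, c):
        """E[c] for a concept without names, following the equations of Definition 1."""
        if c == "NOTHING":
            return set()
        if isinstance(c, str):
            return set(self.atoms.get(c, set()))
        op = c[0]
        if op == "NOT":
            return self.individuals - self.ext(c[1])
        if op == "AND":
            result = set(self.individuals)
            for sub in c[1:]:
                result &= self.ext(sub)
            return result
        if op in ("ALL", "NR"):
            pairs = self.roles.get(c[1], set())
        elif op in ("ALL_a", "NR_a"):
            pairs = self.attrs.get(c[1], set())
        else:
            raise ValueError(f"unknown construct {op!r}")
        if op in ("ALL", "ALL_a"):
            allowed = self.ext(c[2]) if op == "ALL" else self.value_sets[c[2]]
            return {x for x in self.individuals if self.fillers(pairs, x) <= allowed}
        lo, hi = c[2], c[3]
        return {x for x in self.individuals if lo <= len(self.fillers(pairs, x)) <= hi}


PERSON = ("AND", "p", ("NOT", "t"), ("ALL_a", "name", "string"), ("NR_a", "name", 1, 1))
TEACHING = ("AND", "t", ("NOT", "p"), ("ALL_a", "name", "string"), ("NR_a", "name", 1, 1))
STUDENT = ("AND", PERSON, ("ALL", "enrolled-in", TEACHING), ("NR", "enrolled-in", 1, INF))

world = Interpretation(
    individuals={"ann", "bob", "db"},
    atoms={"p": {"ann", "bob"}, "t": {"db"}},
    roles={"enrolled-in": {("ann", "db")}},
    attrs={"name": {("ann", "Ann"), ("bob", "Bob"), ("db", "Databases")}},
    value_sets={"string": {"Ann", "Bob", "Databases"}},
)
```

In this interpretation world.ext(STUDENT) is {"ann"}: ann is recognized as a student without being declared one, because student is a defined concept, while bob, who is enrolled in nothing, is only a person.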
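3.3 Subsumption

The adoption of an extensional semantics for the language leads to the following formal definition of the subsumption relationship.

Definition 2 (subsumption). Given two concepts c and c′, c subsumes c′ iff the extension of c′ is included in the extension of c in every interpretation:

$$c \text{ subsumes } c' \iff \forall \mathcal{D}\ \forall \mathcal{E} \text{ over } \mathcal{D}:\ \mathcal{E}[c'] \subseteq \mathcal{E}[c].$$

9 The uniqueness is guaranteed by the acyclicity of the terminology.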
The main activity of taxonomic reasoning is to determine whether the subsumption relationship holds for an arbitrary pair of concepts. In the following, we show how this computation can be performed by the boolean function SUBS on the basis of a syntactical comparison of the concept descriptions.

Algorithm 1 (SUBS). SUBS is a function defined as

$$\mathrm{SUBS}(c, c') = \mathrm{COMPARE}(\mathrm{CANFORM}(c), \mathrm{CANFORM}(c'))$$

with COMPARE and CANFORM defined below. The SUBS algorithm is a considerable extension of the one proposed by Brachman and Levesque [16], since it deals with concept names, attributes, number restrictions, atom negations, and contradictory concepts, and it exploits techniques introduced by Nebel [42]. CANFORM transforms a concept declaration into a canonical form, whose purpose is to obtain simpler expressions for the subsumption comparison performed by COMPARE. In particular, names are substituted by their descriptions,10 nestings are removed, and contradictory subexpressions are discovered and substituted by the term NOTHING. The result of this transformation is a unique (apart from permutations) description, which is semantically equivalent to the original one.

As a first step, let us give the definition of a contradictory concept (denoted in the following as NOTHING).

Definition 3 (contradictory concept). A concept c is contradictory with respect to a terminology T iff $\forall \mathcal{D}\ \forall \mathcal{E} \text{ over } \mathcal{D}:\ \mathcal{E}[c] = \emptyset$.

Algorithm CANFORM defines a set of rewriting rules over the expressions of the description.

Algorithm 2 (CANFORM). CANFORM is a function defined by the following rewriting rules:

2a Concept names are substituted by the corresponding descriptions.

10 It has been observed by Nebel [43] that a concept description with m terms is transformed by name substitution into a description which has, in the worst case, t = O(m^d) terms, where d is the maximum depth of the concept taxonomy. Therefore, name substitution generates an inherent intractability, because of an exponential growth if the order of magnitude of d is greater than log(n), where n is the number of concepts of the taxonomy. On the other hand, this latter case is infrequent in real knowledge bases, and in the following we assume that name substitution does not generate exponential complexity.
2b The associative property of the AND operation is used to remove nested ANDs from the function arguments:
    (AND ... (AND c_i ... c_j) ... c_n) → (AND ... c_i ... c_j ... c_n)
2c If an AND expression contains ALL (ALL_a) terms referring to the same role (attribute), they are replaced by a single term, with the concepts (value domains) grouped by AND:11
    (AND ... (ALL r (AND c_1 ... c_i)) ... (ALL r (AND c_{i+1} ... c_n)) ...) → (AND ... (ALL r (AND c_1 ... c_i c_{i+1} ... c_n)) ...)
2d If an AND expression contains two terms like (ALL_a a v) and (NR_a a min, max), and v is a finite countable value domain with cardinality m, with m ≤ max, then max is replaced by m:
    (AND ... (ALL_a a v) (NR_a a min, max) ...) → (AND ... (ALL_a a v) (NR_a a min, m) ...)
2e If an AND expression contains an atom and its negation, then the AND expression is replaced by NOTHING:
    (AND ... c ... (NOT c) ...) → NOTHING
2f If an AND expression contains ...
2g ... (AND ... v_1 ... v_2 ...) → NOTHING
2h If an AND expression contains at least two value domain ranges referring to the same value domain name, but with an empty intersection ($\mathcal{E}[v_1] \cap \mathcal{E}[v_2] = \emptyset$), then the AND expression is replaced by NOTHING:
    (AND ... v_1 ... v_2 ...) → NOTHING
2i If an ALL (ALL_a) term has NOTHING as a filler, then it is replaced by an NR (NR_a) term with min = max = 0:
    (ALL r NOTHING) → (NR r 0, 0)
2j The NR (NR_a) terms referring to the same role (attribute) are replaced by a single term having as its number restriction the intersection of the original intervals:
    (AND ... (NR r min1, max1) ... (NR r min2, max2) ...) → (AND ... (NR r max(min1, min2), min(max1, max2)) ...)
2k The ALL (ALL_a) terms referring to the same role (attribute) as an NR (NR_a) term having min = max = 0 are eliminated:12
    (AND ... (ALL r ...) (NR r 0, 0)) → (AND ... (NR r 0, 0))

11 Value restrictions which are not AND expressions are converted into AND expressions: (ALL r c) → (ALL r (AND c)) and (ALL_a a v) → (ALL_a a (AND v)).
12 In fact, $\mathcal{E}[(\mathrm{AND}\ (\mathrm{ALL}\ r\ y)\ (\mathrm{NR}\ r\ 0, 0))] = \mathcal{E}[(\mathrm{NR}\ r\ 0, 0)]$ for every concept y. In other words, (NR r 0, 0) describes the extreme case reachable by restricting the set of fillers for a given role.
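The rewriting rules can be sketched in Python as follows. The sketch assumes a hypothetical encoding in which atoms are strings and constructs are tuples headed by their keyword, and the function name canform is chosen for illustration; it expects names to be already substituted, implements rules 2b, 2c (for roles only), 2e, 2i, 2j, and 2k, and sorts the terms so that permutations of the same description produce the same output. Two behaviors are assumptions of this sketch rather than rules stated above: an AND containing NOTHING becomes NOTHING, and an empty number-restriction interval (min > max) becomes NOTHING. Rules 2d, 2f to 2h, and the merging of ALL_a terms are not implemented.

```python
INF = float("inf")  # the unbounded maximum "n"


def _flatten(terms):
    """Rule 2b: splice the arguments of nested ANDs into the enclosing AND."""
    out = []
    for t in terms:
        if isinstance(t, tuple) and t[0] == "AND":
            out.extend(_flatten(t[1:]))
        else:
            out.append(t)
    return out


def canform(c):
    """Canonical form of a concept without names: ("AND", term, ...) or "NOTHING"."""
    atoms, negated = set(), set()
    all_fillers = {}  # role -> fillers of its ALL terms (rule 2c)
    intervals = {}    # ("NR" | "NR_a", name) -> (min, max) (rule 2j)
    attr_terms = []   # ALL_a terms, kept unchanged in this sketch
    for t in _flatten([c]):
        if t == "NOTHING":
            return "NOTHING"  # assumption: NOTHING absorbs the whole AND
        if isinstance(t, str):
            atoms.add(t)
        elif t[0] == "NOT":
            negated.add(t[1])
        elif t[0] == "ALL":
            all_fillers.setdefault(t[1], []).append(t[2])
        elif t[0] in ("NR", "NR_a"):
            lo, hi = intervals.get(t[:2], (0, INF))
            intervals[t[:2]] = (max(lo, t[2]), min(hi, t[3]))  # rule 2j
        else:
            attr_terms.append(t)
    if atoms & negated:  # rule 2e
        return "NOTHING"
    merged = {}
    for role, fillers in all_fillers.items():
        filler = canform(("AND", *fillers))  # rule 2c, then recursion on the filler
        if filler == "NOTHING":  # rule 2i: (ALL r NOTHING) -> (NR r 0, 0)
            lo, _ = intervals.get(("NR", role), (0, INF))
            intervals[("NR", role)] = (lo, 0)
        else:
            merged[role] = filler
    if any(lo > hi for lo, hi in intervals.values()):
        return "NOTHING"  # assumption: an empty interval is contradictory

    def no_fillers(kind, name):
        return intervals.get((kind, name), (0, INF))[1] == 0

    terms = sorted(atoms) + [("NOT", a) for a in sorted(negated)]
    terms += [(kind, name, lo, hi) for (kind, name), (lo, hi) in intervals.items()]
    terms += [("ALL", r, f) for r, f in merged.items() if not no_fillers("NR", r)]  # rule 2k
    terms += [t for t in attr_terms if not no_fillers("NR_a", t[1])]  # rule 2k
    return ("AND", *sorted(terms, key=repr))
```

For instance, two ALL terms on the same role whose merged filler is contradictory collapse, through rules 2c, 2e, and 2i, into the single term (NR r 0, 0), and rule 2k then removes any remaining value restriction on that role.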
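THEOREM 1. Algorithm 2 is extension preserving:

$$\forall \mathcal{D}\ \forall \mathcal{E} \text{ over } \mathcal{D}:\quad \mathcal{E}[c] = \mathcal{E}[\mathrm{CANFORM}(c)].$$

PROOF SKETCH. The proof follows from the application of the definition of the function ℰ given by Equations (1)-(9). □

The COMPARE function structurally compares the expressions produced by CANFORM to detect subsumption; set inclusion between value sets is detected by COMPDOM.

Algorithm 3 (COMPDOM). COMPDOM is a boolean function on value sets: COMPDOM(x, y) = true iff
3a y = NOTHING, or
3b x is a value domain name or a value domain range, y is a value domain range, and $\mathcal{E}[y] \subseteq \mathcal{E}[x]$.

Algorithm 4 (COMPARE). Given a couple of concepts c and c′ in canonical form, c = (AND c_1 ... c_n) and c′ = (AND c′_1 ... c′_m), COMPARE(c, c′) = true iff
4a c′ = NOTHING, or, for each c_i, the one of the following cases that applies to c_i is verified:
4b c_i is an atom or the negation of an atom, and ∃j: c′_j = c_i;
4c c_i = (NR r min, max), and ∃j: c′_j = (NR r min′, max′) with min′ ≥ min and max′ ≤ max;
4d c_i = (NR_a a min, max), and ∃j: c′_j = (NR_a a min′, max′) with min′ ≥ min and max′ ≤ max;
4e c_i = (ALL r x), and either ∃j: c′_j = (ALL r y) with COMPARE(x, y) = true, or ∃j: c′_j = (NR r 0, 0);
4f c_i = (ALL_a a x), and either ∃j: c′_j = (ALL_a a y) with COMPDOM(x, y) = true, or ∃j: c′_j = (NR_a a 0, 0).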
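Algorithms 3 and 4 can be sketched as follows, assuming a hypothetical tuple encoding of canonical forms (("AND", term, ...) or the string "NOTHING") and, for COMPDOM, value sets written as integer ranges ("range", lo, hi); value domain names are not modeled. The names compare and compdom are chosen for this sketch, and the inputs are assumed to be already in canonical form, so that each role and attribute has at most one ALL and one NR term.

```python
INF = float("inf")  # the unbounded maximum "n"


def compdom(x, y):
    """Algorithm 3 restricted to integer ranges: is E[y] a subset of E[x]?"""
    if y == "NOTHING":  # case 3a
        return True
    _, x_lo, x_hi = x
    _, y_lo, y_hi = y
    return x_lo <= y_lo and y_hi <= x_hi  # case 3b


def compare(c, d):
    """Algorithm 4: does canonical c subsume canonical d?"""
    if d == "NOTHING":  # case 4a
        return True
    if c == "NOTHING":  # NOTHING subsumes only NOTHING
        return False
    candidates = d[1:]
    for ci in c[1:]:
        if isinstance(ci, str) or ci[0] == "NOT":  # case 4b
            ok = ci in candidates
        elif ci[0] in ("NR", "NR_a"):  # cases 4c and 4d
            ok = any(isinstance(dj, tuple) and dj[:2] == ci[:2]
                     and dj[2] >= ci[2] and dj[3] <= ci[3]
                     for dj in candidates)
        elif ci[0] in ("ALL", "ALL_a"):  # cases 4e (recursive) and 4f
            inner = compare if ci[0] == "ALL" else compdom
            empty = ("NR" if ci[0] == "ALL" else "NR_a", ci[1], 0, 0)
            ok = empty in candidates or any(
                isinstance(dj, tuple) and dj[:2] == ci[:2] and inner(ci[2], dj[2])
                for dj in candidates)
        else:
            raise ValueError(f"unexpected term {ci!r}")
        if not ok:
            return False
    return True


PERSON = ("AND", "p", ("ALL_a", "age", ("range", 0, 150)), ("NR_a", "age", 1, 1))
STUDENT = ("AND", "p", ("ALL", "enrolled-in", ("AND", "t")),
           ("ALL_a", "age", ("range", 16, 150)), ("NR", "enrolled-in", 1, INF),
           ("NR_a", "age", 1, 1))
```

Here compare(PERSON, STUDENT) holds, because every term of PERSON is matched by an equal or more restrictive term of STUDENT, whereas compare(STUDENT, PERSON) fails on the ALL term for enrolled-in.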
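THEOREM 2 (subsumption computation). Given two concepts c and c′, c subsumes c′ ⟺ SUBS(c, c′) = true.

PROOF. We must prove that SUBS is a sound and complete subsumption test for concept descriptions, i.e., that

$$\mathrm{SUBS}(c, c') = \mathit{true} \iff c \text{ subsumes } c' \iff \forall \mathcal{D}\ \forall \mathcal{E} \text{ over } \mathcal{D}:\ \mathcal{E}[c'] \subseteq \mathcal{E}[c].$$

By Theorem 1, CANFORM preserves the extension of the concepts; this is therefore equivalent to proving that, for c and c′ in canonical form,

$$\mathrm{COMPARE}(c, c') = \mathit{true} \iff \forall \mathcal{D}\ \forall \mathcal{E} \text{ over } \mathcal{D}:\ \mathcal{E}[c'] \subseteq \mathcal{E}[c],$$

where the inclusion must hold for every function ℰ associating individuals to concepts. (The complexity of SUBS, which is polynomial, is analyzed in Theorems 3 and 4.)

Soundness. Given a couple of concepts c and c′ in canonical form such that COMPARE(c, c′) = true, either c′ is NOTHING (case 4a of Algorithm 4), or, for each c_i, the appropriate case (from 4b to 4f) of Algorithm 4 is verified; i.e., the antecedent of the implication is true. It will then be shown that the consequent of the implication is true. In case 4a, $\mathcal{E}[c'] = \emptyset \subseteq \mathcal{E}[c]$. In case 4b, c′_j = c_i, and thus $\mathcal{E}[c'_j] = \mathcal{E}[c_i]$. If cases 4c, 4d, or 4f are verified, by the definitions of Section 3.2 it results that, for every 𝒟 and every ℰ over 𝒟,

$$\mathcal{E}[(\mathrm{NR}\ r\ min', max')] \subseteq \mathcal{E}[(\mathrm{NR}\ r\ min, max)],\quad \mathcal{E}[(\mathrm{NR}_a\ a\ min', max')] \subseteq \mathcal{E}[(\mathrm{NR}_a\ a\ min, max)],\quad \mathcal{E}[(\mathrm{ALL}_a\ a\ y)] \subseteq \mathcal{E}[(\mathrm{ALL}_a\ a\ x)],$$

since, respectively, min′ ≥ min ∧ max′ ≤ max (in the first two cases) and COMPDOM(x, y) = true (in the third). When the alternative (NR r 0, 0) or (NR_a a 0, 0) of cases 4e and 4f applies, the inclusion holds because an item with no fillers satisfies every value restriction. Having proved the base of the induction, the induction principle can be applied to the recursive calls of case 4e: if COMPARE(x, y) = true, then $\mathcal{E}[y] \subseteq \mathcal{E}[x]$, and, by virtue of Eq. (4), $\mathcal{E}[(\mathrm{ALL}\ r\ y)] \subseteq \mathcal{E}[(\mathrm{ALL}\ r\ x)]$. In any case, for every i there exists j such that $\mathcal{E}[c'_j] \subseteq \mathcal{E}[c_i]$, and then

$$\mathcal{E}[c'] = \bigcap_{j} \mathcal{E}[c'_j] \subseteq \bigcap_{i} \mathcal{E}[c_i] = \mathcal{E}[c] \qquad \forall \mathcal{D}\ \forall \mathcal{E} \text{ over } \mathcal{D}.$$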
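Completeness. Given c and c′ in canonical form, completeness is equivalent to

$$\mathrm{COMPARE}(c, c') = \mathit{false} \Rightarrow \text{not } (c \text{ subsumes } c'),$$

i.e., if COMPARE(c, c′) = false, then there exist 𝒟 and ℰ over 𝒟 such that $\mathcal{E}[c'] \not\subseteq \mathcal{E}[c]$. If COMPARE(c, c′) = false, c′ is different from NOTHING, and for some c_i none of the cases from 4b to 4f is verified; a counterexample extension function can then be built, since c′, being different from NOTHING, is not contradictory, a property guaranteed by the canonical form. □

THEOREM 3 (complexity of canonical form generation). The time necessary to generate the canonical form is $O(\bar n^2)$, where $\bar n$ is the number of terms the description contains after the substitution of names by their definitions, if any.

PROOF. The application of each of the other rules to all the terms of the nested description requires linear time. Rules 2j and 2k require, at the outermost level (the worst case), the pairwise comparison of terms, which can be performed in time $O(\bar n^2)$ or less. □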
THEOREM 4 (complexity of COMPARE). The time necessary to compute COMPARE is $O(\bar n^2)$, where $\bar n$ is the length of the longer argument.

PROOF. Apart from the recursive calls of COMPARE itself, the computation requires comparing each term of the first argument with the terms of the second one, and thus it is $O(\bar n^2)$. The recursive calls do not increase the complexity: calling $n_k$, $k = 1, \ldots, r$, the length of one of the r nested terms (introduced by an (ALL ...)), the complexity of a recursive call is $O(n_k^2)$, but since the following relation holds:

$$\bar n \ge \sum_{k=1}^{r} n_k,$$

the global complexity, including the recursive calls, is still $O(\bar n^2)$. □
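The recursive argument of Theorem 4 can be spelled out as a short induction. The derivation below is a sketch under assumptions not stated in the proof: the top-level comparison costs at most $c\,t^2$ for some constant $c$, where $t \ge 1$ is the number of top-level terms; the nested terms have lengths $n_1, \ldots, n_r$ with $t + \sum_k n_k \le \bar n$ (so each $n_k < \bar n$); and the induction hypothesis is $T(m) \le K m^2$ for all $m < \bar n$, with a constant $K \ge c$.

$$
\begin{aligned}
T(\bar n) &\le c\,t^2 + \sum_{k=1}^{r} T(n_k) \\
&\le c\,t^2 + K \sum_{k=1}^{r} n_k^2 \\
&\le K\,t^2 + K \Big(\sum_{k=1}^{r} n_k\Big)^2 \\
&\le K \Big(t + \sum_{k=1}^{r} n_k\Big)^2 \;\le\; K\,\bar n^2 .
\end{aligned}
$$

The third step uses $K \ge c$ and $\sum_k n_k^2 \le (\sum_k n_k)^2$ for nonnegative $n_k$; the fourth holds because $(t + S)^2$ exceeds $t^2 + S^2$ by the nonnegative cross term $2tS$, with $S = \sum_k n_k$. The recursion therefore keeps the bound at $O(\bar n^2)$.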
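The SUBS algorithm also allows us to give a syntactical characterization of some properties of concepts. Given a terminology T and two concepts c and c′:
(1) c and c′ are equivalent (c ≡ c′) if and only if SUBS(c, c′) = true and SUBS(c′, c) = true;
(2) c and c′ are disjoint if and only if SUBS(NOTHING, (AND c c′)) = true.

PROOF SKETCH. The results follow immediately from the definition of algorithm SUBS. □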
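The two characterizations can be tried on the smallest fragment of the language, conjunctions of atoms and negated atoms, where subsumption reduces to a set inclusion once contradictory conjunctions are mapped to NOTHING (in the spirit of rule 2e). The encoding and the names canon, subs, equivalent, and disjoint are assumptions of this sketch; it is not the full SUBS algorithm.

```python
def canon(c):
    """Canonical form of a conjunction of literals: a frozenset, or "NOTHING"
    when an atom occurs together with its negation."""
    if c == "NOTHING":
        return "NOTHING"
    literals = frozenset(c)
    positive = {l for l in literals if isinstance(l, str)}
    negated = {l[1] for l in literals if isinstance(l, tuple)}  # ("NOT", atom)
    return "NOTHING" if positive & negated else literals


def subs(c, d):
    """True iff c subsumes d (valid only for conjunctions of literals)."""
    c, d = canon(c), canon(d)
    if d == "NOTHING":
        return True
    if c == "NOTHING":
        return False
    return c <= d  # every literal required by c is also required by d


def equivalent(c, d):
    return subs(c, d) and subs(d, c)


def disjoint(c, d):
    both = "NOTHING" if "NOTHING" in (c, d) else set(c) | set(d)  # (AND c d)
    return subs("NOTHING", both)


PERSON = {"p", ("NOT", "t")}
TEACHING = {"t", ("NOT", "p")}
```

With these definitions, disjoint(PERSON, TEACHING) is true, because their conjunction contains both p and (NOT p), while equivalent({"p"}, PERSON) is false, since {"p"} subsumes PERSON but not conversely.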
of algorithm MINIMALITY
CONSISTENCY problem
AND
SCHEMA
of a schema a formal that is of section we show the use definition
A major consistent
of conceptual
design
redundancies.
of taxonomic
in conceptual
the consistency and minimality of an Y?* schema (i.e., a terminology). A classification algorithm is then introduced to assist schema acquisition, preserving consistency and minimality. An 5zZ* concept is described as a conjunction of ancestors (not necessarily parents) and local properties. A contradiction can arise either through a conflict between the local properties, between the inherited properties. a local property and an inherited Algorithm CANFORM transforms T, a concept properties. c, called generalizations one, or such is
conflicting descriptions into NOTHING. Furthermore, we say that, given a terminology minimal The if it contains set of all the only ancestor parent the SUBS
l,..., kl(t,
description
names
and local
concepts
of a concept
set, is computed
GS(c)={t,,
by using
i=
algorithm,
= TCn A SUBS(tZ,
The minimal description of a concept $c$ with respect to a terminology $T$, MINDESC: $T_C \to T$, obtained from the (user-supplied) concept description, is

$$\mathrm{MINDESC}(c) = (\mathrm{AND}\ c_1 \ldots c_k\ c_D), \qquad \{c_1, \ldots, c_k\} = \mathrm{MSGS}(c)$$

where the most specific generalization set of $c$, MSGS($c$), is obtained from GS($c$):

$$\mathrm{MSGS}(c) = \{c_i \in GS(c) \mid \nexists\, c_j \in GS(c),\ c_j \neq c_i,\ \mathrm{SUBS}(c_i, c_j) = \mathrm{true}\} \qquad (10)$$

and $c_D$ is the difference concept of $c$ with respect to its parents, if any, obtained as shown below. Obviously, if $c$ has no difference concept, the term $c_D$ does not appear in MINDESC($c$). In the following we show that the minimal description can always be expressed in the language and is unique.

THEOREM 5. For any concept $c$ expressed in the language, the most specific generalization set MSGS($c$) with respect to a given terminology is unique.

PROOF. Suppose that MSGS($c$) = $\{a_i,\ i = 1, \ldots, m\}$ and MSGS($c$) = $\{b_j,\ j = 1, \ldots, n\}$. From (10), both sets coincide with the set of the elements of GS($c$) that subsume no other element of GS($c$):

$$\{a_i\} = \{a_x \in GS(c) \mid \nexists\, g \in GS(c),\ g \neq a_x,\ \mathrm{SUBS}(a_x, g) = \mathrm{true}\} = \{b_j\}$$

□
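A small Python sketch of GS and of formula (10) may help. It assumes, for illustration only, that named concepts are conjunctions of atoms, so that a toy subsumption test reduces to set inclusion (`toy_subs` and the names in `T` are hypothetical). One deliberate deviation: a member of GS($c$) is discarded only if it strictly subsumes another member, so that two equivalent names, if present, are both kept rather than both removed.

```python
from typing import Callable, Dict, FrozenSet, Set

Concept = FrozenSet[str]                  # toy encoding: a conjunction of atoms
Subs = Callable[[Concept, Concept], bool]


def gs(c: Concept, terminology: Dict[str, Concept], subs: Subs) -> Set[str]:
    """GS(c): the names of all concepts of the terminology that subsume c."""
    return {name for name, desc in terminology.items() if subs(desc, c)}


def msgs(c: Concept, terminology: Dict[str, Concept], subs: Subs) -> Set[str]:
    """MSGS(c): members of GS(c) that do not strictly subsume another member."""
    gen = gs(c, terminology, subs)

    def strictly_subsumes(a: str, b: str) -> bool:
        return subs(terminology[a], terminology[b]) and not subs(terminology[b], terminology[a])

    return {a for a in gen if not any(strictly_subsumes(a, b) for b in gen if b != a)}


def toy_subs(a: Concept, b: Concept) -> bool:
    return a <= b  # for conjunctions of atoms, subsumption is set inclusion


T = {"person": frozenset({"p"}), "student": frozenset({"p", "e"}), "teaching": frozenset({"t"})}
new = frozenset({"p", "e", "gs"})
print(sorted(gs(new, T, toy_subs)))   # ['person', 'student']
print(msgs(new, T, toy_subs))         # {'student'}
```

Here person is a generalization of the new concept but not a most specific one, because it subsumes student, which is also a generalization.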
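The difference concept of $c$, with respect to the genus concept, is computed as follows. Let $c$ be a concept in canonical form and let us adopt the notation $\prod_{i=1}^{k} c_i = (\mathrm{AND}\ c_1 \ldots c_k)$. Concept $c$ can be written as

$$c = \Big[\prod_{i=1}^{np} p_i\Big] \Big[\prod_{i=1}^{nn} (\mathrm{NR}\ rn_i\ min_i, max_i)\Big] \Big[\prod_{i=1}^{nf} (\mathrm{ALL}\ rf_i\ cf_i)\Big] \Big[\prod_{i=1}^{nna} (\mathrm{NR}_a\ an_i\ amin_i, amax_i)\Big] \Big[\prod_{i=1}^{nfa} (\mathrm{ALL}_a\ af_i\ v_i)\Big]$$

where $p_i$ is either an atom or the negation of an atom, $rn_i$ and $rf_i$ are role names, $cf_i$ are concept descriptions, $an_i$ and $af_i$ are attribute names, and $v_i$ are value domain descriptions. Analogously, the genus concept of the minimal description of $c$ (briefly denoted by $c_G$),

$$c_G = (\mathrm{AND}\ c_1 \ldots c_k), \qquad c_i \in \mathrm{MSGS}(c),$$

has the following description:

$$c_G = \Big[\prod_{i=1}^{np_G} p_i^G\Big] \Big[\prod_{i=1}^{nn_G} (\mathrm{NR}\ rn_i^G\ min_i^G, max_i^G)\Big] \Big[\prod_{i=1}^{nf_G} (\mathrm{ALL}\ rf_i^G\ cf_i^G)\Big] \Big[\prod_{i=1}^{nna_G} (\mathrm{NR}_a\ an_i^G\ amin_i^G, amax_i^G)\Big] \Big[\prod_{i=1}^{nfa_G} (\mathrm{ALL}_a\ af_i^G\ v_i^G)\Big]$$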
with $rf_i \neq rf_j$ and $af_i \neq af_j$ for all $i \neq j$. By virtue of elementary set properties, SUBS($c_G$, $c$) = true holds, and then, from the definition of subsumption, it follows that:

$np \geq np_G$: all atoms and negations of atoms in $c_G$ must be in $c$;

$nn \geq nn_G$ and $\forall i = 1, \ldots, nn_G\ ((min_i \geq min_i^G) \wedge (max_i \leq max_i^G))$: all NR terms in $c_G$ must be in $c$, where they may have a restricted range;

$nf \geq nf_G$ and $\forall j = 1, \ldots, nf_G\ (\mathrm{SUBS}(cf_j^G, cf_j) = \mathrm{true} \vee \exists i = 1, \ldots, nn\ ((rn_i = rf_j^G) \wedge (min_i = max_i = 0)))$: some ALL terms in $c_G$ do not have a corresponding term in $c$ because of rule 2k of algorithm CANFORM, and the surviving terms may have a restricted role filler;

$nna \geq nna_G$ and $\forall i = 1, \ldots, nna_G\ ((amin_i \geq amin_i^G) \wedge (amax_i \leq amax_i^G))$: all NR_a terms in $c_G$ must be in $c$, where they may have a restricted range;

$nfa \geq nfa_G$ and $\forall j = 1, \ldots, nfa_G\ (\mathrm{COMPDOM}(v_j^G, v_j) = \mathrm{true} \vee \exists i = 1, \ldots, nna\ ((an_i = af_j^G) \wedge (amin_i = amax_i = 0)))$: some ALL_a terms in $c_G$ do not have a corresponding term in $c$ because of rule 2k of algorithm CANFORM, and the surviving terms may have a restricted value domain.
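The difference concept contains the terms in $c$ that are not contained in $c_G$ or that have been modified with respect to $c_G$, either by a more restricted range or by a more specialized role filler in the same role. The difference concept description is the following:

$$c_D = \Big[\prod_{i=np_G+1}^{np} p_i\Big] \Big[\prod_{i \in I_n} (\mathrm{NR}\ rn_i\ min_i, max_i)\Big] \Big[\prod_{i \in I_f} (\mathrm{ALL}\ rf_i\ cf_i)\Big] \Big[\prod_{i \in I_{na}} (\mathrm{NR}_a\ an_i\ amin_i, amax_i)\Big] \Big[\prod_{i \in I_{fa}} (\mathrm{ALL}_a\ af_i\ v_i)\Big]$$

with

$$I_n = \{i = 1, \ldots, nn_G \mid (min_i > min_i^G) \vee (max_i < max_i^G)\} \cup \{i = nn_G + 1, \ldots, nn\}$$

$$I_f = \{i = 1, \ldots, nf_G \mid cf_i^G \neq cf_i\} \cup \{i = nf_G + 1, \ldots, nf\}$$

$$I_{na} = \{i = 1, \ldots, nna_G \mid (amin_i > amin_i^G) \vee (amax_i < amax_i^G)\} \cup \{i = nna_G + 1, \ldots, nna\}$$

$$I_{fa} = \{i = 1, \ldots, nfa_G \mid v_i^G \neq v_i\} \cup \{i = nfa_G + 1, \ldots, nfa\}$$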
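From the definition of minimal description it follows that the minimal description is extension preserving, that is, for every extension function $\mathcal{E}$:

$$\mathcal{E}[c] = \mathcal{E}[c_G] \cap \mathcal{E}[c_D]$$

In particular, CANFORM($c$) = CANFORM(MINDESC($c$)). The minimal description of a concept $c$, with respect to a terminology $T$, is computed by Algorithm 5 (MINDESC($c$)).

DEFINITION (schema consistency and minimality). A terminology $T$ is consistent if it contains no contradictory concepts, and it is minimal if each of its concepts is expressed by its minimal description.

We are now able to give the high-level description of the classification algorithm of a system supporting the acquisition of a schema and maintaining its consistency and minimality. This algorithm exploits the canonical form generation and subsumption algorithms presented in Section 3.3 and applies the above definitions of consistency and minimality.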
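To make the control flow of the classification algorithm concrete, here is a Python sketch under strong simplifying assumptions: concepts are conjunctions of atoms and negated atoms, the toy `subs` and `contradictory` functions stand in for SUBS and CANFORM, and a minimal description is returned as a pair (parents, local literals). The function names are hypothetical; the sketch mirrors steps 6.1 to 6.3 of Algorithm 6 but is not the paper's implementation.

```python
from typing import Dict, FrozenSet, Set, Tuple

Concept = FrozenSet[str]            # toy encoding: conjunction of literals ("p", "~t")
MinDesc = Tuple[Set[str], Concept]  # (parent names, local literals)


def contradictory(c: Concept) -> bool:
    return any("~" + lit in c for lit in c if not lit.startswith("~"))


def subs(a: Concept, b: Concept) -> bool:
    """Toy SUBS(a, b): b is contradictory, or every literal of a is in b."""
    return contradictory(b) or a <= b


def mindesc(name: str, c: Concept, full: Dict[str, Concept]) -> MinDesc:
    """Parents = MSGS(c); local literals = the toy difference concept."""
    gen = {n for n, d in full.items() if n != name and subs(d, c)}
    parents = {a for a in gen
               if not any(subs(full[a], full[b]) and not subs(full[b], full[a])
                          for b in gen if b != a)}
    inherited = frozenset().union(*(full[p] for p in parents))
    return parents, c - inherited


def classify(name: str, c: Concept, full: Dict[str, Concept],
             minimal: Dict[str, MinDesc]) -> bool:
    """Sketch of CLASSIFY; returns False when c is rejected."""
    if contradictory(c):                                   # step 6.1
        return False
    full[name] = c
    minimal[name] = mindesc(name, c, full)                 # step 6.2
    has_equivalent = any(n != name and subs(d, c) and subs(c, d) for n, d in full.items())
    if not has_equivalent:                                 # step 6.3
        for other, d in full.items():
            if other != name and subs(c, d):
                minimal[other] = mindesc(other, d, full)
    return True


full: Dict[str, Concept] = {}
minimal: Dict[str, MinDesc] = {}
classify("person", frozenset({"p", "~t"}), full, minimal)
classify("grad-student", frozenset({"p", "~t", "e", "gs"}), full, minimal)
classify("student", frozenset({"p", "~t", "e"}), full, minimal)
print(minimal["grad-student"])   # ({'student'}, frozenset({'gs'}))
```

The example inserts concepts in an unfavourable order: grad-student first gets person as parent, and step 6.3 moves it under student once student is classified.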
ACM Transactions on Database Systems, Vol. 17, No 3, September 1992.
S. Bergamaschi and C. Sartori
Algorithm 6 (CLASSIFY). Given a consistent terminology $T$ and a concept $c$ expressed in the language:

6.1 If SUBS(NOTHING, $c$) then reject $c$, exit;
6.2 Add the minimal description of $c$ to the terminology: $T := T \cup \{c = \mathrm{MINDESC}(c)\}$;
6.3 If there does not exist in $T$ any concept equivalent to $c$, then restructure the terminology by determining the new minimal descriptions for the concepts subsumed by $c$:
if $\nexists\, c^* \in T_C \mid c^* \equiv c$
then $\forall c' \in T_C \mid \mathrm{SUBS}(c, c') = \mathrm{true}$ do $T := (T - \{c' = \ldots\}) \cup \{c' = \mathrm{MINDESC}(c')\}$

The classification of a concept description is therefore strongly based on subsumption. This process requires $O(t)$ subsumption operations for the determination of the subsumers, where $t$ is the number of concepts in the taxonomy, and $O(k^2)$ subsumption operations, where $k$ is the number of concepts that subsume the new concept.

5. FORMALIZATION OF E/R

Up to now we have shown a fairly general language able to describe concepts.
We will now show how the language is able to describe the modeling constructs of the E/R model, including some well-accepted extensions such as generalization, multiple and compound attributes, and cardinality ratios, as described by Batini et al. [7]. Appendix A reports the syntax of the extended E/R schema definition language drawn from Batini et al. [7].

The application of the language to E/R allows, as a fundamental extension, the description of defined entities. Moreover, the generalizations of an entity and its participation in relationships are directly included in the description of the entity itself. This makes the approach entity centered; that is, most of the schema is described as a characteristic of an entity. This approach is in line with most semantic and object-oriented models [1, 2, 4, 38, 40]. The above extensions allow the application of taxonomic reasoning techniques, as will be shown later. The description of an entity is as follows:
⟨entity-declaration⟩ ::= ⟨entity-name⟩ = ⟨entity⟩
⟨entity⟩ ::= (AND [⟨entity-atom⟩] ⟨entity-component₁⟩ … ⟨entity-componentₙ⟩)
⟨entity-component⟩ ::= ⟨attribute⟩ | ⟨compound-attribute⟩ | ⟨generalization⟩ | ⟨exclusion⟩ | ⟨relationship-side⟩
⟨attribute⟩ ::= (AND (ALL_a ⟨attribute-atom⟩ ⟨value-set⟩) (NR_a ⟨attribute-atom⟩ ⟨min⟩, ⟨max⟩))
⟨compound-attribute⟩ ::= (AND (ALL ⟨attribute-atom⟩ (AND ⟨attribute₁⟩ … ⟨attributeₙ⟩)) (NR ⟨attribute-atom⟩ ⟨min⟩, ⟨max⟩))
⟨generalization⟩ ::= ⟨entity-name⟩
⟨exclusion⟩ ::= (NOT ⟨entity-atom⟩)
⟨relationship-side⟩ ::= (AND (ALL ⟨role⟩ ⟨relationship-name⟩) (NR ⟨role⟩ ⟨min⟩, ⟨max⟩))
It can be seen that an entity can be described as primitive (by means of an entity atom) or as defined. Furthermore, as an addition to the traditional E/R, an entity can be characterized as a conjunction of entities, which yields inheritance, and, by means of the syntactic category ⟨relationship-side⟩, which represents one side of a relationship, on the basis of its participation in a relationship. The description of a relationship is then reduced to the description of its attributes, with the following syntax:

⟨relationship-declaration⟩ ::= ⟨relationship-name⟩ = ⟨relationship⟩
⟨relationship⟩ ::= ⟨relationship-atom⟩ | ⟨attribute⟩ | ⟨compound-attribute⟩ | (AND ⟨attribute₁⟩ … ⟨attributeₙ⟩)
Since the semantics is mainly represented on entities, relationship hierarchies are not allowed. Cardinalities and value sets are defined as in the syntax of Section 3.1. Finally, the constraint of exclusive hierarchy with respect to the primitive entities $e_1, \ldots, e_n$ can be represented by their atoms $ea_1, \ldots, ea_n$ as follows:

e_1 = (AND ea_1 (NOT ea_2) … (NOT ea_n) …)
…
e_n = (AND ea_n (NOT ea_1) … (NOT ea_{n−1}) …)
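The number of exclusion terms grows quadratically with the size of the hierarchy: each of the $n$ entities receives $n - 1$ (NOT …) terms. The Python sketch below generates only those terms (the other components of each description are left out); `exclusion_terms` is a hypothetical helper and the atom names are placeholders. The first call reproduces the (NOT t) and (NOT p) terms used for person and teaching in Example 5.1.

```python
from typing import Dict, List


def exclusion_terms(atoms: List[str]) -> Dict[str, List[str]]:
    """For each primitive entity atom, the (NOT ...) terms that make it
    disjoint from every other atom of the exclusive hierarchy."""
    return {a: [f"(NOT {b})" for b in atoms if b != a] for a in atoms}


print(exclusion_terms(["p", "t"]))             # {'p': ['(NOT t)'], 't': ['(NOT p)']}
print(exclusion_terms(["ea1", "ea2", "ea3"])["ea2"])  # ['(NOT ea1)', '(NOT ea3)']
```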
By comparing the extended E/R syntax of Appendix A with the application of the language to E/R, it can be observed that some aspects cannot be captured: the notion of total generalization hierarchy (coverage1) and identifiers. The representation of these aspects requires an extension of the language which would make a complete subsumption algorithm really intractable. The incapability of representing total hierarchies is related to […] [48]; instead, […] a limit […] constraint on schema […].
Fig. 2. E/R schema example. [Figure residue: person*, name, student, course, relationship enrollment with roles enrolled-in and enroll, cardinality 1,n.]
In the following examples, the use of taxonomic reasoning by the CLASSIFY algorithm will be pointed out on a simple schema built by introducing one concept at a time. In Example 5.1 the simple schema of Figure 2 is translated into the language. In Example 5.2, the CLASSIFY algorithm is executed, consistency is checked, and a minimal schema is produced. In Example 5.3, a nonminimal concept description is transformed into a minimal one, and some comments on minimality with respect to primitive and defined entities are given. In Example 5.4, a case of contradictory concept detection is explained.

Example 5.1 (E/R schema description). Person is a primitive entity, disjoint from teaching and described with the single-valued attribute name. Thus, it is described by

person = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1))

The relationship enrollment is described by

enrollment = (AND (ALL_a enroll-year integer) (NR_a enroll-year 1,1))

Student is a defined entity, described not only with a data structure but also by its participation in the relationship enrollment:

student = (AND person (ALL enrolled-in enrollment) (NR enrolled-in 1,n))

Teaching and course are described as follows:

teaching = (AND t (NOT p) (ALL_a name string) (NR_a name 1,1))
course = (AND teaching (ALL enroll enrollment) (NR enroll 1,n))

The relationship enrollment is therefore described through its explicit introduction and through the above implicit descriptions of its two relationship sides, enroll and enrolled-in, which are given in the descriptions of course and student, respectively.
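Expanding the entity names used inside a description produces the flat form on which subsumption is checked, in the spirit of the name-expansion step applied to student and course in Example 5.2. The Python sketch below is a simplification assumed for illustration: terms are tuples, only top-level names are expanded, duplicate terms are dropped, and `SCHEMA` and `expand` are hypothetical names.

```python
from typing import Dict, List, Union

Term = Union[str, tuple]  # a concept name or atom, or a tuple such as ("NR", "enroll", "1", "n")

SCHEMA: Dict[str, List[Term]] = {
    "person": ["p", ("NOT", "t"), ("ALLa", "name", "string"), ("NRa", "name", "1", "1")],
    "teaching": ["t", ("NOT", "p"), ("ALLa", "name", "string"), ("NRa", "name", "1", "1")],
    "student": ["person", ("ALL", "enrolled-in", "enrollment"), ("NR", "enrolled-in", "1", "n")],
    "course": ["teaching", ("ALL", "enroll", "enrollment"), ("NR", "enroll", "1", "n")],
}


def expand(name: str, schema: Dict[str, List[Term]]) -> List[Term]:
    """Replace every top-level concept name by its (recursively expanded) terms."""
    out: List[Term] = []
    for term in schema[name]:
        parts = expand(term, schema) if isinstance(term, str) and term in schema else [term]
        for t in parts:
            if t not in out:          # keep each term once
                out.append(t)
    return out


print(expand("student", SCHEMA))
```

After expansion, person and student share the atom p and the name attribute, which is exactly what the subsumption test in the next example compares.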
Fig. 3. E/R schema before classification. [Figure residue: person, grad-student*, enrollment.]
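Example 5.2 (E/R schema acquisition). For each of the above entities, the CLASSIFY algorithm is executed. First of all, in step 6.1 the canonical form is computed by algorithm CANFORM. Person and teaching are left unchanged, while student and course are transformed by rule 2a of Algorithm 2:

student = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1) (ALL enrolled-in enrollment) (NR enrolled-in 1,n))
course = (AND t (NOT p) (ALL_a name string) (NR_a name 1,1) (ALL enroll enrollment) (NR enroll 1,n))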
Since none of the above entities is NOTHING, step 6.2 is executed for each entity. In particular, algorithm SUBS finds that no subsumption holds between person and teaching since, even if they share a common attribute (name), they have different atoms (p and t). Note that the disjointness constraint does not affect this result. Then the schema described so far is consistent and minimal.

Example 5.3 (Minimality). The new primitive entity grad-student, which is a person enrolled in exactly one enrollment, is to be added to the schema:

grad-student = (AND gs person (ALL enrolled-in enrollment) (NR enrolled-in 1,1))

corresponding to the E/R schema fragment of Figure 3. Its canonical form is

grad-student = (AND gs p (NOT t) (ALL_a name string) (NR_a name 1,1) (ALL enrolled-in enrollment) (NR enrolled-in 1,1))

Algorithm 6 computes SUBS(student, grad-student) = true. Then the description of grad-student is modified into the minimal form

grad-student = (AND gs student (NR enrolled-in 1,1))

corresponding to the E/R schema fragment of Figure 4.
Fig. 4. E/R schema after classification. [Figure residue: person*, student, enrollment, grad-student*.]
It is worth noting that it has been possible to classify the primitive entity grad-student as a specialization of student because student has been introduced as a defined entity; it would not have been possible if student had been a primitive entity. Furthermore, if grad-student had been introduced as a defined entity (without the entity atom gs) with the same description as student, it would have been recognized as equivalent to student and taken as a synonym (see Algorithm 6, step 6.2).

Example 5.4 (Contradictory concept). The following example illustrates how inconsistencies due to wrong role restrictions on entity descriptions are detected. A last-year-student, who has passed at least 25 and at most 30 units, is defined as

last-year-student = (AND student (NR passed-units 25,30))

The addition of the following description introduces a last-year-bad-student:

last-year-bad-student = (AND last-year-student (NR passed-units 1,20))

In this case, algorithm CANFORM produces the following canonical form:

last-year-bad-student = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1) (ALL enrolled-in enrollment) (NR enrolled-in 1,n) (NR passed-units 25,20))
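The arithmetic behind the contradiction is small enough to show directly. This Python sketch assumes that conjoining two NR terms on the same role keeps the larger minimum and the smaller maximum, and that a range with min > max is contradictory; it illustrates the effect described for rules 2j, 2l, and 2e rather than reproducing their exact statements, and the helper names are hypothetical.

```python
import math
from typing import Tuple

Range = Tuple[float, float]  # (min, max); math.inf stands for "n"


def conjoin_nr(r1: Range, r2: Range) -> Range:
    """Two NR terms on the same role collapse into one: the larger minimum
    and the smaller maximum must both hold."""
    return (max(r1[0], r2[0]), min(r1[1], r2[1]))


def nr_is_contradictory(r: Range) -> bool:
    """No object can have at least min and at most max fillers when min > max."""
    return r[0] > r[1]


merged = conjoin_nr((25, 30), (1, 20))    # last-year-student vs last-year-bad-student
print(merged, nr_is_contradictory(merged))  # (25, 20) True
```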
The number restriction of role passed-units, computed by rule 2j of Algorithm 2, makes the above definition contradictory on the basis of rules 2l and 2e. Therefore the entity is rejected by step 6.1 of Algorithm 6.

6. INVERSE ROLES AND SUBSUMPTION COMPUTATION

The language is extended with the role-forming construct INV, which allows the representation of the notion of inverse role. INV has been introduced to deal with the inverse functions of functional models such as DAPLEX and IFO.

Fig. 5. Schema example with inverse roles. [Figure residue: teaching*, student, course, attribute name (string), cardinality 1,n, role (INV enrolled-in).]
The syntax presented in Section 3.1 is extended as follows:

⟨role⟩ ::= (INV r)

The construct (INV r) denotes a role whose extension is the mirror image of the role r. The semantics of inverse roles is defined by completing the extension function of Section 3.2 as follows:

$$\mathcal{E}[(\mathrm{INV}\ r)] = \{(x, y) \mid (y, x) \in \mathcal{E}[r]\} \qquad (11)$$

for an arbitrary role r. Therefore, course can be described by substituting the role enroll with (INV enrolled-in), as shown in Figure 5:

course = (AND (NR (INV enrolled-in) 1,n) (ALL (INV enrolled-in) student))
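Equation (11) simply mirrors each pair of the role. A Python sketch over finite, hypothetical extensions (the individuals ann, bob, db, and ai are made up for the example) shows the inverse and the role fillers it yields; applying `inverse` twice gives back the original role.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def inverse(role_ext: Set[Pair]) -> Set[Pair]:
    """Extension of (INV r) per equation (11): every pair of r, mirrored."""
    return {(y, x) for (x, y) in role_ext}


def fillers(role_ext: Set[Pair], obj: str) -> Set[str]:
    """Role fillers of obj: all y such that (obj, y) is in the extension."""
    return {y for (x, y) in role_ext if x == obj}


# hypothetical individuals: students enrolled in courses
enrolled_in = {("ann", "db"), ("bob", "db"), ("ann", "ai")}
print(sorted(fillers(inverse(enrolled_in), "db")))   # ['ann', 'bob']
```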
It is worth noting that inverse roles make available a new mechanism for the definition of concepts: a specialization (course) can be characterized on the basis of being the role filler for a given concept (student). The presence in the same concept description of a role and its inverse gives rise to new powerful subsumption inferences. For example, consider the following description:

(AND (NR enrolled-in 1,n) (ALL enrolled-in (AND (NR (INV enrolled-in) 1,1) (ALL (INV enrolled-in) student))))

It individuates the objects enrolled in at least one item that, in turn, enrolls exactly one object, which must be a student. The semantics of inverse roles implies that this anonymous concept is subsumed by student.

Inverse roles make it necessary to extend the SUBS, CANFORM, and COMPARE algorithms, giving rise to the SUBSINV, CANFORMINV, and COMPAREINV algorithms, respectively, for the subsumption computation in a terminology $T_{inv}$, which includes roles defined as inverse roles.

Algorithm 7 (SUBSINV). SUBSINV is a function

SUBSINV: $T_{inv} \times T_{inv} \to$ {true, false}

defined as

SUBSINV(t, u) = COMPAREINV(CANFORMINV(t), CANFORMINV(u))
Algorithm 8 (CANFORMINV). CANFORMINV extends CANFORM¹⁴ with the following rules:

8m When the role filler of a role r is defined on the basis of the inverse of the role itself, say (INV r), and there is at least one role filler value, it is possible to simplify the concept description in accordance with the following rule:

(AND … (ALL r (AND … (ALL (INV r) c) …)) …)

If an AND expression (AND c_1 … c_p) contains a term in which the role filler of a role r is defined on the basis of the inverse of the role itself, such as c_p = (ALL r (AND … (ALL (INV r) x))), and with at least one role filler value, such as c_{p−1} = (NR r m,n) with m ≥ 1, and the following condition holds: (AND c_1 … c_{p−2} x) = NOTHING, then (AND c_1 … c_p) ⇒ NOTHING.¹⁵
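The reasoning behind rule 8m can be traced on a toy encoding. In the Python sketch below, which is an assumption of this example rather than the paper's data structure, a description carries literals, NR ranges, nested ALL descriptions, and (ALL (INV r) x) fillers given as literal sets; whenever some r filler must exist, the object is itself an (INV r) filler of that filler and therefore must satisfy x. The data mimic x-student and x-course of Example 7.3, and `Desc` and `apply_8m` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple


def contradictory(lits: FrozenSet[str]) -> bool:
    return any("~" + lit in lits for lit in lits if not lit.startswith("~"))


@dataclass
class Desc:
    """Hypothetical toy encoding of a concept description."""
    literals: FrozenSet[str] = frozenset()                             # atoms / negated atoms
    nr: Dict[str, Tuple[float, float]] = field(default_factory=dict)   # (NR r min max)
    all_: Dict[str, "Desc"] = field(default_factory=dict)              # (ALL r c)
    all_inv: Dict[str, FrozenSet[str]] = field(default_factory=dict)   # (ALL (INV r) x)


def apply_8m(d: Desc) -> Optional[Desc]:
    """Toy rule 8m: returns None when the description collapses to NOTHING."""
    lits = set(d.literals)
    for role, inner in d.all_.items():
        x = inner.all_inv.get(role)              # (ALL r (AND ... (ALL (INV r) x)))
        min_fillers = d.nr.get(role, (0, math.inf))[0]
        if x is not None and min_fillers >= 1:   # at least one r filler exists
            lits |= x                            # d is an (INV r) filler, so it is an x
    if contradictory(frozenset(lits)):
        return None
    return Desc(frozenset(lits), d.nr, d.all_, d.all_inv)


grad = frozenset({"p", "~t", "gs", "~us"})       # grad-student as literals
x_course = Desc(literals=frozenset({"t", "~p"}), all_inv={"enrolled-in": grad})
x_student = Desc(literals=frozenset({"p", "~t", "us", "~gs"}),   # an undergrad student
                 nr={"enrolled-in": (1, math.inf)},
                 all_={"enrolled-in": x_course})
print(apply_8m(x_student))   # None: undergrad-student AND grad-student is NOTHING
```

Without the NR term guaranteeing at least one enrollment, the rule does not fire, matching the "at least one role filler value" condition.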
¹⁴All rules applicable to a role r are applicable to a role (INV r).
¹⁵The same rule holds if r is the inverse of a role s (r = (INV s) and (INV r) = s). The transformation could have been applied equally well with (NR (INV r) 1, …); this was not done because this generalization is useless for algorithm CANFORMINV.
8o When the role filler of a role is defined on the basis of a number restriction on the inverse of the role itself with min = max = 0, the ALL term is replaced by an NR term with min = max = 0:

(ALL r (NR (INV r) 0,0)) ⇒ (NR r 0,0)

Algorithm COMPAREINV extends COMPARE by adding the specialized inferences for inverse roles:
Algorithm 9 (COMPAREINV). Given c = (AND c_1 … c_m) and c′ = (AND c′_1 … c′_n) in canonical form, COMPAREINV(c, c′) = true iff:

9a (c′ = NOTHING) ∨ [∀ c_i, i = 1, …, m: (9b ∨ … ∨ 9g)];
9b … 9f the logical expressions from 4b to 4f of Algorithm 4 (COMPARE);
9g ∃ c′_j = (ALL r (AND … (ALL (INV r) x))) ∧ ∃ c′_k = (NR r m,n) ∧ m ≥ 1 ∧ COMPAREINV(c_i, x) = true.

7. FORMALIZATION OF DAPLEX

DAPLEX is a data definition and data manipulation language for database systems [50], based on the functional data model, which was first introduced by Sibley and Kershberg [51]. The two main constructs which model conceptual objects are the entity and the function. The functional data model, on the surface, is very similar to the KL-ONE model, if we compare entities to concepts and functions to roles. Therefore, a DAPLEX schema can easily be drawn with ellipses and arrows as in Figure 1. On the other hand, in DAPLEX the functions are not used as rules to fill entity classes, but constitute only integrity constraints. Therefore, with DAPLEX it is possible to model the schema of Figure 2 as follows:

DECLARE person() ⇒⇒ ENTITY
DECLARE name(person) ⇒ string
DECLARE student() ⇒⇒ person
DECLARE teaching() ⇒⇒ ENTITY
DECLARE course() ⇒⇒ teaching
DECLARE name(teaching) ⇒ string
DECLARE enrolled-in(student) ⇒⇒ course
DEFINE enroll(course) ⇒⇒ INVERSE OF enrolled-in(student)

Note that an immediate translation from the entities and functions of the above schema to concepts and roles is not possible, as it is necessary to prevent the definitional cycle between student and course. The best possible solution in the language with inverse roles is the one already shown in Figure 5.
A DAPLEX schema can be described as follows:

⟨schema⟩ ::= ⟨entity-declaration₁⟩ … ⟨entity-declarationₙ⟩
⟨entity-declaration⟩ ::= ⟨prim-entity-declaration⟩ | ⟨def-entity-declaration⟩
⟨prim-entity-declaration⟩ ::= ⟨entity-name⟩ = (AND ⟨entity-atom⟩ ⟨function-declaration₁⟩ … ⟨function-declarationₙ⟩)
⟨def-entity-declaration⟩ ::= ⟨entity-name⟩ = (AND ⟨function-declaration₁⟩ … ⟨function-declarationₙ⟩)
⟨function-declaration⟩ ::= ⟨function-decl⟩ | ⟨function-defn⟩ | ⟨entity-name⟩ | ⟨exclusion⟩
⟨function-decl⟩ ::= ⟨attr-function⟩ | ⟨role-function⟩ | ⟨attr-predicate⟩ | ⟨role-predicate⟩
⟨attr-function⟩ ::= (AND (ALL_a ⟨attr-name⟩ ⟨value-set⟩) (NR_a ⟨attr-name⟩ 1,1))
⟨role-function⟩ ::= (AND (ALL ⟨role⟩ ⟨entity-name⟩) (NR ⟨role⟩ 1,1))
⟨attr-predicate⟩ ::= (NR_a ⟨attr-name⟩ ⟨min⟩, ⟨max⟩)
⟨role-predicate⟩ ::= (NR ⟨role⟩ ⟨min⟩, ⟨max⟩)
⟨function-defn⟩ ::= (ALL (INV ⟨role-name⟩) ⟨entity-name⟩)
⟨exclusion⟩ ::= (NOT ⟨entity-atom⟩)
⟨role⟩ ::= …
The representation of DAPLEX in the language with inverse roles […]: to allow definitional semantics, the language requires the AND constructor […].¹⁶ Note that the syntactic category ⟨role-function⟩ allows the description of nested functions, which represents an extension with respect to DAPLEX, but is also an important feature supported by the more recent IFO and easily expressed by the language. The description of an entity is taken as defined. If a primitive description is required, the distinction is obtained by adding an entity atom to the AND combination of the components.

Example 7.1 (DAPLEX schema description). The DAPLEX schema of Figure 5 can be expressed as follows:

person = (AND p (NOT t) (ALL_a name string) (NR_a name 1,1))
student = (AND person (NR enrolled-in 1,n))
teaching = (AND t (NOT p) (ALL_a name string) (NR_a name 1,1))
course = (AND teaching (ALL (INV enrolled-in) student))¹⁷

Example 7.2 (DAPLEX schema acquisition). This example shows how local inconsistencies introduced by a contradictory role filler are detected and removed, in accordance with the CANFORMINV algorithm. Let us consider the schema described above, with the entities undergrad-student and grad-student (disjoint from undergrad-student) and advanced-course introduced as follows:

undergrad-student = (AND us student (NOT gs))
grad-student = (AND gs student (NOT us))
advanced-course = (AND course (ALL (INV enrolled-in) undergrad-student))

and specialized-course, defined as follows:

specialized-course = (AND … )

The role filler of (INV enrolled-in) results in NOTHING, and rule 2e generates the following description:

specialized-course = (AND … (NR (INV enrolled-in) 0,0))

Obviously, even though this description of a teaching without enrolled individuals is not contradictory, its rewriting points out that it is probably a useless concept.
¹⁶The procedural features of DAPLEX are disregarded, since they are beyond the expressive capabilities of the language, which is designed only for data structure descriptions.
¹⁷Note that, for simplicity, the syntax does not allow naming of inverse roles.
Fig. 6. DAPLEX schema with a multiargument function. [Figure residue: enrollment, enroll-year, integer, cardinalities 1,1.]
Example 7.3 (Inference with inverse roles). This example shows how a concept that is contradictory because of an inverse role is detected. First, a course named x-course, which can enroll exactly one graduated student, is declared. Then, x-student is defined as being an undergraduate student enrolled in at least one x-course:

x-course = (AND teaching (ALL (INV enrolled-in) grad-student) (NR (INV enrolled-in) 1,1))
x-student = (AND undergrad-student (ALL enrolled-in x-course) (NR enrolled-in 1,n))

Algorithm CANFORMINV generates (at a selected intermediate step) the following description:

(AND undergrad-student … (ALL enrolled-in (AND … (ALL (INV enrolled-in) grad-student) (NR (INV enrolled-in) 1,1))) (NR enrolled-in 1,n))

and rule 8m produces

(AND undergrad-student … grad-student …)

which is obviously contradictory.
This multiargument function (or aggregation, in E/R terminology), shown in Figure 6, is represented as an aggregation with the attribute enroll-year:

enrollment = (AND (ALL r1 …) (ALL r2 …) (ALL_a enroll-year integer) (NR r1 1,1) (NR r2 1,1) (NR_a enroll-year 1,1))
8. RELATED WORKS

Most of the work in the area of conceptual database schema validation differs substantially from our approach, since it was developed without considering the category of defined concepts [2, 5, 6, 28, 36, 38]. With this perspective, the main activity is to check concept description consistency with respect to the given explicit specialization ordering. Our approach, together with those of Finin and Silverman [32], Bergamaschi et al. [10], Delcambre and Davis [27], and Ait-Kaci [3], provides a more active role, allowing computation of the specialization ordering on the basis of concept descriptions.

Atzeni and Parker [5] formally introduce consistency and redundancy of a conceptual schema, and the problem of checking the schema for these properties is reduced to a graph analysis problem. The framework of the solution is limited with respect to the expressivity of a schema definition: only explicit isa and disjointness statements are considered. Abiteboul and Hull [2] present the IFO model and a set of rules on isa relationships that guarantee the consistency of an IFO schema (without disjointness constraints) by proving that no type of the schema is contradictory. Atzeni and Parker [6] introduce a polynomial algorithm for the computation of set containment in a type system. The notion of set containment is similar to that of subsumption, but the type descriptors allow only explicit set containment, either in positive or negative form.

Di Battista and Lenzerini [28] present a deductive method for E/R modeling. Its purpose is to provide a tool for consistency and minimality checking. The representational mechanisms are fairly powerful, including isa specifications, disjointness, aggregation, mandatory participation of an entity in a relationship, and negation of all the above specifications. The checking algorithms are claimed to be tractable and complete, and the major limit is the lack of defined class semantics, which prevents the inference of nonexplicit subsumptions. The problem of schema consistency is also considered by Lecluse et al. [38] and Lecluse and Richard [36] with respect to the O2 object-oriented data model. In these papers, building on ideas from Cardelli's work [20], type-checking algorithms that guarantee that descendant classes are consistent with their parent classes (inherited properties can only restrict value domains) are presented. Lecluse and Richard [36] introduce an algorithm that computes all the possible isa links between class descriptions on the basis of their type. This algorithm reflects exactly what a subsumption algorithm can do with defined classes, where the valid implicit isa links are computed. Unfortunately, the tractable algorithm presented is not complete, as the refinement rules on disjunctive types eliminate one case of subsumption [11].

Works of Finin and Silverman [32], Bergamaschi et al. [10], and Delcambre and Davis [27] have aims and methods more similar to the present work. Finin and Silverman [32] present an interactive tool for knowledge acquisition. It is based on user interface and other heuristics and focuses more on acquisition strategies than on formalizations and proofs. On the other hand, its data model is very simple and does not allow multiple inheritance. Delcambre and Davis [27] present a classifier for object-oriented schemas. Their purpose is to discover new structural relationships and/or inconsistencies in order to refine a schema. They use the idea of defined class, and their semantics include class properties, number restrictions, and disjointness statements. The purpose and methods are in many aspects similar to those presented by Bergamaschi et al. [10], but neither considers inverse roles nor the generation of a minimal schema.

To conclude, subsumption is used as a model of computation for knowledge representation languages by Ait-Kaci [3], who defines a calculus of type structures and proposes a programming language based on subsumption. Type symbols denote sets of objects of the language, and label symbols denote functions. This type semantics maps the partial ordering on type symbols into set inclusion and defines subsumption between type structures in a way similar to the subsumption algorithms developed in hybrid systems and FDLs and in the present work. The definition of the two operators meet (greatest lower bound) and join (least upper bound) is similar to unification and generalization in logic programming, respectively. The difference with respect to our proposal is again that type structures are primitive.
9. SUBSUMPTION FOR DATABASE QUERIES AND INSTANCES

The application of subsumption computation to other outstanding database topics, such as instance validation and recognition and query processing, has recently been investigated by Borgida et al. [13] and Beck et al. [8]. In these papers, by exploiting the semantics of defined concepts, new applications of classification and subsumption to process database queries and recognize instances are explored. The novel feature of the proposed models is that DDL and DML are identical, thus providing uniform treatment of data objects, query objects, and view objects. The classification algorithm finds the correct placement for a query object in a given object taxonomy, and the criterion for this placement is the subsumption relationship (the union of the instances of the descendant object classes satisfies the query). Beck et al. [8] present the model CANDIDE, which is an extension of KANDOR [44, 45] more familiar to the database researcher, representing standard data types of database systems. The subsumption algorithm, which is claimed to be complete and tractable, really falls into a trap [41]. Borgida et al. [13] present CLASSIC, […] with a tractable and complete subsumption algorithm. The new features with respect to FDLs are coreference constraints,¹⁸ which specify simple equalities between single-valued roles, and the ability to give the instances of a concept by enumeration. Furthermore, the paper shows, thanks to the semantics of defined concepts, how the automatic recognition of an object of an application domain as an instance of a class can be done. Finally, the effectiveness of classification for intensional query answering is shown. To summarize, CANDIDE is proposed as a new conceptual model, while CLASSIC presents a system based on an FDL and its feasibility for database objectives.

To compare them with the present work, the theoretical framework proposed in this paper is at a different level, being a general formal tool applicable to any conceptual model, which can be used to face various database topics. In fact, the SUBS and CLASSIFY algorithms can be the basis of a system performing instance validation and recognition and query processing optimization. A formal treatment of instance validation and recognition, known as hybrid inference, has been investigated by many AI researchers, as surveyed by Nebel [42]. Let us give a hint on the use of subsumption for database object instantiation. The first step is to decide whether a given object o belongs to a concept c. This is equivalent to checking that o does not violate the constraints embedded in the description of c. To this end, o is conceptualized as an instance concept o*, determining for each instantiated role the most specialized concept that abstracts it and for each attribute the most specialized value domain. SUBS validates o*, as usual in a database environment, as follows: o is a valid instance of c if SUBS(c, o*) = true. To recognize o as an instance of one or more concepts, two strategies are possible: (1) […]; (2) MINDESC(o*) is computed and o is stored as an instance in MSGS(o*), without any modification in the taxonomy.

¹⁸This constraint is also represented by Ait-Kaci [3].
Analogous considerations hold for queries, treated as object descriptions.
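A Python sketch of the validation step, under assumptions that the text leaves open: an object records its atoms and the fillers of each role, o* gets an exact number restriction (NR r k k) per role (a closed-world reading), and a toy `validates` function plays the role of SUBS(c, o*) on literals and number restrictions only. All names, including the individuals, are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Range = Tuple[float, float]
ToyConcept = Tuple[FrozenSet[str], Dict[str, Range]]   # (literals, NR terms)


def conceptualize(atoms: FrozenSet[str], roles: Dict[str, List[str]]) -> ToyConcept:
    """Build o*: the object's literals plus an exact (NR r k k) for every role,
    where k is the number of fillers actually recorded (closed-world assumption)."""
    return frozenset(atoms), {r: (len(f), len(f)) for r, f in roles.items()}


def validates(c: ToyConcept, o_star: ToyConcept) -> bool:
    """Toy SUBS(c, o*): literals included and every NR range of c respected."""
    c_lits, c_nr = c
    o_lits, o_nr = o_star
    if not c_lits <= o_lits:
        return False
    for role, (lo, hi) in c_nr.items():
        k_lo, k_hi = o_nr.get(role, (0, 0))   # an unrecorded role has no fillers
        if k_lo < lo or k_hi > hi:
            return False
    return True


student: ToyConcept = (frozenset({"p", "~t"}), {"enrolled-in": (1, math.inf)})
ann = conceptualize(frozenset({"p", "~t"}), {"enrolled-in": ["e1", "e2"]})
bob = conceptualize(frozenset({"p", "~t"}), {})
print(validates(student, ann), validates(student, bob))   # True False
```

The closed-world choice matters: under an open-world reading, bob's missing enrollments would be unknown rather than zero, and validation could not reject him.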
10. CONCLUSIONS

In this paper a theoretical framework for schema acquisition that preserves consistency and minimality is presented. The aim is to allow the database designer to build a conceptual schema, organized in a strict inheritance taxonomy, by freely supplying new concept descriptions in any order. To this end, two distinct compositional formalisms, which extend FDL languages developed in an AI environment, have been introduced; they represent, respectively, the semantics of conceptual data models giving prominence to type structure and to attributes. These formalisms include the defined concept semantics and constitute a general formal tool to support the consistency and minimality of a conceptual schema, applicable to any conceptual model: the representation of the data semantics of the E/R and DAPLEX models has the aim of showing this capability. Schema consistency is formally defined through the definition of a contradictory concept, and it is ensured by the algorithms CANFORM and CANFORMINV, which are able to detect contradictory concepts. Consistency checks are passive; the minimal description of a concept, with respect to a given subsumption relationship, is also formally defined. The classification algorithm, based on the complete and tractable subsumption algorithm SUBS, permits the more active part of determining the minimal description of a new concept, thus classifying it in the right place of a given taxonomy. The results of this paper, as sketched in Section 9, can be the foundation for relevant contributions on database querying and instantiation driven by subsumption.

APPENDIX A. SYNTAX FOR E/R SCHEMA DECLARATION

This appendix recalls the E/R syntax for schema declaration presented by Batini et al. [7]:
entity-decl → ENTITY: entity-name [attribute-section] [compound-attribute-section] [identifier-section]
attribute-section → ATTRIBUTES: {attribute-decl}
attribute-decl → attribute-name type-decl
type-decl → …
compound-attribute-section → COMPOUND ATTRIBUTES: {comp-attr-decl}
comp-attr-decl → [(min-card:max-card)] comp-attr-name OF {attribute-decl}
identifier-section → {identifier-decl}
identifier-decl → attribute-LIST
generalization-section → [gen-hier-section] [SUBSET-section]
gen-hier-section → GEN-name [(coverage1, coverage2)] FATHER: entity-name SONS: entity-name-list
coverage1 → P | T
coverage2 → E | O
SUBSET-section → … OF entity-name
relationship-section → {relationship-decl}
relationship-decl → RELATIONSHIP: relationship-name CONNECTED ENTITIES: {conn-entity-decl} ATTRIBUTES: {attribute-decl}
conn-entity-decl → [(min-card:max-card)] entity-name

APPENDIX B. DAPLEX SYNTAX FOR SCHEMA DECLARATION

This appendix recalls the DAPLEX syntax for schema declaration presented by Shipman [50], which can be expressed in the language:

schema → {declarative}
declarative → entity-declarative | function-declarative
entity-declarative → DECLARE entity-name ( ) ⇒⇒ ENTITY
function-declarative → DECLARE function-decl | DEFINE definition
function-decl → function | function predicate
function → role-function | attr-function
attr-function → attr-name (entity-tuple) multiplicity value-set
role-function → role-name (entity-tuple) multiplicity entity-name
multiplicity → ⇒ | ⇒⇒
definition → function-defn | entity-defn
function-defn → role-name-1 (entity-set-1) ⇒⇒ INVERSE OF role-name-2 (entity-set-2)
entity-defn → entity-name INTERSECTION OF entity-tuple
entity-tuple → entity-name | entity-name, entity-tuple
predicate → quant EXIST
quant → AT (LEAST | MOST) integer
value-set → integer | real | string | boolean | {atom₁, …, atomₙ}
ACKNOWLEDGMENTS

[…] Tiberio […] comments on the subsumption […] suggestions considerably improved […] of the paper.
REFERENCES
1. ABITEBOUL, S., AND GRUMBACH, S. Col: A logic-based language for complex objects. In EDBT 88, Lecture Notes in Computer Science N.303, S. Ceri, J. W. Schmidt, and M. Missikoff, Eds., Springer-Verlag, New York, 1988, pp. 271-293.
2. ABITEBOUL, S., AND HULL, R. IFO: A formal semantic database model. ACM Trans. Database Syst. 12, 4 (1987), 525-565.
3. AIT-KACI, H. Type subsumption as a model of computation. In Proceedings of the 1st International Workshop on Expert Database Systems, Benjamin/Cummings, Menlo Park, Calif., 1986, pp. 115-140.
4. ALBANO, A., CARDELLI, L., AND ORSINI, R. Galileo: A strongly typed, interactive conceptual language. ACM Trans. Database Syst. 10, 2 (1985), 230-260.
5. ATZENI, P., AND PARKER, D. S. Formal properties of net-based knowledge representation schemes. Data Knowl. Eng. 3 (1988), 137-147.
6. ATZENI, P., AND PARKER, D. S. Set containment inference and syllogisms. Theor. Comput. Sci. 62 (1988), 39-65.
7. BATINI, C., CERI, S., AND NAVATHE, S. B. Conceptual and Logical Database Design: The Entity-Relationship Approach. Benjamin/Cummings, Menlo Park, Calif., 1992.
8. BECK, H. W., GALA, S. K., AND NAVATHE, S. B. Classification as a query processing technique in the CANDIDE data model. In Proceedings of the 5th International Conference on Data Engineering (Los Angeles, Feb., 1989), pp. 572-581.
9. BERGAMASCHI, S., BONFATTI, F., CAVAZZA, L., SARTORI, C., AND TIBERIO, P. Relational database design for the intensional aspects of a knowledge base. Inf. Syst. 13, 3 (1988), 245-256.
10. BERGAMASCHI, S., CAVEDONI, L., SARTORI, C., AND TIBERIO, P. On taxonomical reasoning in E/R environment. In Proceedings of the 7th International Conference on the Entity Relationship Approach (Roma, Italy, Oct., 1988), Elsevier Science, North-Holland, Amsterdam, 1989, pp. 443-454.
11. BERGAMASCHI, S., AND NEBEL, B. The complexity of multiple inheritance in complex object data models. In Workshop on AI and Objects, IJCAI 91 (Sydney, Australia, Aug. 1991).
12. BERGAMASCHI, S., AND SARTORI, C. On taxonomic reasoning in conceptual design. Tech. Rep. 78, CIOC-CNR, Bologna, Italy, 1991.
13. BORGIDA, A., BRACHMAN, R. J., MCGUINNESS, D. L., AND RESNICK, L. A. CLASSIC: A structural data model for objects. In SIGMOD (Portland, Or., 1989), ACM, New York, 1989, pp. 58-67.
14. BOUZEGHOUB, M., GARDARIN, G., AND METAIS, E. Database design tools: An expert system approach. In Proceedings of the International Conference on Very Large Databases (Stockholm, Aug., 1985), pp. 82-95.
15. BRACHMAN, R. J., GILBERT, V. P., AND LEVESQUE, H. J. An essential hybrid reasoning system: Knowledge and symbol level accounts of KRYPTON. In IJCAI (Los Angeles, Aug., 1985), pp. 532-539.
ACM Transactions on Database Systems, Vol. 17, No. 3, September 1992
On Taxonomic Reasoning in Conceptual Design
16. BRACHMAN, R. J., AND LEVESQUE, H. J. The tractability of subsumption in frame-based description languages. In AAAI (Austin, Tex., 1984), pp. 34-37.
17. BRACHMAN, R. J., AND SCHMOLZE, J. G. An overview of the KL-ONE knowledge representation system. Cognitive Sci. 9, 2 (1985), 171-216.
18. BRODIE, M. L., AND MYLOPOULOS, J., EDS. On Knowledge Base Management Systems. Springer, New York, 1986.
19. BUNEMAN, P., AND FRANKEL, R. E. FQL: A functional query language. In SIGMOD (Boston, 1979), ACM, New York, 1979, pp. 52-58.
20. CARDELLI, L. A semantics of multiple inheritance. In Semantics of Data Types, Lecture Notes in Computer Science 173. Springer, New York, 1984, pp. 51-67.
21. CERI, S., ED. Methodology and Tools for Database Design. North-Holland, Amsterdam, 1983.
22. CHEN, P. The entity-relationship model: Towards a unified view of data. ACM Trans. Database Syst. 1, 1 (1976), 9-36.
23. CHEN, P. ER-Designer. Chen, Baton Rouge, La., 1987.
24. CHOOBINEH, J., MANNINO, M. V., NUNAMAKER, J. F., AND KONSYNSKI, B. R. An expert database design system based on forms. IEEE Trans. Softw. Eng. 14, 2 (1988), 242-253.
25. DAYAL, U. ET AL. PROBE: A research project in knowledge-oriented database systems: Preliminary analysis. Tech. Rep. 85-03, Computer Corporation of America, 1985.
26. DE TROYER, O. RIDL: A tool for the computer-assisted engineering of large databases in the presence of integrity constraints. SIGMOD Rec. 18, 2 (June 1989), 418-429.
27. DELCAMBRE, L. M. L., AND DAVIS, K. C. Automatic validation of object-oriented database structures. In Proceedings of the 5th International Conference on Data Engineering (Los Angeles, 1989), pp. 2-9.
28. DI BATTISTA, G., AND LENZERINI, M. A deductive method for Entity-Relationship modelling. In Proceedings of the 15th International Conference on Very Large Databases (Amsterdam, Aug. 1989), pp. 13-21.
29. DONINI, F. M., LENZERINI, M., NARDI, D., AND NUTT, W. The complexity of concept languages. In KR 91, Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning (Cambridge, Apr. 1991), J. Allen, R. Fikes, and E. Sandewall, Eds., Morgan Kaufmann, Palo Alto, Calif., 1991, pp. 151-162.
30. DONINI, F. M., LENZERINI, M., NARDI, D., AND NUTT, W. Tractable concept languages. In IJCAI 91 (Australia, Aug., 1991), pp. 458-463.
31. FERRARA, F. EASY-ER, an integrated [...] of database [...] of the [...] Entity-Relationship [...]
32. FININ, T., AND SILVERMAN, D. Interactive classification as a knowledge acquisition tool. In Expert Database Systems, L. Kerschberg, Ed. Benjamin/Cummings, Menlo Park, Calif., 1986, pp. 79-90.
33. HAMMER, M. M., AND MCLEOD, D. Database description with SDM: A semantic data model. ACM Trans. Database Syst. 6, 3 (1981), 351-386.
34. HULL, R., AND KING, R. Semantic database modelling: Survey, applications, and research issues. ACM Comput. Surv. 19, 3 (1987), 201-252.
35. HULL, R. B., AND YAP, C. K. The format model: A theory of database organization. J. ACM 31, 3 (1984), 518-537.
36. LECLUSE, C., AND RICHARD, P. [...] Symposium on Principles of [...] SIGACT-SIGMOD-SIGART, pp. 362-369.
37. LECLUSE, C., AND RICHARD, P. The O2 database programming language. In Proceedings of the 15th International Conference on Very Large Databases (Amsterdam, Feb. 1989), pp. 411-422.
38. [...] O2, an object-oriented data model. In SIGMOD (Chicago, June 1988), ACM, New York, pp. 424-433.
39. LUCK, K. VON, NEBEL, B., PELTASON, C., AND SCHMIEDEL, A. The BACK System. KIT 41, Tech. Univ. Berlin, 1987.
40. MYLOPOULOS, J., BERNSTEIN, P. A., AND WONG, H. K. T. A language facility for designing database-intensive applications. ACM Trans. Database Syst. 5, 2 (1980), 185-207.
S. Bergamaschi and C. Sartori
41. NEBEL, B. Computational complexity of terminological reasoning in BACK. Artif. Intell. 34, 3 (1988), 371-383.
42. NEBEL, B. Reasoning and Revision in Hybrid Representation Systems. Lecture Notes in Artificial Intelligence, Vol. 422. Springer, New York, 1990.
43. NEBEL, B. Terminological reasoning is inherently intractable. Research Note, Artif. Intell. 43, 2 (1990), 235-249.
44. PATEL-SCHNEIDER, P. F. Small can be beautiful in knowledge representation. In Proceedings of the Workshop on Principles of Knowledge-Based Systems, IEEE, New York, 1984, pp. 11-16.
45. PATEL-SCHNEIDER, P. F. A four-valued semantics for frame-based description languages. In Proceedings AAAI (Philadelphia, Pa., 1986), pp. 344-348.
46. REITER, R. Towards a logical reconstruction of relational database theory. In On Conceptual Modelling, M. L. Brodie, J. Mylopoulos, and J. W. Schmidt, Eds. Springer, New York, 1984, pp. 191-233.
47. SCHMIDT-SCHAUSS, M. Subsumption in KL-ONE is undecidable. In KR 89, 1st International Conference on Principles of Knowledge Representation and Reasoning, R. J. Brachman, H. J. Levesque, and R. Reiter, Eds. Morgan Kaufmann, Menlo Park, Calif. (Toronto, May 1989), pp. 421-431.
48. SCHMIDT-SCHAUSS, M., AND SMOLKA, G. Attributive concept descriptions with unions and complements. Artif. Intell. 48, 1 (1991), 1-26.
49. SCHMOLZE, J. G., AND ISRAEL, D. J. KL-ONE: Semantics and classification. In Research in Knowledge Representation and Natural Language. BBN Tech. Rep. N.5421. Bolt, Beranek and Newman, Cambridge, Mass., 1983.
50. SHIPMAN, D. W. The functional data model and the data language DAPLEX. ACM Trans. Database Syst. 6, 1 (1981), 140-173.
51. SIBLEY, E. H., AND KERSCHBERG, L. Data architecture and data model considerations. In Proceedings of the National Computer Conference (Dallas, Tex., 1977), AFIPS, pp. 85-96.
52. SMITH, J. M., AND SMITH, D. C. P. Database abstractions: Aggregation. Commun. ACM 20, 6 (1977), 405-413.
53. SMITH, J. M., AND SMITH, D. C. P. Database abstractions: Aggregation and generalization. ACM Trans. Database Syst. 2, 2 (1977), 105-133.
54. VILAIN, M. The restricted language architecture of a hybrid representation system. In Proceedings of the International Joint Conference on Artificial Intelligence (Los Angeles, Aug., 1985), pp. 547-551.

Received February 1991