
1. Ontology Learning

Alexander Maedche¹ and Steffen Staab²

¹ FZI Research Center for Information Technologies, University of Karlsruhe,
Germany, email: maedche@fzi.de
² Institute AIFB, University of Karlsruhe, Germany, email: sst@aifb.uni-karlsruhe.de

Ontology Learning greatly facilitates the construction of ontologies by
the ontology engineer. The vision of ontology learning that we propose here
includes a number of complementary disciplines that feed on different types of
unstructured and semi-structured data in order to support a semi-automatic,
cooperative ontology engineering process. Our ontology learning framework
proceeds through ontology import, extraction, pruning, and refinement giving
the ontology engineer a wealth of coordinated tools for ontology modelling.
Besides the general architecture, we show in this paper some exemplary
techniques in the ontology learning cycle that we have implemented in our
ontology learning environment, Text-To-Onto.

1.1 Introduction

An ontology is a formal conceptualization of a particular domain that is
shared by a group of people. The Semantic Web relies heavily on the formal
ontologies that structure underlying data for the purpose of comprehensive
and transportable machine understanding. In the context of the Semantic
Web, ontologies describe domain theories for the explicit representation of
the semantics of Web data. Therefore, the success of the Semantic Web depends
strongly on the proliferation of ontologies, which requires fast and easy
engineering of ontologies and avoidance of a knowledge acquisition bottleneck.
Though ontology engineering tools have matured over the last decade, the
manual building of ontologies still remains a tedious, cumbersome task which
can easily result in a knowledge acquisition bottleneck.
When using ontologies as a basis for Semantic Web applications, one has
to face exactly this issue and in particular questions about development time,
difficulty, confidence and the maintenance of ontologies. Thus, what one ends
up with is similar to what knowledge engineers have dealt with over the last
two decades when elaborating methodologies for knowledge acquisition or
workbenches for defining knowledge bases. A method which has proven to be
extremely beneficial for the knowledge acquisition task is the integration of
knowledge acquisition with machine learning techniques.
The notion of Ontology Learning [1.1] aims at the integration of a multitude
of disciplines, in particular machine learning, in order to facilitate the
construction of ontologies by the ontology engineer. Because the fully automatic
acquisition of knowledge by machines remains in the distant future, the over-
all process is considered to be semi-automatic with human intervention. It
relies on the “balanced cooperative modeling” paradigm, describing a coor-
dinated interaction between human modeler and learning algorithm for the
construction of ontologies for the Semantic Web. With this objective in mind,
an approach that combines ontology engineering with machine learning is
described here, feeding on the resources that we nowadays find on the Web.
Organization. This article is organized as follows. Section 1.2 introduces
the architecture for ontology learning and its relevant components. In Section
1.3 we introduce different ontology learning algorithms that serve as a
basis for ontology learning for the Semantic Web. Section 1.4 describes how
we have implemented our vision of ontology learning in the form of a concrete
system called Text-To-Onto.

1.2 An Architecture for Ontology Learning

The purpose of this section is to introduce the ontology learning architecture
and its relevant components. Four core components have been identified and
embedded into a coherent generic architecture for ontology learning:

– Ontology Management Component: As a basis for ontology learning, this
component provides comprehensive ontology management capabilities (e.g. means
for including existing ontologies, evolution, etc.) and a graphical user
interface to support the ontology engineering steps that are manually
performed by the ontology engineer.
– Resource Processing Component: This component contains a wide
range of techniques for discovering, importing, analyzing and transforming
relevant input data. An important sub-component is a natural language
processing system. The general task of the resource processing component
is to generate a set of pre-processed data as input for the algorithm library
component.
– Algorithm Library Component: This component acts as the algorithmic
backbone of the framework. A number of algorithms are provided for the
extraction and maintenance of the entities contained in the ontology model.
In order to combine the extraction results of different learning algorithms,
it is necessary to standardize their output; therefore a common result
structure is provided for all learning methods. If several extraction
algorithms obtain the same result, it is combined in a multi-strategy way
and presented to the user only once.
– Coordination Component: The ontology engineer uses this component
to interact with the ontology learning components for resource processing
and for the algorithm library. Comprehensive user interfaces are provided
to help the ontology engineer select relevant data, apply processing and
transformation techniques, or start a specific extraction mechanism. Data
processing can also be triggered by the selection of an ontology learning
algorithm that requires a specific representation. Results are merged using
the result set structure and presented to the ontology engineer with
different views of the ontology structures.

Fig. 1.1. Ontology Learning Conceptual Architecture

In the following we provide a detailed overview of the four components.

1.2.1 Ontology Management

As the core of our approach we build on an ontology management and
application infrastructure called KAON (KArlsruhe ONtology and Semantic Web
Infrastructure), allowing easy ontology management and application. KAON is
based on an ontology model [1.3], derived as an extension of RDF(S) with
some proprietary extensions.

Briefly, the ontology language is based on RDF(S), but with a clean separation of modeling
primitives from the ontology itself (thus avoiding the pitfalls of self-describing
RDFS primitives such as subClassOf), providing means for modelling meta-
classes and incorporating several commonly used modelling primitives, such
as transitive, symmetric and inverse properties, or cardinalities. All infor-
mation is organized in so-called OI-models (ontology-instance models), con-
taining both ontology entities (concepts and properties) as well as their in-
stances. This allows grouping concepts with their well-known instances into
self-contained units. E.g. a geographical information OI-model might contain
the concept Continent along with its seven well-known instances. An OI-
model may include another OI-model, thus making all definitions from the
included OI-model automatically available.
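
To make the OI-model idea concrete, the following is a minimal sketch in
Python; the class and method names are illustrative assumptions and do not
reflect the actual KAON API.

```python
# A minimal sketch of an OI-model: concepts with their instances,
# plus model inclusion. Names are illustrative, not the KAON API.
class OIModel:
    def __init__(self, name, includes=()):
        self.name = name
        self.includes = list(includes)   # included OI-models
        self.concepts = {}               # concept name -> set of instances

    def add_concept(self, concept, instances=()):
        self.concepts[concept] = set(instances)

    def all_concepts(self):
        """Definitions from included OI-models are automatically available."""
        result = dict(self.concepts)
        for model in self.includes:
            result.update(model.all_concepts())
        return result

geo = OIModel("geography")
geo.add_concept("Continent", ["Africa", "Antarctica", "Asia", "Australia",
                              "Europe", "North America", "South America"])
travel = OIModel("travel", includes=[geo])
assert "Continent" in travel.all_concepts()
```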

Fig. 1.2. OI-model example

An OI-model which is of particular interest for ontology learning is the
lexical OI-model. It consists of so-called lexical entries that reflect various
lexical properties of ontology entities, such as a label, synonym, stem or
textual documentation. There is an n:m relationship between lexical entries
and instances, established by the property references. Thus, the same lexical
entry may be associated with several elements (e.g. the lexical entry jaguar
may be associated with an instance representing a Jaguar car or a jaguar cat).
The value of the lexical entry is given by the property value, whereas the
language of the value is specified through the inLanguage property, which
refers to the set of all languages whose instances are defined by the ISO
standard 639.
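
A lexical entry with its value, inLanguage and references properties might be
modelled as in the following sketch; again, the names are only illustrative,
not the KAON vocabulary itself.

```python
# A sketch of the lexical OI-model: lexical entries with a value, a
# language, and an n:m "references" relation to ontology elements.
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    value: str                                     # e.g. "jaguar"
    in_language: str                               # ISO 639 code, e.g. "en"
    references: set = field(default_factory=set)   # referenced elements

jaguar = LexicalEntry(value="jaguar", in_language="en")
# the same lexical entry may reference several ontology elements
jaguar.references.add("instance:JaguarCar")
jaguar.references.add("instance:JaguarCat")
```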

1.2.2 Coordination Component

The ontology engineer uses the coordination component to select input
data, i.e. relevant resources such as HTML documents from the Web that are
exploited in the further discovery process. Secondly, using the coordination
component, the ontology engineer also chooses among a set of resource pro-
cessing methods available at the resource processing component and among
a set of algorithms available in the algorithm library.

1.2.3 Resource Processing

Resource processing strategies differ depending on the type of input data
made available:
– Semi-structured documents, like dictionaries, may be transformed into a
predefined relational structure as described in [1.1]. HTML documents may
be indexed and reduced to free text.
– For processing free natural language text, language-dependent natural
language processing systems may be used. E.g. for the German language we
have used SMES (Saarbrücken Message Extraction System), a shallow text
processor [1.9]. SMES comprises a tokenizer based on regular expressions,
a lexical analysis component including various word lexicons, a morphological
analysis module, a named entity recognizer, a part-of-speech tagger and a
chunk parser. For English, one may build on many available language
processing resources, e.g. the GATE system as described in [1.8]. In the
actual implementation that is described in Section 1.4 we just used the
well-known Porter stemming algorithm as a very simple means for normalizing
natural language (a minimal sketch follows below).
After a first preprocessing step according to one of these or similar
strategies, the resource processing module transforms the data into an
algorithm-specific relational representation (see [1.1] for a detailed
description).
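
As an illustration of the simple normalization step mentioned above, the
following sketch tokenizes text and applies the Porter stemmer as provided
by NLTK (assumed to be installed); the domain-specific stopword handling
discussed later is omitted here.

```python
# A minimal sketch of Porter-stemmer-based normalization using NLTK.
import re
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def normalize(text):
    """Lowercase, tokenize into alphabetic words, and stem each token."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return [stemmer.stem(t) for t in tokens]

print(normalize("Bruises, wounds and broken bones are common injuries."))
# stems such as 'bruis' and 'injuri' show how variants are conflated
```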

1.2.4 Algorithm Library

An ontology may be described by a number of sets of concepts, relations,
lexical entries, and links between these entities. An existing ontology
definition may be extended by various algorithms working on this definition
and the preprocessed input data. While specific algorithms may greatly vary
from one type of input to the next, there is also considerable overlap
concerning underlying learning approaches like association rules, pattern-based
extraction or clustering. Hence, we may reuse algorithms from the library for
acquiring different parts of the ontology definition.
Subsequently, we introduce some of these algorithms available in our im-
plementation. In general, we use a multi-strategy learning and result combi-
nation approach, i.e. each algorithm that is plugged into the library generates
normalized results that adhere to the ontology structures sketched above and
that may be combined into a coherent ontology definition. Several algorithms
are described in more detail in the following section.
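
To make the idea of a normalized result structure and multi-strategy
combination concrete, the following is a minimal sketch; the structure and
the combination policy (keeping the maximum confidence) are illustrative
assumptions, not the actual implementation.

```python
# A sketch of a common result structure shared by all learning methods,
# and of multi-strategy combination of identical proposals.
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class ExtractionResult:
    kind: str          # e.g. "concept", "taxonomic", "non-taxonomic"
    entities: tuple    # the ontology entities the result refers to
    confidence: float  # algorithm-specific confidence in [0, 1]
    source: str        # name of the producing algorithm

def combine(results):
    """Identical proposals from several algorithms are collapsed into
    one entry, so each result is presented to the user only once."""
    merged = defaultdict(list)
    for r in results:
        merged[(r.kind, r.entities)].append(r)
    combined = []
    for (kind, entities), group in merged.items():
        combined.append(ExtractionResult(
            kind=kind,
            entities=entities,
            confidence=max(r.confidence for r in group),
            source="+".join(sorted(r.source for r in group)),
        ))
    return combined

results = [
    ExtractionResult("concept", ("hotel",), 0.8, "tfidf"),
    ExtractionResult("concept", ("hotel",), 0.6, "patterns"),
]
print(combine(results))  # one merged proposal for "hotel"
```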

1.3 Ontology Learning Algorithms

1.3.1 Lexical Entry & Concept Extraction

A straightforward technique for extracting relevant lexical entries that may
indicate concepts is simply counting frequencies of lexical entries in a given
set of (linguistically processed) documents, the corpus D. In general this
approach is based on the assumption that a frequent lexical entry in a set of
domain-specific texts indicates the surface form of a concept. Research in information
retrieval has shown that there are more effective methods of term weighting
than simple counting of frequencies. A standard information retrieval ap-
proach is pursued for term weighting, based on the following quantities:

– The lexical entry frequency $lef_{l,d}$ is the frequency of occurrence of
lexical entry $l \in L$ in a document $d \in D$.
– The document frequency $df_l$ is the number of documents in the corpus
$D$ that $l$ occurs in.
– The corpus frequency $cf_l$ is the total number of occurrences of $l$ in
the overall corpus $D$.

The reader may note that in general $df_l \leq cf_l$ and
$\sum_{d \in D} lef_{l,d} = cf_l$. The extraction of lexical entries is based
on the information retrieval measure tfidf (term frequency inverted document
frequency). Tfidf combines the introduced quantities as follows:

Definition 1.3.1. Let $lef_{l,d}$ be the lexical entry frequency of the
lexical entry $l$ in a document $d$. Let $df_l$ be the overall document
frequency of lexical entry $l$. Then $tfidf_{l,d}$ of the lexical entry $l$
for the document $d$ is given by:

$$ tfidf_{l,d} = lef_{l,d} \cdot \log\left(\frac{|D|}{df_l}\right) \qquad (1.1) $$
Tfidf weighs the frequency of a lexical entry in a document with a factor
that discounts its importance when it appears in almost all documents.
Therefore terms that appear too rarely or too frequently are ranked lower
than terms that hold the balance. A final step has to be carried out on the
computed $tfidf_{l,d}$: a list of all lexical entries contained in the
documents of the corpus D is produced, excluding lexical entries that appear
in a standard list of stopwords.¹ The tfidf values for lexical entries $l$
are computed as follows:

Definition 1.3.2.

$$ tfidf_l := \sum_{d \in D} tfidf_{l,d}, \qquad tfidf_l \in \mathbb{R}. \qquad (1.2) $$

The user may define a threshold $k \in \mathbb{R}^+$ that $tfidf_l$ has to
exceed. The lexical entry approach has been evaluated in detail (e.g. for
varying $k$ and different selection strategies).
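
The tfidf-based extraction defined above can be summarized in a few lines of
Python; the sketch assumes documents that are already tokenized into lexical
entries, and the threshold k is supplied by the user.

```python
# A self-contained sketch of tfidf-based lexical entry extraction
# following Definitions 1.3.1 and 1.3.2.
import math
from collections import Counter

def extract_lexical_entries(docs, stopwords=frozenset(), k=1.0):
    n_docs = len(docs)
    df = Counter()                       # document frequency df_l
    for doc in docs:
        df.update(set(doc))
    tfidf_total = Counter()              # tfidf_l summed over all documents
    for doc in docs:
        lef = Counter(doc)               # lexical entry frequency lef_{l,d}
        for l, freq in lef.items():
            if l not in stopwords:
                tfidf_total[l] += freq * math.log(n_docs / df[l])
    # keep only entries whose tfidf_l exceeds the user-defined threshold k
    return {l: v for l, v in tfidf_total.items() if v > k}

docs = [["hotel", "room", "balcony"], ["hotel", "room"], ["hotel"]]
# "hotel" occurs in every document and is discounted to zero
print(extract_lexical_entries(docs, k=0.1))
```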

1.3.2 Taxonomic Relation Extraction

Taxonomic relation extraction can be done in many different ways. In our
framework we mainly include the following two kinds of approaches:
– Statistics-based extraction using hierarchical clustering
– Lexico-syntactic pattern extraction
In the following we briefly elaborate on these two approaches.
Hierarchical Clustering. Distributional data about words may be used to
build concepts and their embedding into a hierarchy “from scratch”. In this
case a certain clustering technique is applied to distributional representations
of concepts. Clustering can be defined as the process of organizing objects into
groups whose members are similar in some way (see [1.5]). In general there
are two major styles of clustering: non-hierarchical clustering in which
every object is assigned to exactly one group and hierarchical clustering,
in which each group of size greater than one is in turn composed of smaller
groups. Hierarchical clustering algorithms are preferable for detailed data
analysis. They produce hierarchies of clusters, and therefore contain more
information than non-hierarchical algorithms. However, they are less efficient
with respect to time and space than non-hierarchical clustering.² [1.7] identify
two main uses for clustering in natural language processing³: the first is the
use of clustering for exploratory data analysis, the second is generalization.
Seminal work in this area of so-called distributional clustering of English
words has been described in [1.6]; it focuses on constructing class-based
word co-occurrence models with substantial predictive power. This seminal
work on applying statistical hierarchical clustering in NLP is adopted here
and embedded into the framework.
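
As a minimal illustration of this approach, the following sketch clusters toy
distributional word vectors with SciPy's agglomerative clustering; the words
and co-occurrence counts are invented for illustration.

```python
# Hierarchical clustering over distributional word representations.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

words = ["hotel", "guesthouse", "balcony", "terrace"]
# rows: co-occurrence counts of each word with some context features
vectors = np.array([
    [10, 8, 1, 0],
    [9,  7, 0, 1],
    [1,  0, 9, 8],
    [0,  1, 8, 9],
], dtype=float)

# agglomerative clustering with average linkage over cosine distances
tree = linkage(vectors, method="average", metric="cosine")
labels = fcluster(tree, t=2, criterion="maxclust")
for word, label in zip(words, labels):
    print(word, label)   # accommodation words vs. furnishing words
```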
¹ Available for German at http://www.loria.fr/˜bonhomme/sw/stopword.de
² Hierarchical clustering has, on average, quadratic time and space complexity.
³ A comprehensive survey on applying clustering in NLP is also available in the
EAGLES report, see http://www.ilc.pi.cnr.it/EAGLES96/rep2/node37.htm

Lexico-Syntactic Patterns. The idea of using lexico-syntactic patterns in the
form of regular expressions for the extraction of semantic relations, in
particular taxonomic relations, has been introduced by [1.11]. Pattern-based
approaches in general are heuristic methods using regular expressions that
originally have been successfully applied in the area of information extraction.
In this lexico-syntactic ontology learning approach the text is scanned for
instances of distinguished lexico-syntactic patterns that indicate a relation of
interest, e.g. the taxonomic relation. Thus, the underlying idea is very sim-
ple: Define a regular expression that captures re-occurring expressions and
map the results of the matching expression to a semantic structure, such as
taxonomic relations between concepts.
Example. This example provides a sample pattern-based ontology extraction
scenario. For example, in [1.11] the following lexico-syntactic pattern is
considered:

. . . NP {, NP}* {,} or other NP . . .

When we apply this pattern to a sentence, it can be inferred that the NPs
referring to concepts on the left of or other are subconcepts of the NP
referring to a concept on the right. For example, from the sentence

Bruises, wounds, broken bones or other injuries are common.

we extract the taxonomic relations (Bruise, Injury), (Wound, Injury) and
(Broken-Bone, Injury).
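
A simple regular expression version of this pattern can be sketched as
follows; real systems match over chunked noun phrases rather than single
words, so this single-word approximation is only illustrative.

```python
# A minimal regex sketch of the "NP {, NP}* {,} or other NP" pattern.
import re

PATTERN = re.compile(r"((?:\w+, )*\w+),? or other (\w+)")

def hyponyms(sentence):
    """Return (subconcept, superconcept) pairs matched by the pattern."""
    m = PATTERN.search(sentence)
    if not m:
        return []
    hypernym = m.group(2)
    subs = [s.strip() for s in m.group(1).split(",")]
    return [(sub, hypernym) for sub in subs]

# multi-word NPs like "broken bones" would need a chunk parser
print(hyponyms("Bruises, wounds, fractures or other injuries are common."))
# -> [('Bruises', 'injuries'), ('wounds', 'injuries'), ('fractures', 'injuries')]
```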
Within our ontology learning system we have applied such techniques to
dictionary definitions in the context of the insurance and telecommunication
domains, as described in [1.12]. An important aspect of this system and
approach is that existing concepts are included in the overall process. In
contrast to [1.11], the extraction operations have been performed on the
concept level, i.e. patterns have been directly matched onto concepts. The
system is thus, besides extracting taxonomic relations from scratch, able to
refine existing relations and refer to existing concepts.

1.3.3 Non-Taxonomic Relation Extraction

Association rules have been established in the area of data mining for
finding interesting association relationships among large sets of data items.
Many industries have become interested in mining association rules from their
databases, e.g. for supporting business decisions such as customer
relationship management, cross-marketing and loss-leader analysis. A typical
example of association rule mining is market basket analysis. This process
analyzes customer buying habits by finding associations between the different
items that customers place in their shopping baskets. The information
discovered by association rules may help to develop marketing strategies, e.g.
layout optimization in supermarkets (placing milk and bread within close
proximity may further encourage the sale of these items together within
single visits to the store). In [1.10] concrete examples for extracted
associations between items are given. The examples are based on supermarket
products that are included in a set of transactions collected from customers'
purchases. One of the classical association rules that has been extracted
from such databases is that “diapers are purchased together with beer”.
For the purpose of illustration, an example is provided to the reader. The
example is based on actual ontology learning experiments as described in
[1.2]. A text corpus given by a WWW provider for tourist information has
been processed. The corpus describes actual objects referring to locations,
accommodations, furnishings of accommodations, administrative information, or
cultural events, as given in the following example sentences.
(1) a. “Mecklenburg’s” schönstes “Hotel” liegt in Rostock. (“Mecklenburg’s”
most beautiful “hotel” is located in Rostock.)
b. Ein besonderer Service für unsere Gäste ist der “Frisörsalon” in un-
serem “Hotel”. (A “hairdresser” in our “hotel” is a special service for
our guests.)
c. Das Hotel Mercure hat “Balkone” mit direktem “Strandzugang”. (The
hotel Mercure offers “balconies” with direct “access” to the beach.)
d. Alle “Zimmer” sind mit “TV”, Telefon, Modem und Minibar ausges-
tattet. (All “rooms” have “TV”, telephone, modem and minibar.)
Processing the example sentences (1a) and (1b), the dependency relations
between the lexical entries are extracted (among others). In sentences (1c)
and (1d), the heuristic for prepositional phrase attachment and the sentence
heuristic relate pairs of lexical entries, respectively. Thus, four concept
pairs – among many others – are derived with knowledge from the lexicon.

Table 1.1. Examples for Linguistically Related Pairs of Concepts

  L1               a_{i,1}        L2        a_{i,2}
  "Mecklenburgs"   area           hotel     hotel
  "hairdresser"    hairdresser    hotel     hotel
  "balconies"      balcony        access    access
  "room"           room           TV        television

The algorithm for learning generalized association rules uses the concept
hierarchy, an excerpt of which is depicted in Figure 1.3, and the concept
pairs from above (among many other concept pairs). In our actual experiments,
it discovered a large number of interesting and important non-taxonomic
conceptual relations. A few of them are listed in Table 1.2. Note that in
this table we also list two conceptual pairs, viz. (area, hotel) and
(room, television), that are not presented to the user, but that are pruned.
The reason is that there are ancestral association rules, viz.
(area, accomodation) and (room, furnishing), respectively, with higher
confidence and support measures.

Fig. 1.3. An Example Concept Taxonomy as Background Knowledge for
Non-Taxonomic Relation Extraction

Table 1.2. Examples of Discovered Non-Taxonomic Relations

  Discovered relation          Confidence    Support
  (area, accomodation)         0.38          0.04
  (area, hotel)                0.10          0.03
  (room, furnishing)           0.39          0.03
  (room, television)           0.29          0.02
  (accomodation, address)      0.34          0.05
  (restaurant, accomodation)   0.33          0.02
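
The following sketch illustrates how support and confidence could be computed
for concept pairs generalized along a small taxonomy; the parent map, the
transactions and the resulting figures are invented for illustration and
unrelated to Table 1.2.

```python
# Support/confidence for generalized concept-pair association rules.
from collections import Counter

# toy taxonomy: concept -> parent concept
parent = {"hotel": "accomodation", "television": "furnishing"}

def with_ancestors(pair):
    """Yield the pair itself and its generalizations up the taxonomy."""
    a, b = pair
    yield (a, b)
    while b in parent:
        b = parent[b]
        yield (a, b)

# each "transaction" is the set of concept pairs found in one text unit
transactions = [
    {("area", "hotel"), ("room", "television")},
    {("area", "hotel")},
    {("room", "television")},
]

n = len(transactions)
pair_count, left_count = Counter(), Counter()
for t in transactions:
    pairs = {g for p in t for g in with_ancestors(p)}
    pair_count.update(pairs)
    left_count.update({a for a, _ in pairs})

for (a, b), c in sorted(pair_count.items()):
    support = c / n                   # fraction of transactions with the pair
    confidence = c / left_count[a]    # fraction given the domain concept
    print(f"({a}, {b})  support={support:.2f}  confidence={confidence:.2f}")
```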

1.3.4 Ontology Pruning

Pruning is needed if one adapts a (generic) ontology to a given domain.
Again, we make the assumption that the occurrence of specific concepts and
conceptual relations in web documents is vital for the decision whether
or not a given concept or relation should remain in the ontology. We take
a frequency-based approach, determining concept frequencies in a corpus.
Entities that are frequent in a given corpus are considered constituents of
the given domain. But - in contrast to ontology extraction - the mere frequency
of ontological entities is not sufficient.
To determine domain relevance, the frequencies of ontological entities in a
domain corpus are compared to frequencies obtained from a generic corpus.
The user can select among several relevance measures for frequency
computation. The ontology pruning algorithm uses the computed frequencies to
determine the relative relevance of each concept contained in the ontology.
All existing concepts and relations which are more frequent in the
domain-specific corpus than in the generic corpus remain in the ontology.
The user can also control the pruning of concepts which are contained
neither in the domain-specific nor in the generic corpus. We refer the
interested reader to [1.1], where the pruning algorithms are described in
further detail.
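
A minimal sketch of this relative-frequency pruning idea follows; the simple
ratio of relative frequencies used as the relevance measure is one of several
possible choices, not the specific measure of the implemented system.

```python
# Frequency-based pruning: keep concepts that are relatively more
# frequent in the domain corpus than in a generic corpus.
from collections import Counter

def prune(concepts, domain_tokens, generic_tokens):
    dom, gen = Counter(domain_tokens), Counter(generic_tokens)
    n_dom, n_gen = sum(dom.values()), sum(gen.values())
    kept = []
    for c in concepts:
        dom_rel = dom[c] / n_dom
        gen_rel = gen[c] / n_gen if gen[c] else 0.0
        if dom_rel > gen_rel:     # more typical of the domain: keep it
            kept.append(c)
    return kept

domain = ["hotel", "room", "hotel", "balcony", "time"]
generic = ["time", "time", "people", "room"]
print(prune(["hotel", "room", "time"], domain, generic))  # -> ['hotel']
```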

1.4 Implementation

This section describes the implemented ontology learning system Text-To-Onto
that is embedded in KAON, the Karlsruhe Ontology and Semantic Web
infrastructure. KAON⁴ is an open-source ontology management and application
infrastructure targeted at semantics-driven business applications. It
includes a comprehensive tool suite allowing easy ontology management and
application.

Fig. 1.4. OI-Modeler

Within KAON we have developed OI-modeler, an end-user application that
realizes a graph-based user interface for OI-model creation and evolution
[1.4]. Figure 1.4 shows the graph-based view onto an OI-model.
A specific aspect of OI-modeler is that it supports ontology evolution at
the user level (see right side of the screenshot). The figure shows a
modelling session where the user attempted to remove the concept Person.
Before applying this change to the ontology, the system computed the set of
additional changes that must be applied. The tree of dependent changes is
presented to the user, thus allowing the user to comprehend the effects of
the change before it is actually applied. Only when the user agrees are the
changes applied to the ontology.

⁴ http://kaon.semanticweb.org

Text-To-Onto⁵ has been developed on top of KAON, and specifically on top of
OI-modeler. Text-To-Onto provides means for defining a corpus as a basis
for ontology learning (see right lower part of Figure 1.5). On top of the
corpus various algorithms may be applied. In the example depicted in Figure
1.5 the user has selected different parameters and executed a pattern-based
extraction and an association rule extraction algorithm in parallel.
Finally, the results are presented to the user (see Figure 1.5), and
graphical means for adding lexical entries, concepts and conceptual
relations are provided.

Fig. 1.5. Text-To-Onto

⁵ Text-To-Onto is open-source and available for download at the KAON Web page.

1.5 Conclusion

Until recently ontology learning per se, i.e. for comprehensive construction of
ontologies, has not existed. However, much work in a number of disciplines —
computational linguistics, information retrieval, machine learning, databases,
software engineering — has actually researched and practiced techniques for
solving part of the overall problem.
We have introduced ontology learning as an approach that may greatly
facilitate the construction of ontologies by the ontology engineer. The notion
of Ontology Learning introduced in this article aims at the integration of a
multitude of disciplines in order to facilitate the construction of ontologies.
The overall process is considered to be semi-automatic with human interven-
tion. It relies on the “balanced cooperative modeling” paradigm, describing a
coordinated interaction between human modeler and learning algorithm for
the construction of ontologies for the Semantic Web.

References

1.1 A. Maedche: Ontology Learning for the Semantic Web. Kluwer Academic
Publishers, 2002.
1.2 A. Maedche and S. Staab: Mining Ontologies from Text. In Proceedings of
EKAW-2000, Springer Lecture Notes in Artificial Intelligence (LNAI-1937),
Juan-les-Pins, France, 2000.
1.3 B. Motik, A. Maedche and R. Volz: A Conceptual Modeling Approach for
Building Semantics-Driven Enterprise Applications. In Proceedings of the 1st
International Conference on Ontologies, Databases and Application of
Semantics (ODBASE-2002), California, USA, 2002.
1.4 A. Maedche, B. Motik, L. Stojanovic, R. Studer and R. Volz: Ontologies
for Enterprise Knowledge Management. IEEE Intelligent Systems, 2002.
1.5 L. Kaufman and P. Rousseeuw: Finding Groups in Data: An Introduction to
Cluster Analysis. John Wiley, 1990.
1.6 F. Pereira, N. Tishby and L. Lee: Distributional Clustering of English
Words. In Proceedings of ACL-93, 1993.
1.7 C. Manning and H. Schuetze: Foundations of Statistical Natural Language
Processing. MIT Press, Cambridge, Massachusetts, 1999.
1.8 H. Cunningham, R. Gaizauskas, K. Humphreys and Y. Wilks: Three Years of
GATE. In Proceedings of the AISB'99 Workshop on Reference Architectures and
Data Standards for NLP, Edinburgh, U.K., April 1999.
1.9 G. Neumann, R. Backofen, J. Baur, M. Becker and C. Braun: An Information
Extraction Core System for Real World German Text Processing. In Proceedings
of ANLP-97, Washington, USA, 1997.
1.10 R. Agrawal, T. Imielinski and A. Swami: Mining Association Rules between
Sets of Items in Large Databases. In Proceedings of the 1993 ACM SIGMOD
International Conference on Management of Data, Washington, D.C., May 26-28,
1993.
1.11 M. Hearst: Automatic Acquisition of Hyponyms from Large Text Corpora.
In Proceedings of the 14th International Conference on Computational
Linguistics, Nantes, France, 1992.
1.12 J.-U. Kietz, R. Volz and A. Maedche: Semi-Automatic Ontology Acquisition
from a Corporate Intranet. In International Conference on Grammar Inference
(ICGI-2000), Lecture Notes in Artificial Intelligence (LNAI), 2000.
