
Usage of Word Sense Disambiguation in Concept Identification in Ontology Construction
Guest Talk at University of Moratuwa, Department of Computer Science and Engineering
5th November, 2016
Discussed by: Kiruparan Balachandran

Background Information - Ontology

Ontology provides a potential method to describe domain knowledge.

[Example ontology fragment: an algorithm solves a problem; an algorithm has a complexity; a sorting algorithm is-a algorithm]

Background Information - Ontology learning layer-cake approach

Rules: isA(sorting algorithm, algorithm) -> solve(sorting algorithm, problem)

Relations: solve(algorithm, problem) - known as a non-taxonomic relationship

Concept Hierarchy: isA(sorting algorithm, algorithm) - known as a taxonomic relationship

Concepts: Algorithm (I, E, L)

Synonyms: {Randomized algorithm, sorting algorithm}, {system software, application software}

Terms: {Randomized algorithm, sorting algorithm, system software, application software}

Implemented approach follows Buitelaar et al. criteria in forming concepts from terms

An intensional definition of the concept
Formal definition: a term can be considered a concept if the term is linked with a valid relation to another term.
Informal definition: a term should have a textual description.

A set of concept instances, i.e. its extension: a term can be considered a concept if it has instances.

A set of linguistic realizations.

Need of WSD in forming concepts from terms

1. Iterate over each sentence from the corpus.
   For example: "we propose a hardware design, call the virtual line scheme, that allows the utilization of large virtual cache line when fetch datum from memory for better exploitation of spatial locality"
2. Identify the subject phrase and the object phrase in each sentence.
3. Check whether the full subject phrase (ts) and object phrase (to), or parts of them, exist in the list of domain-specific terms.
4. Feed ts and to (separately, each referred to as t) together with the sentence to the disambiguation step.
5. Retrieve the list of senses that exist in WordNet for t.
   For example, for "cache": cache#n#1, cache#n#2, and cache#n#3 (see the sketch after this list).
6. Identify the sense tsense related to the domain from the list of senses (disambiguating the sense).
7. If tsense exists for both ts and to, the disambiguated ts and to are candidates for domain-specific concepts.

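A minimal sketch of the sense-listing step (step 5), assuming NLTK's WordNet interface; the helper name list_senses is ours, not part of the implemented system:

```python
# Minimal sketch of step 5: list the WordNet senses available for a candidate
# term. Assumes NLTK with the WordNet corpus installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def list_senses(term, pos=wn.NOUN):
    """Return (synset name, gloss) pairs for every WordNet sense of `term`."""
    return [(s.name(), s.definition()) for s in wn.synsets(term, pos=pos)]

for name, gloss in list_senses("cache"):
    print(name, "-", gloss)
# cache.n.01, cache.n.02, cache.n.03 correspond to cache#n#1 .. cache#n#3 above
```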

Which algorithm is best suited?

LESK

Original LESK
  Uses the definition (gloss) of a word sense as the only source of contextual information for that sense
  Suffers from a combinatorial explosion when all ambiguous words are disambiguated together
  Simulated annealing has been used to cope with this

Simplified LESK
  Addresses the combinatorial explosion
  Runs a separate disambiguation process for each ambiguous word in the input text

Adapted LESK
  Enlarged context: considers hypernyms, hyponyms, holonyms, meronyms, troponyms, attribute relations, and their associated definitions

Less accuracy
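A hedged sketch of the simplified LESK idea described above, using a plain bag-of-words overlap between the sentence context and each gloss; the function simplified_lesk and its scoring are illustrative only, not the talk's exact implementation:

```python
# Simplified LESK sketch: disambiguate one word at a time by counting the
# overlap between the sentence context and each sense's WordNet gloss.
# Assumes NLTK with the 'wordnet' and 'stopwords' corpora downloaded.
from nltk.corpus import wordnet as wn, stopwords

STOP = set(stopwords.words("english"))

def simplified_lesk(word, sentence, pos=wn.NOUN):
    """Return the synset of `word` whose gloss overlaps most with `sentence`."""
    context = {t.lower().strip(",.") for t in sentence.split()} - STOP
    best, best_score = None, -1
    for sense in wn.synsets(word, pos=pos):
        gloss = {t.lower().strip(",.") for t in sense.definition().split()} - STOP
        score = len(context & gloss)      # bag-of-words overlap
        if score > best_score:
            best, best_score = sense, score
    return best

sentence = ("we propose a hardware design, call the virtual line scheme, that allows "
            "the utilization of large virtual cache line when fetch datum from memory "
            "for better exploitation of spatial locality")
print(simplified_lesk("cache", sentence))   # expected: the buffer-storage sense
```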

Which algorithm is best suited?

Other well-known algorithms with good performance use:

Path
  Depth of the least common subsumer (LCS), referred to as WUP
  Path length and path direction, referred to as HSO

Link strength of a parent-child link using corpus statistical information

[Taxonomy figure: concepts C1 and C2 with their least common subsumer C3; N1 and N2 are the path lengths from C1 and C2 to C3, and N3 is the depth of C3 from the root]

ConSim(C1, C2) = 2 * N3 / (N1 + N2 + 2 * N3)
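As a hedged illustration, NLTK exposes Wu-Palmer similarity directly; the two synsets below are chosen only as an example, and memory.n.04 is assumed to be the computer-memory sense:

```python
# Sketch: Wu-Palmer (WUP) similarity between two WordNet synsets with NLTK.
# wup_similarity computes 2 * depth(LCS) / (depth(C1) + depth(C2)), which is
# equivalent to ConSim above since depth(Ci) = Ni + N3.
from nltk.corpus import wordnet as wn

cache_hw = wn.synset("cache.n.03")    # cache#n#3: the buffer-storage sense
memory = wn.synset("memory.n.04")     # assumed index for the computer-memory sense
print(cache_hw.wup_similarity(memory))
```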

Which algorithm is best suited?

Other well-known algorithms with good performance use:

Path
  Depth of the least common subsumer (LCS), referred to as WUP
  Path length and path direction, referred to as HSO

Link strength of a parent-child link using corpus statistical information

HSO: Weight = C - path length - k * number of changes of direction
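A small worked illustration of this weight, using the commonly cited Hirst and St-Onge parameter values C = 8 and k = 1 (these constants are our assumption, not stated in the slide):

```python
# Worked example of the HSO-style weight; C = 8 and k = 1 are the commonly
# cited constants and should be treated as assumptions here.
def hso_weight(path_length, direction_changes, C=8, k=1):
    return C - path_length - k * direction_changes

print(hso_weight(path_length=3, direction_changes=1))   # 8 - 3 - 1 = 4
```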

Which algorithm is best suited?

Link strength of a parent-child link using corpus statistical information
  Information content + distance
  Information content: obtained by estimating the probability of occurrence of a class in a large text corpus
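As a hedged sketch of an information-content measure, NLTK ships pre-computed IC counts (e.g. from the Brown corpus); the synsets are only an example, and memory.n.04 is an assumed sense index:

```python
# Sketch: information-content-based similarity using NLTK's pre-computed
# IC counts from the Brown corpus (requires nltk.download('wordnet_ic')).
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")
cache_hw = wn.synset("cache.n.03")
memory = wn.synset("memory.n.04")        # assumed computer-memory sense

print(cache_hw.res_similarity(memory, brown_ic))  # IC of the least common subsumer
print(cache_hw.jcn_similarity(memory, brown_ic))  # combines IC with distance
```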

Disambiguating Concepts (LESK?)

cache#n#1, cache#n#2, and cache#n#3

For each sense:

Extract the informal definition (gloss) of the sense from WordNet
  For example:
  WNs1: a hidden storage space for money or provisions or weapons
  WNs2: a secret store of valuables or money
  WNs3: RAM memory that is set aside as a specialized buffer storage, which is continually updated; used to optimize data transfers between system elements with different characteristics

Calculate the similarity between ts and WNsn by computing a similarity matrix between ts and WNsn using a LESK algorithm; the value is normalized by the number of entries in the distance matrix.

Return the synset with the highest similarity value.
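A minimal sketch of this scoring step under stated assumptions: the cell-level similarity here is a plain token match, and the helpers gloss_score and disambiguate are our own names, not the talk's exact LESK-based matrix:

```python
# Sketch: score each WordNet gloss against the sentence with a normalised
# token-by-token similarity matrix, then return the best-scoring sense.
# Exact token match stands in for the LESK cell similarity used in the talk.
from nltk.corpus import wordnet as wn

def gloss_score(sentence_tokens, gloss_tokens):
    """Mean over a |sentence| x |gloss| matrix of token similarities."""
    matrix = [[1.0 if s == g else 0.0 for g in gloss_tokens]
              for s in sentence_tokens]
    cells = [v for row in matrix for v in row]
    return sum(cells) / len(cells) if cells else 0.0

def disambiguate(term, sentence, pos=wn.NOUN):
    tokens = [t.lower() for t in sentence.split()]
    scored = [(gloss_score(tokens, s.definition().lower().split()), s)
              for s in wn.synsets(term, pos=pos)]
    return max(scored, key=lambda pair: pair[0], default=(0.0, None))[1]

print(disambiguate("cache", "large virtual cache line fetched from memory"))
```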




Evaluation of domain-specific concept extraction

Identified 253 computer science domain-specific concepts, validated by three domain experts
Measured the inter-annotator agreement using Fleiss' kappa: 0.36712, a fair agreement (3 annotators, 253 concepts, 2 categories)

Precision for concepts (Computer Science)
  Annotator 1: 75%
  Annotator 2: 56%
  Annotator 3: 78%

Identified 47 domain-specific concepts for the GENIA corpus
Compared with two different approaches, discussed by Zhou et al. and Subramaniam et al.

Recall (Biomedical)
  Our approach: 58.70%
  MaxMatcher (Zhou et al.): 57.73%
  BioAnnotator (Subramaniam et al.): 20.27%
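As a minimal sketch of how such an agreement figure can be computed, assuming the three annotators' binary labels are available as a 253 x 3 array (statsmodels provides fleiss_kappa; the array below is dummy data, not the actual annotations):

```python
# Sketch: Fleiss' kappa for 3 annotators x 253 concepts x 2 categories.
# The ratings matrix is random dummy data, not the real annotations.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(253, 3))   # rows = concepts, cols = annotators

table, _ = aggregate_raters(ratings)          # per-concept counts for each category
print(fleiss_kappa(table, method="fleiss"))   # 0.21-0.40 is conventionally "fair"
```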

Why LESK?

Conclusion
Choosing the best WSD algorithm depends on:
  The nature of your problem
  The available factors (e.g. glosses, taxonomy relations, corpus statistics)
  Performance with respect to accuracy and time

References
K. Balachandran and S. Ranathunga, "Domain-Specific Term Extraction for Concept Identification in Ontology Construction", in IEEE/WIC/ACM International Conference on
Web Intelligence, Omaha, Nebraska, USA, 2016, pp. 34-41.
P. Buitelaar, P. Cimiano, and B. Magnini, Ontology Learning from Text: Methods, Evaluation and Applications, vol. 123, IOS Press, 2005.
X. Zhou, X. Zhang, and X. Hu, "MaxMatcher: Biological concept extraction using approximate dictionary lookup," in PRICAI 2006: Trends in Artificial Intelligence, ed: Springer,
2006, pp. 1145-1149.
L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, et al., "Information extraction from biomedical literature: methodology, evaluation and
an application," in Proceedings of the twelfth international conference on Information and knowledge management, 2003, pp. 410-417.
G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms," WordNet: An electronic lexical database, vol. 305,
pp. 305-332, 1998.
S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet," in Computational linguistics and intelligent text processing, ed:
Springer, 2002, pp. 136-145.
Z. Wu and M. Palmer, "Verb semantics and lexical selection," in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994, pp. 133-138.
M. Lesk, "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone," in Proceedings of the 5th annual
international conference on Systems documentation, 1986, pp. 24-26.
C. Leacock and M. Chodorow, "Combining Local Context and WordNet Similarity for Word Sense Disambiguation," in WordNet: An Electronic Lexical Database, vol. 49, pp. 265-283, MIT Press, 1998.
J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Int. Conf. Research in Computational Linguistics, 1998, pp. 19-33.


Questions?
Thank You

