Coreference analysis in clinical notes: A multi-pass sieve with alternate anaphora resolution modules
Siddhartha Jonnalagadda, PhD
NLP program, Mayo Clinic, Rochester
i2b2/VA/Cincinnati NLP shared task
October 21st, 2011
Biomedical Informatics
Siddhartha Jonnalagadda, PhD (Track lead)
Dingcheng Li, MS
Sunghwan Sohn, PhD
Stephen Wu, PhD
Kavishwar Wagholikar, MBBS, PhD
Manabu Torii, PhD (Georgetown)
Hongfang Liu, PhD (Principal Investigator)
2011 Mayo Clinic 2
Introduction
group mentions to form an entity
understanding the links of related concepts critical to
heuristics-based approach based on linguistic theories
supervised machine learning approach
classification based on ranking of all markables
unsupervised machine learning approach
sieve for coreference resolution. In: EMNLP-2010. Boston, USA: Association for Computational Linguistics; 2010. p. 492-501.
Lee H, Peirsman Y, Chang A, Chambers N, Surdeanu M, Jurafsky D. Stanford's Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task. In: CoNLL-2011 Shared Task, 2011. Portland, Oregon: Association for Computational Linguistics; 2011. p. 7379.
[Figure: multi-pass sieve pipeline. Filters: Section Filter, Vicinity Filter. Sieves shown: 4. Head match and word inclusion; 5. Head match and compatible modifiers; 6. Relaxed head match and word inclusion; 7. (Stemmed head match and stemmed bag of words match) OR related words match; 8a. FHMM-based pronoun sieve; 8b. Rule-based pronoun sieve. Configurations: Set-up A, Set-up B, Set-up C.]
appearance
For each sieve, the coreferential relationship is tested for each pair of mentions, starting from the last appearing (probable) mention
For each mention, a probable antecedent is searched starting from the closest mention
Assumption: given two exactly similar antecedents, the closer antecedent is preferred for the coreferential relationship
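The traversal order described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Mention` class, the two toy sieves (`exact_match`, `head_match`), and the `resolve` function are hypothetical names introduced here, and the merge step is a simple relabeling of cluster ids.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Mention:
    index: int  # position in order of appearance
    text: str
    head: str   # head word of the mention (assumed to be given)


# A sieve decides whether an antecedent and a later mention corefer.
Sieve = Callable[[Mention, Mention], bool]


def exact_match(antecedent: Mention, mention: Mention) -> bool:
    # Toy high-precision sieve: identical surface strings.
    return antecedent.text.lower() == mention.text.lower()


def head_match(antecedent: Mention, mention: Mention) -> bool:
    # Toy lower-precision sieve: identical head words.
    return antecedent.head.lower() == mention.head.lower()


def resolve(mentions: List[Mention], sieves: List[Sieve]) -> Dict[int, int]:
    """Return a map from mention index to entity (cluster) id."""
    ordered = sorted(mentions, key=lambda m: m.index)
    # Every mention starts as its own entity.
    cluster = {m.index: m.index for m in ordered}
    for sieve in sieves:  # one pass per sieve, in the given order
        # Start from the last appearing mention and move backwards.
        for pos in range(len(ordered) - 1, 0, -1):
            mention = ordered[pos]
            # Search antecedent candidates from the closest one outwards.
            for candidate in reversed(ordered[:pos]):
                if sieve(candidate, mention):
                    old, new = cluster[mention.index], cluster[candidate.index]
                    # Merge the two entities by relabeling the old cluster.
                    for key, value in cluster.items():
                        if value == old:
                            cluster[key] = new
                    break  # the closest matching antecedent wins
    return cluster


mentions = [
    Mention(0, "the tumor", "tumor"),
    Mention(1, "chest pain", "pain"),
    Mention(2, "the tumor", "tumor"),
    Mention(3, "a large tumor", "tumor"),
]
clusters = resolve(mentions, [exact_match, head_match])
```

Because each pass merges whole clusters, links found by an earlier sieve carry into later passes, which is the entity-centered behavior the multi-pass design relies on.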
Section Filter
In general English, if two mentions have the same name, the mentions corefer more than 95% of the time
However, in clinical narratives, this might not be the case:
A problem or treatment of different patients in the family medical history section
A non-chronic problem or a test in the history of present illness
A treatment in current medications unrelated to another one in the discharge medications section
identifying co-referred pairs
Example: two mentions associated with the same term appearing in
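A section-aware check of this kind might look like the sketch below. The slide does not give the actual rules, so everything here is an assumption made for illustration: the `Mention` fields, the section names in `ISOLATED_SECTIONS` and `SEPARATE_SECTION_PAIRS`, and the function `passes_section_filter`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    section: str       # name of the note section containing the mention
    mention_type: str  # e.g. "problem", "treatment", "test"


# Assumed rule: mentions in these sections may describe a different person,
# so they are not linked to mentions in any other section.
ISOLATED_SECTIONS = {"family medical history"}

# Assumed rule: treatments listed in these two sections are kept apart.
SEPARATE_SECTION_PAIRS = {
    frozenset({"current medications", "discharge medications"}),
}


def passes_section_filter(a: Mention, b: Mention) -> bool:
    """Return False when section context argues against linking a and b."""
    sec_a, sec_b = a.section.lower(), b.section.lower()
    if sec_a != sec_b and (sec_a in ISOLATED_SECTIONS or sec_b in ISOLATED_SECTIONS):
        return False
    both_treatments = a.mention_type == b.mention_type == "treatment"
    if both_treatments and frozenset({sec_a, sec_b}) in SEPARATE_SECTION_PAIRS:
        return False
    return True


current = Mention("aspirin", "Current Medications", "treatment")
discharge = Mention("aspirin", "Discharge Medications", "treatment")
family = Mention("diabetes", "Family Medical History", "problem")
present = Mention("diabetes", "History of Present Illness", "problem")
```

In a pipeline such a filter would run before the string-matching sieves, so that two identical strings are only compared when their sections make coreference plausible.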
Vicinity Filter
Pathology was negative for tumor and showed peritubal and periovarian adhesions .
Sieves
a relative pronoun that is governed by the antecedent
as detected by rules based on part of speech tags
abbreviation list using UMLS
Sieves (contd.)
synonyms and other relationships extracted from the UMLS
UMLS MRREL table
synonym (for sieve 2), parent-child and narrow-broad (sieve 7)
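A lookup over MRREL-style relations could be sketched as below. It assumes that each mention has already been mapped to a concept identifier and that relation rows have been reduced to (CUI1, REL, CUI2) triples. The identifiers `X1` to `X4` are placeholders rather than real UMLS concepts, and the helper names are hypothetical. The REL codes used are the standard UMLS abbreviations: SY (synonym), PAR and CHD (parent and child), RB and RN (broader and narrower).

```python
from typing import FrozenSet, Iterable, Set, Tuple

Row = Tuple[str, str, str]  # (CUI1, REL, CUI2), reduced from an MRREL row

SYNONYM_RELS = {"SY"}                       # used for the synonym sieve
HIERARCHY_RELS = {"PAR", "CHD", "RB", "RN"}  # used for the related-words sieve


def build_pair_index(rows: Iterable[Row], allowed: Set[str]) -> Set[FrozenSet[str]]:
    """Collect unordered concept pairs linked by one of the allowed relations."""
    pairs: Set[FrozenSet[str]] = set()
    for cui1, rel, cui2 in rows:
        if rel in allowed:
            # Store the pair unordered so lookups ignore relation direction.
            pairs.add(frozenset((cui1, cui2)))
    return pairs


def related(cui_a: str, cui_b: str, index: Set[FrozenSet[str]]) -> bool:
    """True if both mentions map to the same concept or to a linked pair."""
    return cui_a == cui_b or frozenset((cui_a, cui_b)) in index


# Placeholder rows; real rows would come from the UMLS MRREL table.
rows = [
    ("X1", "SY", "X2"),
    ("X1", "PAR", "X3"),
    ("X3", "RN", "X4"),
]
synonym_index = build_pair_index(rows, SYNONYM_RELS)
hierarchy_index = build_pair_index(rows, HIERARCHY_RELS)
```

Keeping the synonym and hierarchy indexes separate lets the stricter relation feed an earlier, higher-precision sieve while the looser relations are reserved for a later one.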
Training Results
Accuracy of the machine-learning-based pronoun sieve
System   Beth   Partners   Discharge   Progress
FHMM     66%    63.5%      60%         61.5%
.836
0.90|.936|.918
.739|.798|.767
.937|.808|.862
.802|.843|.822
Test Results
Discussion
Take home
entity-centered approach better than a mention-centered approach
Watch out for an expert-based open-source system under the Apache License
Any questions/comments: please email or see us at AMIA
Thanks!
Siddhartha Jonnalagadda, PhD (Track lead)
Dingcheng Li, MS
Sunghwan Sohn, PhD
Stephen Wu, PhD
Kavishwar Wagholikar, MBBS, PhD
Manabu Torii, PhD (Georgetown)
Hongfang Liu, PhD (Principal Investigator)