Welcome to the IKCEST

Journal of the American Medical Informatics Association : JAMIA | Vol.20, Issue.5 | | Pages 922-30

Journal of the American Medical Informatics Association : JAMIA

Towards comprehensive syntactic and semantic annotations of the clinical narrative.

Daniel, Albright Arrick, Lanfranchi Anwen, Fredriksen William F, Styler Colin, Warner Jena D, Hwang Jinho D, Choi Dmitriy, Dligach Rodney D, Nielsen James, Martin Wayne, Ward Martha, Palmer Guergana K, Savova  
Abstract

To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components.Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed.The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891-0.931), NE (0.697-0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations.This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.

Original Text (This is the original text for your reference.)

Towards comprehensive syntactic and semantic annotations of the clinical narrative.

To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components.Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed.The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891-0.931), NE (0.697-0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations.This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.

+More

Cite this article
APA

APA

MLA

Chicago

Daniel, Albright Arrick, Lanfranchi Anwen, Fredriksen William F, Styler Colin, Warner Jena D, Hwang Jinho D, Choi Dmitriy, Dligach Rodney D, Nielsen James, Martin Wayne, Ward Martha, Palmer Guergana K, Savova,.Towards comprehensive syntactic and semantic annotations of the clinical narrative.. 20 (5),922-30.

Disclaimer: The translated content is provided by third-party translation service providers, and IKCEST shall not assume any responsibility for the accuracy and legality of the content.
Translate engine
Article's language
English
中文
Pусск
Français
Español
العربية
Português
Kikongo
Dutch
kiswahili
هَوُسَ
IsiZulu
Action
Recommended articles

Report

Select your report category*



Reason*



By pressing send, your feedback will be used to improve IKCEST. Your privacy will be protected.

Submit
Cancel