
Annotation, development and evaluation for clinical information extraction

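Basic Information

Grant number:
7985218
Principal investigator:
WENDY W. CHAPMAN
Amount:
--
Host institution:
UNIVERSITY OF PITTSBURGH AT PITTSBURGH
Host institution country:
United States
Project category:
Fiscal year:
2010
Funding country:
United States
Project period:
2010-09-01 to 2010-10-31
Project status:
Completed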

Project Abstract

DESCRIPTION (provided by applicant): Much of the clinical information required for accurate clinical research, active decision support, and broad-coverage surveillance is locked in text files in an electronic medical record (EMR). The only feasible way to leverage this information for translational science is to extract and encode the information using natural language processing (NLP). Over the last two decades, several research groups have developed NLP tools for clinical notes, but a major bottleneck preventing progress in clinical NLP is the lack of standard, annotated data sets for training and evaluating NLP applications. Without these standards, individual NLP applications abound without the ability to train different algorithms on standard annotations, share and integrate NLP modules, or compare performance.

We propose to develop standards and infrastructure that can enable technology to extract scientific information from textual medical records, and we propose the research as a collaborative effort involving NLP experts across the U.S. To accomplish this goal, we will address three specific aims:

Aim 1: Extend existing standards and develop new consensus standards for annotating clinical text in a way that is interoperable, extensible, and usable.

Aim 2: Apply existing methods and tools, and develop new methods and tools where necessary for manually annotating a set of publicly available clinical texts in a way that is efficient and accurate.

Aim 3: Develop a publicly available toolkit for automatically annotating clinical text and perform a shared evaluation to evaluate the toolkit, using evaluation metrics that are multidimensional and flexible.

PUBLIC HEALTH RELEVANCE: In this project, we will develop a publicly available corpus of annotated clinical texts for NLP research. We will experiment with methods for increasing the efficiency of annotation and will annotate de-identified reports of nine types for linguistic and clinical information. In addition, we will create an NLP toolkit that can be shared and will evaluate it against other NLP systems in a shared task evaluation with the community.
