Leveraging Unlabeled and Pseudo Data for Clinical Information Extraction

Basic Information

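  • Grant number:
    9813134
  • Principal investigator:
    Ozlem Uzuner
  • Amount:
    $414,800
  • Host institution country:
    United States
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Project period:
    2019-08-01 to 2022-07-31
  • Project status:
    Completed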

Project Summary

Electronic Health Records (EHRs) contain significant information that can benefit many downstream uses. However, most of this information is in unstructured narrative form and is inaccessible to computerized methods that rely on structured representations for exploring, retrieving, and presenting information. Natural language processing (NLP) and information extraction (IE) open this trove of information to studies that would otherwise lack access to it. Over the past decades, many IE systems have been developed. These systems have typically focused on one task at a time. In addition, most have studied only specific types of records, e.g., discharge summaries, and have addressed their task using data from a single institution. State-of-the-art IE systems developed under these conditions achieved performance ranging from 44% to 99% F-measure. This variation can be attributed partly to the nature of the tasks: some target entities, such as dates, tend to be well represented in the data and to follow known patterns of expression more rigidly, whereas reasons for medication administration are relatively sparse in the data and can show wider linguistic diversity. However, the nature of the task may not be the only reason; the data used can also explain the performance variation. EHR narratives vary in style, format, and content from one department to another and from one hospital to another. Even the same record type at two different hospitals can differ considerably in narrative style and pose different challenges for IE. Understanding IE performance therefore requires studies of multiple tasks on multiple record types from multiple institutions. One major bottleneck for evaluating IE systems at such a large scale is annotation. The same bottleneck also limits system development. This proposal aims to address this bottleneck for both evaluation and development.
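The 44% to 99% range above is reported as F-measure, the harmonic mean of precision and recall. The Python sketch below shows one common way such a score is computed for extraction tasks. It assumes strict matching, where a predicted entity counts as correct only if its start offset, end offset, and type all equal a gold annotation; the proposal does not state which matching criterion underlies the quoted figures, and the function names and example spans are hypothetical.

```python
def span_counts(gold: set, pred: set) -> tuple[int, int, int]:
    """Count true positives, false positives, and false negatives under strict matching.

    Each annotation is a hypothetical (start, end, type) tuple; a prediction is
    correct only if an identical tuple appears in the gold set.
    """
    tp = len(gold & pred)   # predictions that exactly match a gold entity
    fp = len(pred - gold)   # predictions with no exact gold match
    fn = len(gold - pred)   # gold entities the system missed
    return tp, fp, fn


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from raw counts; beta=1 gives the balanced F-measure (F1)."""
    if tp == 0:
        return 0.0  # precision and recall are 0 (or undefined), so F is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical character-offset annotations for one clinical note.
gold = {(0, 10, "DATE"), (15, 22, "MEDICATION"), (30, 41, "REASON")}
pred = {(0, 10, "DATE"), (15, 22, "MEDICATION"), (45, 50, "REASON")}

tp, fp, fn = span_counts(gold, pred)
print(tp, fp, fn)                        # 2 1 1
print(round(f_measure(tp, fp, fn), 3))   # 0.667
```

Under strict matching, a prediction whose offsets are even slightly off counts as both a false positive and a false negative; overlap-based (lenient) matching is another common choice and would score such a case differently.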
It first generates a multi-institution corpus consisting of multiple record types from five institutions. It studies four IE tasks that broadly represent IE in clinical records and can inform the field of IE as a whole: de-identification, clinical concept extraction, medication extraction, and adverse drug event extraction. Within the context of these tasks, the proposal then puts forward methods that learn from unlabeled or pseudo data to reduce reliance on annotated data during development. It evaluates these methods for both performance and generalizability on multiple types of records from multiple institutions. As a result of these activities, this proposal generates de-identified data, annotations, methods, software, and machine learning models, which it makes available to the research community.
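The proposal does not detail its methods for learning from unlabeled or pseudo data. As a generic illustration only, the Python sketch below implements self-training (pseudo-labeling), one widely used way to exploit unlabeled text: a model trained on a small labeled set labels unlabeled examples, its high-confidence predictions are added to the training set as pseudo-labels, and the model is retrained. To stay self-contained it uses a small multinomial naive Bayes classifier over whole sentences (adverse drug event vs. not) rather than the span-level extractors the proposal targets; the sentences, labels, function names, and the 0.85 confidence threshold are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Fit multinomial naive Bayes with add-one smoothing on (tokens, label) pairs.

    Returns a predictor mapping a token list to (best_label, posterior_probability).
    """
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(label_counts.values())

    def predict(tokens):
        scores = {}
        for label, n in label_counts.items():
            denom = sum(word_counts[label].values()) + len(vocab)
            score = math.log(n / total)  # log prior
            for tok in tokens:
                if tok in vocab:  # words unseen in training are ignored
                    score += math.log((word_counts[label][tok] + 1) / denom)
            scores[label] = score
        best = max(scores, key=scores.get)
        # Softmax over log scores; the best label's posterior is 1 / sum(exp(s - s_best)).
        norm = sum(math.exp(s - scores[best]) for s in scores.values())
        return best, 1.0 / norm

    return predict


def self_train(labeled, unlabeled, threshold=0.85, max_rounds=5):
    """Self-training: add confident pseudo-labels each round, then retrain."""
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        predict = train_nb(train)
        confident, remaining = [], []
        for tokens in pool:
            label, prob = predict(tokens)
            if prob >= threshold:
                confident.append((tokens, label))  # accept as a pseudo-labeled example
            else:
                remaining.append(tokens)  # keep for the next round
        if not confident:
            break  # nothing new cleared the threshold; stop early
        train.extend(confident)
        pool = remaining
    return train_nb(train), len(train) - len(labeled)


# Hypothetical toy sentences: ADE = mentions an adverse drug event.
labeled = [
    ("patient developed rash after penicillin".split(), "ADE"),
    ("nausea attributed to metformin".split(), "ADE"),
    ("patient denies chest pain".split(), "NONE"),
    ("follow up in two weeks".split(), "NONE"),
]
unlabeled = [
    "developed rash after amoxicillin".split(),
    "nausea and rash attributed to lisinopril".split(),
    "follow up with cardiology".split(),
]

model, added = self_train(labeled, unlabeled, threshold=0.85)
print(added)                                       # 2
print(model("rash after lisinopril".split())[0])   # ADE
```

The threshold trades coverage for label noise: in this toy run the third unlabeled sentence never clears it and stays unlabeled, while a sufficiently lower threshold would admit it, at a higher risk of the model reinforcing its own mistakes.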

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Ozlem Uzuner

Joint learning methods for event and relation extraction from clinical narratives
  • Grant number:
    10507223
  • Fiscal year:
    2022
  • Funding amount:
    $414,800
National NLP Clinical Challenges (n2c2): Challenges in Natural Language Processing for Clinical Narratives
  • Grant number:
    10670801
  • Fiscal year:
    2019
  • Funding amount:
    $414,800
National NLP Clinical Challenges (n2c2): Challenges in Natural Language Processing for Clinical Narratives
  • Grant number:
    9759499
  • Fiscal year:
    2019
  • Funding amount:
    $414,800
National NLP Clinical Challenges (n2c2): Challenges in Natural Language Processing for Clinical Narratives
  • Grant number:
    10393499
  • Fiscal year:
    2019
  • Funding amount:
    $414,800
Challenges in Natural Language Processing in Clinical Text
  • Grant number:
    9597333
  • Fiscal year:
    2017
  • Funding amount:
    $414,800
Challenges in Natural Language Processing for Clinical Narratives
  • Grant number:
    8722031
  • Fiscal year:
    2012
  • Funding amount:
    $414,800
Challenges in Natural Language Processing for Clinical Narratives
  • Grant number:
    8913773
  • Fiscal year:
    2012
  • Funding amount:
    $414,800
Challenges in Natural Language Processing for Clinical Narratives
  • Grant number:
    8400218
  • Fiscal year:
    2012
  • Funding amount:
    $414,800

Similar international grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Funding amount:
    $414,800
  • Project type:
    Research Grant