DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP


Basic Information

  • Award Number:
    10626888
  • Principal Investigator:
  • Amount:
    $342K
  • Host Institution:
  • Host Institution Country:
    United States
  • Project Category:
  • Fiscal Year:
    2022
  • Funding Country:
    United States
  • Project Period:
    2022-06-01 to 2026-02-28
  • Project Status:
    Active

Project Abstract

Natural Language Processing (NLP) methods have been broadly applied to clinical problems, from recognition of clinical findings in physician notes to identification of transcribed speech samples indicating changes in cognitive status. Deep transformer networks (DTNs) have dramatically advanced NLP accuracy. These deep learning models have multiple hidden layers that may correspond to billions of trainable parameters, allowing them to apply information learned from training on large unlabeled corpora to a specific task of interest. However, their size leaves them especially vulnerable to confounding bias, induced by variables that can influence both the predictor (text) and the outcome (e.g. an associated diagnosis) of a predictive model. Such systematic biases are a recognized danger in the application of artificial intelligence methods to clinical problems, and are the focus of NLM NOT-LM-19-003, which invites applications proposing methods to identify and address them. Deep learning models in general require large amounts of training data, spurring initiatives to aggregate medical data from across institutional silos. This can increase dataset size and enhance model portability, but it leaves the resulting models vulnerable to confounding by provenance, where models learn to recognize the origin of dataset components and make biased predictions based on site-specific class distributions (e.g. COVID prevalence). Such models will assign classes based on indicators of dataset provenance rather than diagnostically meaningful linguistic differences, and will make erroneous predictions when the provenance-specific distributions at the point of deployment differ from those in the training set. Confounding of this nature is a pervasive problem that presents a fundamental barrier to the portability of trained models, and threatens the utility of datasets assembled from across institutions and services.
Unlike traditional statistical and machine learning models, deep transformer networks distribute feature representations across parameters spread throughout the entire network. New methods are needed to meet the challenge of identifying and mitigating the influence of confounding variables in such models. In the proposed research we will develop a systematic approach to Deconfounding Deep Transformer Networks (DeconDTN), embodied in an eponymous, publicly available set of open-source tools for (1) identification of provenance-related biases, (2) mitigation of these biases using a novel set of validated methods, and (3) systematic evaluation of the resulting effects on model performance. While DeconDTN will be generally applicable, development and evaluation will occur in the context of three use cases involving datasets drawn from different sources: classification of speech transcripts from participants with dementia drawn from two locations, identification of goals-of-care discussions in clinical notes drawn from multiple studies involving a range of clinical services, and prediction of COVID-19 status in notes drawn from different clinical units. Our driving hypothesis is that the resulting models will make more accurate predictions in these heterogeneous datasets than corresponding models without correction for confounding by provenance.
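The failure mode the abstract describes — confounding by provenance — can be illustrated with a small simulation. Everything below is hypothetical (the site names, token names, and prevalence figures are invented for illustration, and this is not code from the DeconDTN tools): when class prevalence differs across sites in pooled training data, a site-marker token can look as predictive of the outcome as a genuine clinical cue.

```python
# Toy simulation of confounding by provenance. All data and token
# names are hypothetical; not part of the DeconDTN project.
import random

random.seed(0)

def make_notes(site, n, prevalence):
    """Generate toy notes as token lists. Every note carries a
    site-marker token; positives usually carry a genuine symptom cue."""
    notes = []
    for _ in range(n):
        label = 1 if random.random() < prevalence else 0
        tokens = [f"site_{site}"]
        if label == 1 and random.random() < 0.7:
            tokens.append("cough")  # genuinely diagnostic token
        notes.append((tokens, label))
    return notes

# Pooled multi-site training data: site A has high outcome prevalence,
# site B low -- the kind of imbalance aggregation initiatives create.
train = make_notes("A", 500, 0.9) + make_notes("B", 500, 0.1)

def positive_rate(notes, token):
    """P(label = 1 | token present): the association a naive model sees."""
    labels = [y for toks, y in notes if token in toks]
    return sum(labels) / len(labels)

# The site marker alone is strongly associated with the outcome, so a
# model trained on the pooled data can key on provenance instead of
# clinical language -- and will err when prevalence shifts at deployment.
print("P(pos | site_A):", round(positive_rate(train, "site_A"), 2))
print("P(pos | site_B):", round(positive_rate(train, "site_B"), 2))
print("P(pos | cough): ", round(positive_rate(train, "cough"), 2))
```

A deconfounding method in the spirit of the proposal would aim to remove the model's reliance on the `site_A` / `site_B` style signal while preserving its use of the `cough` style signal.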

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other Publications by Trevor Cohen


Other Grants by Trevor Cohen

Professional to Plain Language Neural Translation: A Path Toward Actionable Health Information
  • Award Number:
    10349319
  • Fiscal Year:
    2022
  • Funding Amount:
    $342K
  • Project Category:
Professional to Plain Language Neural Translation: A Path Toward Actionable Health Information
  • Award Number:
    10579898
  • Fiscal Year:
    2022
  • Funding Amount:
    $342K
  • Project Category:
DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP
  • Award Number:
    10467107
  • Fiscal Year:
    2022
  • Funding Amount:
    $342K
  • Project Category:
DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP
  • Award Number:
    10711315
  • Fiscal Year:
    2022
  • Funding Amount:
    $342K
  • Project Category:
Computerized assessment of linguistic indicators of lucidity in Alzheimer's Disease dementia
  • Award Number:
    10093304
  • Fiscal Year:
    2020
  • Funding Amount:
    $342K
  • Project Category:
Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance
  • Award Number:
    8914098
  • Fiscal Year:
    2013
  • Funding Amount:
    $342K
  • Project Category:
Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance
  • Award Number:
    8727094
  • Fiscal Year:
    2013
  • Funding Amount:
    $342K
  • Project Category:
Encoding Semantic Knowledge in Vector Space for Biomedical Information
  • Award Number:
    8138564
  • Fiscal Year:
    2010
  • Funding Amount:
    $342K
  • Project Category:
Encoding Semantic Knowledge in Vector Space for Biomedical Information
  • Award Number:
    7977263
  • Fiscal Year:
    2010
  • Funding Amount:
    $342K
  • Project Category:

Similar Overseas Grants

Practical Study on Disaster Countermeasure Architecture Model by Sustainable Design in Asian Flood Area
  • Award Number:
    17K00727
  • Fiscal Year:
    2017
  • Funding Amount:
    $342K
  • Project Category:
    Grant-in-Aid for Scientific Research (C)
Functional architecture of a face processing area in the common marmoset
  • Award Number:
    9764503
  • Fiscal Year:
    2016
  • Funding Amount:
    $342K
  • Project Category:
Heating and air conditioning by hypocausts in residential and representative architecture in Rome and Latium: studies of a phenomenon of luxury in a favoured climatic area of the Roman Empire on the basis of selected examples
  • Award Number:
    317469425
  • Fiscal Year:
    2016
  • Funding Amount:
    $342K
  • Project Category:
    Research Grants
SBIR Phase II: Area and Energy Efficient Error Floor Free Low-Density Parity-Check Codes Decoder Architecture for Flash Based Storage
  • Award Number:
    1632562
  • Fiscal Year:
    2016
  • Funding Amount:
    $342K
  • Project Category:
    Standard Grant
SBIR Phase I: Area and Energy Efficient Error Floor Free Low-Density Parity-Check Codes Decoder Architecture for Flash Based Storage
  • Award Number:
    1520137
  • Fiscal Year:
    2015
  • Funding Amount:
    $342K
  • Project Category:
    Standard Grant
A Study on the Spatial Setting and the Inhabitants of the Flood Prevention Architecture in the Flood Area
  • Award Number:
    26420620
  • Fiscal Year:
    2014
  • Funding Amount:
    $342K
  • Project Category:
    Grant-in-Aid for Scientific Research (C)
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
  • Award Number:
    327691-2007
  • Fiscal Year:
    2011
  • Funding Amount:
    $342K
  • Project Category:
    Discovery Grants Program - Individual
A FUNDAMENTAL STUDY ON UTILIZATION OF THE POST-WAR ARCHITECTURE AS URBAN REGENERATION METHOD: A case of the central area of Osaka city
  • Award Number:
    22760469
  • Fiscal Year:
    2010
  • Funding Amount:
    $342K
  • Project Category:
    Grant-in-Aid for Young Scientists (B)
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
  • Award Number:
    327691-2007
  • Fiscal Year:
    2010
  • Funding Amount:
    $342K
  • Project Category:
    Discovery Grants Program - Individual
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
  • Award Number:
    327691-2007
  • Fiscal Year:
    2009
  • Funding Amount:
    $342K
  • Project Category:
    Discovery Grants Program - Individual