DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP

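Basic information

  • Grant number:
    10626888
  • Principal investigator:
  • Amount:
    $342,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-06-01 to 2026-02-28
  • Project status:
    Ongoing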

Project abstract

Natural Language Processing (NLP) methods have been broadly applied to clinical problems, from recognition of clinical findings in physician notes to identification of transcribed speech samples indicating changes in cognitive status. Deep transformer networks (DTNs) have dramatically advanced NLP accuracy. These deep learning models have multiple hidden layers that may correspond to billions of trainable parameters, allowing them to apply information learned from training on large unlabeled corpora to a specific task of interest. However, their size leaves them especially vulnerable to confounding bias, induced by variables that can influence both the predictor (text) and the outcome (e.g. an associated diagnosis) of a predictive model. Such systematic biases are a recognized danger in the application of artificial intelligence methods to clinical problems, and are the focus of NLM NOT-LM-19-003 which invites applications proposing methods to identify and address them. Deep learning models in general require large amounts of training data, spurring initiatives to aggregate medical data from across institutional siloes. This can increase data set size and enhance model portability, but leaves the resulting models vulnerable to confounding by provenance, where models learn to recognize the origin of dataset components and make biased predictions based on site-specific class distributions (e.g. COVID prevalence). Such models will assign classes based on indicators of dataset provenance, rather than diagnostically meaningful linguistic differences, and make erroneous predictions when the provenance-specific distributions at the point of deployment differ from those in the training set. Confounding of this nature is a pervasive problem that presents a fundamental barrier to the portability of trained models, and threatens the utility of datasets assembled from across institutions and services. 
Unlike traditional statistical and machine learning models, deep transformer networks distribute their feature representations across parameters spread throughout the entire network. New methods are needed to meet the challenge of identifying and mitigating the influence of confounding variables in such models. In the proposed research we will develop a systematic approach to Deconfounding Deep Transformer Networks (DeconDTN), embodied in an eponymous and publicly available set of open source tools for (1) identification of provenance-related biases, (2) mitigation of these biases using a novel set of validated methods, and (3) systematic evaluation of the resulting effects on model performance. While DeconDTN will be generally applicable, development and evaluation will occur in the context of three use cases involving data sets drawn from different sources: classification of speech transcripts from participants with dementia drawn from two locations, identification of goals-of-care discussions in clinical notes drawn from multiple studies involving a range of clinical services, and prediction of COVID-19 status in notes drawn from different clinical units. Our driving hypothesis is that the resulting models will make more accurate predictions in these heterogeneous datasets than corresponding models without correction for confounding by provenance.
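The effect of confounding by provenance described in the abstract can be illustrated with a small worked example. This is only a sketch under stated assumptions, not part of DeconDTN: the site names, prevalence figures, and helper functions are all hypothetical. It models the extreme case of a classifier that ignores the text entirely and predicts whichever class was most common at each site during training, then computes its expected accuracy when site-specific prevalence changes at deployment.

```python
# Illustrative sketch only: site names and prevalence values are made up,
# and the "shortcut" stands in for a model that has learned provenance
# instead of diagnostically meaningful language.

def majority_label_by_site(prevalence):
    """Label a provenance-only shortcut would predict for each site."""
    return {site: 1 if p >= 0.5 else 0 for site, p in prevalence.items()}

def expected_accuracy(shortcut, prevalence, site_weights):
    """Expected accuracy of the shortcut given per-site positive rates."""
    acc = 0.0
    for site, weight in site_weights.items():
        p = prevalence[site]  # fraction of positive cases at this site
        acc += weight * (p if shortcut[site] == 1 else 1.0 - p)
    return acc

# Hypothetical positive-class prevalence per site.
train_prevalence = {"site_a": 0.8, "site_b": 0.2}
deploy_prevalence = {"site_a": 0.3, "site_b": 0.6}
# Assume each site contributes half of the examples.
site_weights = {"site_a": 0.5, "site_b": 0.5}

shortcut = majority_label_by_site(train_prevalence)
train_acc = expected_accuracy(shortcut, train_prevalence, site_weights)
deploy_acc = expected_accuracy(shortcut, deploy_prevalence, site_weights)

print(f"train accuracy:  {train_acc:.2f}")
print(f"deploy accuracy: {deploy_acc:.2f}")
```

Under these assumed numbers the shortcut looks accurate on data distributed like the training set but falls below chance at deployment, which is the failure mode the abstract attributes to provenance-specific class distributions.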

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Trevor Cohen

Professional to Plain Language Neural Translation: A Path Toward Actionable Health Information
  • Grant number:
    10349319
  • Fiscal year:
    2022
  • Amount:
    $342,000
  • Project type:
Professional to Plain Language Neural Translation: A Path Toward Actionable Health Information
  • Grant number:
    10579898
  • Fiscal year:
    2022
  • Amount:
    $342,000
  • Project type:
DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP
  • Grant number:
    10467107
  • Fiscal year:
    2022
  • Amount:
    $342,000
  • Project type:
DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP
  • Grant number:
    10711315
  • Fiscal year:
    2022
  • Amount:
    $342,000
  • Project type:
Computerized assessment of linguistic indicators of lucidity in Alzheimer's Disease dementia
  • Grant number:
    10093304
  • Fiscal year:
    2020
  • Amount:
    $342,000
  • Project type:
Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance
  • Grant number:
    8914098
  • Fiscal year:
    2013
  • Amount:
    $342,000
  • Project type:
Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance
  • Grant number:
    8727094
  • Fiscal year:
    2013
  • Amount:
    $342,000
  • Project type:
Encoding Semantic Knowledge in Vector Space for Biomedical Information
  • Grant number:
    8138564
  • Fiscal year:
    2010
  • Amount:
    $342,000
  • Project type:
Encoding Semantic Knowledge in Vector Space for Biomedical Information
  • Grant number:
    7977263
  • Fiscal year:
    2010
  • Amount:
    $342,000
  • Project type:
