DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP
Basic information
- Grant number: 10467107
- Amount: $345,300
- Host institution country: United States
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-06-01 to 2026-02-28
- Project status: Not yet completed
- Keywords: Address; Architecture; Area; Artificial Intelligence; Automobile Driving; Behavior; Bridge to Artificial Intelligence; COVID-19; Caring; Characteristics; Classification; Clinical; Clinical Services; Cognitive; Computer software; Confounding Factors (Epidemiology); Coupled; Data; Data Aggregation; Data Set; Data Sources; Dementia; Development; Diagnosis; Diagnostic; Ensure; Equilibrium; Evaluation; Goals; High Prevalence; Individual; Institution; Investments; Label; Language; Learning; Linguistics; Location; Medical; Methods; Modeling; Modification; Natural Language Processing; Nature; Neural Network Simulation; Outcome; Output; Participant; Patients; Performance; Physicians; Predictive text; Prevalence; Research; SARS-CoV-2 positive; Sampling; Services; Site; Source; Speech; Systematic Bias; Testing; Text; Time; Training; Transcript; United States National Institutes of Health; United States National Library of Medicine; Update; Vision; Weight; Work; base; coronavirus disease; deep learning; deep learning model; design; heterogenous data; interest; large datasets; learning strategy; loss of function; machine learning model; network models; novel; open source; open source tool; portability; predictive modeling; programs; relating to nervous system; statistical and machine learning
Project abstract
Natural Language Processing (NLP) methods have been broadly applied to clinical problems, from recognition
of clinical findings in physician notes to identification of transcribed speech samples indicating changes in
cognitive status. Deep transformer networks (DTNs) have dramatically advanced NLP accuracy. These deep
learning models have multiple hidden layers that may correspond to billions of trainable parameters, allowing
them to apply information learned from training on large unlabeled corpora to a specific task of interest. However,
their size leaves them especially vulnerable to confounding bias, induced by variables that can influence both
the predictor (text) and the outcome (e.g. an associated diagnosis) of a predictive model. Such systematic biases
are a recognized danger in the application of artificial intelligence methods to clinical problems, and are the focus
of NLM NOT-LM-19-003 which invites applications proposing methods to identify and address them. Deep
learning models in general require large amounts of training data, spurring initiatives to aggregate medical data
from across institutional siloes. This can increase data set size and enhance model portability, but leaves the
resulting models vulnerable to confounding by provenance, where models learn to recognize the origin of dataset
components and make biased predictions based on site-specific class distributions (e.g. COVID prevalence).
Such models will assign classes based on indicators of dataset provenance, rather than diagnostically
meaningful linguistic differences, and make erroneous predictions when the provenance-specific distributions at
the point of deployment differ from those in the training set. Confounding of this nature is a pervasive problem
that presents a fundamental barrier to the portability of trained models, and threatens the utility of datasets
assembled from across institutions and services. Unlike traditional statistical and machine learning models,
deep transformer networks distribute feature representations across parameters spread throughout the
entire network. New methods are needed to meet the challenge of identifying and mitigating the influence of
confounding variables in such models. In the proposed research we will develop a systematic approach to
Deconfounding Deep Transformer Networks (DeconDTN), embodied in an eponymous and publicly available
set of open source tools for (1) identification of provenance-related biases, (2) mitigation of these biases using
a novel set of validated methods, and (3) systematic evaluation of the resulting effects on model performance.
While DeconDTN will be generally applicable, development and evaluation will occur in the context of three use
cases involving data sets drawn from different sources: classification of speech transcripts from participants with
dementia drawn from two locations, identification of goals-of-care discussions in clinical notes drawn from
multiple studies involving a range of clinical services, and prediction of COVID-19 status in notes drawn from
different clinical units. Our driving hypothesis is that the resulting models will make more accurate predictions in
these heterogeneous datasets than corresponding models without correction for confounding by provenance.
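
The failure mode the abstract calls confounding by provenance can be illustrated with a toy simulation. The sketch below is not part of the DeconDTN tools; the site names, prevalence values, and function names are all hypothetical. It assumes a degenerate "model" that has learned only the majority label at each site, which is the shortcut the abstract warns about, and shows what happens when site-specific prevalence differs at deployment.

```python
from collections import Counter, defaultdict


def make_dataset(prevalence_by_site, n_per_site=100):
    """Build (site, label) pairs with an exact positive rate per site."""
    data = []
    for site, prevalence in prevalence_by_site.items():
        n_pos = round(prevalence * n_per_site)
        data += [(site, 1)] * n_pos + [(site, 0)] * (n_per_site - n_pos)
    return data


def fit_provenance_shortcut(train):
    """Learn the majority label per site, ignoring all linguistic content."""
    counts = defaultdict(Counter)
    for site, label in train:
        counts[site][label] += 1
    return {site: c.most_common(1)[0][0] for site, c in counts.items()}


def accuracy(model, data):
    """Fraction of examples whose label matches the site-based prediction."""
    return sum(model[site] == label for site, label in data) / len(data)


# Hypothetical prevalences: the two sites swap class balance at deployment.
train = make_dataset({"site_A": 0.9, "site_B": 0.1})
deploy = make_dataset({"site_A": 0.1, "site_B": 0.9})

model = fit_provenance_shortcut(train)
train_acc = accuracy(model, train)
deploy_acc = accuracy(model, deploy)
print(f"training accuracy:   {train_acc:.2f}")
print(f"deployment accuracy: {deploy_acc:.2f}")
```

Because the shortcut encodes only where a sample came from, its accuracy tracks the site-specific class distribution rather than anything diagnostically meaningful, so it collapses once that distribution shifts.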
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Trevor Cohen
DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP
- Grant number: 10626888
- Fiscal year: 2022
- Funding amount: $345,300
Professional to Plain Language Neural Translation: A Path Toward Actionable Health Information
- Grant number: 10349319
- Fiscal year: 2022
- Funding amount: $345,300
Professional to Plain Language Neural Translation: A Path Toward Actionable Health Information
- Grant number: 10579898
- Fiscal year: 2022
- Funding amount: $345,300
DeconDTN: Deconfounding Deep Transformer Networks for Clinical NLP
- Grant number: 10711315
- Fiscal year: 2022
- Funding amount: $345,300
Computerized assessment of linguistic indicators of lucidity in Alzheimer's Disease dementia
- Grant number: 10093304
- Fiscal year: 2020
- Funding amount: $345,300
Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance
- Grant number: 8914098
- Fiscal year: 2013
- Funding amount: $345,300
Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance
- Grant number: 8727094
- Fiscal year: 2013
- Funding amount: $345,300
Encoding Semantic Knowledge in Vector Space for Biomedical Information
- Grant number: 8138564
- Fiscal year: 2010
- Funding amount: $345,300
Encoding Semantic Knowledge in Vector Space for Biomedical Information
- Grant number: 7977263
- Fiscal year: 2010
- Funding amount: $345,300
Similar overseas grants
Practical Study on Disaster Countermeasure Architecture Model by Sustainable Design in Asian Flood Area
- Grant number: 17K00727
- Fiscal year: 2017
- Funding amount: $345,300
- Project category: Grant-in-Aid for Scientific Research (C)
Functional architecture of a face processing area in the common marmoset
- Grant number: 9764503
- Fiscal year: 2016
- Funding amount: $345,300
Heating and airconditioning by hypocausts in residential and representative architecture in Rome and Latium studies of a phenomenon of luxury in a favoured climatic area of the Roman Empire on the basis of selected examples.
- Grant number: 317469425
- Fiscal year: 2016
- Funding amount: $345,300
- Project category: Research Grants
SBIR Phase II: Area and Energy Efficient Error Floor Free Low-Density Parity-Check Codes Decoder Architecture for Flash Based Storage
- Grant number: 1632562
- Fiscal year: 2016
- Funding amount: $345,300
- Project category: Standard Grant
SBIR Phase I: Area and Energy Efficient Error Floor Free Low-Density Parity-Check Codes Decoder Architecture for Flash Based Storage
- Grant number: 1520137
- Fiscal year: 2015
- Funding amount: $345,300
- Project category: Standard Grant
A Study on The Spatial Setting and The Inhavitant's of The Flood Prevention Architecture in The Flood Area
- Grant number: 26420620
- Fiscal year: 2014
- Funding amount: $345,300
- Project category: Grant-in-Aid for Scientific Research (C)
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
- Grant number: 327691-2007
- Fiscal year: 2011
- Funding amount: $345,300
- Project category: Discovery Grants Program - Individual
A FUNDAMENTAL STUDY ON UTILIZATION OF THE POST-WAR ARCHITECTURE AS URBAN REGENERATION METHOD, A case of the central area of Osaka city
- Grant number: 22760469
- Fiscal year: 2010
- Funding amount: $345,300
- Project category: Grant-in-Aid for Young Scientists (B)
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
- Grant number: 327691-2007
- Fiscal year: 2010
- Funding amount: $345,300
- Project category: Discovery Grants Program - Individual
Area and power efficient interconnect architecture for multi-bit processing on FPGAs
- Grant number: 327691-2007
- Fiscal year: 2009
- Funding amount: $345,300
- Project category: Discovery Grants Program - Individual