CAREER: Structured Scientific Evidence Extraction: Models and Corpora
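Basic information
- Award number: 1750978
- Principal investigator:
- Amount: $549,900
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-07-01 to 2024-06-30
- Project status: Completed
- Source:
- Keywords:
Project abstract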
Scientific evidence is primarily disseminated in free-text journal articles. Drawing upon this evidence to make decisions or inform policies therefore requires perusing relevant articles and manually extracting the findings of interest. Unfortunately, this process is time-consuming and has not scaled to meet the demands imposed by the torrential expansion of the scientific evidence base. This work seeks to design novel Natural Language Processing (NLP) methods that can automatically "read" and make sense of unstructured published scientific evidence. This is critically important because decisions by policy-makers, care-givers and individuals should be informed by the entirety of the relevant published scientific evidence; but because evidence is predominantly unstructured -- and hence not directly actionable -- this is currently impossible in practice. Consider clinical medicine, an important example which serves as the target domain of this proposal (although the framework and models will generalize to other scientific areas). Roughly 100 articles describing trials were published every single day in 2015. Healthcare professionals cannot possibly make sense of this, and thus treatment decisions must be made without full consideration of the available evidence. Methods that can automatically infer from this torrential mass of unstructured literature which treatments are actually supported by the evidence would facilitate better, evidence-based decisions. Toward this end, this research seeks to design NLP models capable of mapping from natural language scientific articles describing studies or trials to structured "evidence frames" that codify the interventions and outcomes studied, and the reported findings concerning these. NLP technology is not presently up to this task. 
Therefore, this project will support core methodological contributions that will advance systems for data extraction and machine reading of lengthy articles; these will have impact beyond the present motivating application. From a technical perspective, the focus of this work concerns developing novel, interpretable (transparent) neural network models for extraction from and inference over lengthy articles. Specifically, this project aims to design models that can automatically identify treatments and associated outcomes from free-texts, and then infer the reported comparative effects of the former with respect to the latter. This pushes against limits of existing language technology capabilities. In particular, this necessitates models that perform deep analysis of individual, potentially lengthy, technical documents. Furthermore, model transparency is critical here, as domain experts must be able to recover from where in documents evidential claims were inferred. New corpora curated for this project (to be shared with the broader community) will facilitate core NLP research on such models. To realize the aforementioned methodological aims, the researchers leading this project will develop conditional and dynamic "attentive" neural models. Specific methodological lines of research to be explored include: (i) Models equipped with conditional, sparse attention mechanisms over textual units that reflect scientific discourse structure to achieve accurate and transparent extraction of, and inference concerning, reported evidence. (ii) Neural sequence tagging models that take multiple 'reads' of a text, exploiting iteratively adjusted conditional document representations as global context to inform local predictions. 
A project website (http://www.byronwallace.com/evidence-extraction) provides access to papers, datasets and other project outputs. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Learning to Faithfully Rationalize by Construction
- DOI: 10.18653/v1/2020.acl-main.409
- Published: 2020-04
- Journal:
- Impact factor: 0
- Authors: Sarthak Jain; Sarah Wiegreffe; Yuval Pinter; Byron C. Wallace
- Corresponding authors: Sarthak Jain; Sarah Wiegreffe; Yuval Pinter; Byron C. Wallace
Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations
- DOI:
- Published: 2020-10
- Journal:
- Impact factor: 0
- Authors: Benjamin E. Nye; Jay DeYoung; Eric P. Lehman; A. Nenkova; I. Marshall; Byron C. Wallace
- Corresponding authors: Benjamin E. Nye; Jay DeYoung; Eric P. Lehman; A. Nenkova; I. Marshall; Byron C. Wallace
Biomedical Interpretable Entity Representations
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Garcia-Olano, Diego; Onoe, Yasumasa; Baldini, Ioana; Ghosh, Joydeep; Wallace, Byron C.; Varshney, Kush
- Corresponding author: Varshney, Kush
Evidence Inference 2.0: More Data, Better Models
- DOI: 10.18653/v1/2020.bionlp-1.13
- Published: 2020-05
- Journal:
- Impact factor: 0
- Authors: Jay DeYoung; Eric P. Lehman; Benjamin E. Nye; I. Marshall; Byron C. Wallace
- Corresponding authors: Jay DeYoung; Eric P. Lehman; Benjamin E. Nye; I. Marshall; Byron C. Wallace
Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time.
- DOI: 10.18653/v1/2020.acl-demos.9
- Published: 2020-07
- Journal:
- Impact factor: 0
- Authors: Nye BE; Nenkova A; Marshall IJ; Wallace BC
- Corresponding author: Wallace BC
Other publications by Byron Wallace
Living systematic reviews
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: James Thomas; Anna Noel; Iain J Marshall; Byron Wallace; Steven McDonald; Chris Mavergames; Paul Glasziou; I. Shemilt; Anneliese J Synnot; Tari Turner; Julian H. Elliott
- Corresponding author: Julian H. Elliott
Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Hye Sun Yun; I. Marshall; T. Trikalinos; Byron Wallace
- Corresponding author: Byron Wallace
Other grants held by Byron Wallace
Collaborative Research: RI: Medium: Expert-in-the-Loop Neural Summarization for Consequential Domains
- Award number: 2211954
- Fiscal year: 2022
- Funding amount: $549,900
- Award type: Standard Grant
RI: Medium: Learning Disentangled Representations for Text to Aid Interpretability and Transfer
- Award number: 1901117
- Fiscal year: 2019
- Funding amount: $549,900
- Award type: Standard Grant
Collaborative research: ABI Development: Making Advanced Statistical Tools Accessible for Quantitative Research Synthesis and Discovery in Ecology and Evolutionary Biology
- Award number: 1520781
- Fiscal year: 2014
- Funding amount: $549,900
- Award type: Standard Grant
Collaborative research: ABI Development: Making Advanced Statistical Tools Accessible for Quantitative Research Synthesis and Discovery in Ecology and Evolutionary Biology
- Award number: 1262442
- Fiscal year: 2013
- Funding amount: $549,900
- Award type: Standard Grant
Similar overseas grants
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Award number: 2338846
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Continuing Grant
Study on p-type doping of ultra wide bandgap rutile-structured germanium oxide
- Award number: 24K17312
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Grant-in-Aid for Early-Career Scientists
Computing over Compressed Graph-Structured Data
- Award number: EP/X039447/1
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Research Grant
Nano-structured RC Networks - A Pathway To Artificial Skin
- Award number: EP/Y002172/1
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Research Grant
Efficient Federated Learning for Deep Learning Through Structured Training
- Award number: 24K20845
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Grant-in-Aid for Early-Career Scientists
Revolutionizing Tactile AI: Developing a Soft, Liquid-Structured, High Density, 3-Axis Tactile Sensor
- Award number: 24K20874
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Grant-in-Aid for Early-Career Scientists
Additive Micro/Nano-manufacturing of Structured Piezoelectric Active Materials for Intelligent Stent Monitoring
- Award number: EP/Y003551/1
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Research Grant
CAREER: Elucidating the Formation and Evolution of Metastable Phases in Fluorite-Structured Ferroelectrics using Advanced Electron Microscopy
- Award number: 2338558
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Continuing Grant
CAREER: Interfacial behavior of motile bacteria at structured liquid crystal interfaces
- Award number: 2338880
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Continuing Grant
CAREER: Learning from Data on Structured Complexes: Products, Bundles, and Limits
- Award number: 2340481
- Fiscal year: 2024
- Funding amount: $549,900
- Award type: Continuing Grant