CAREER: Structured Scientific Evidence Extraction: Models and Corpora

Basic information

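Award number: 1750978
Principal investigator: Byron Wallace
Amount: approx. $549,900
Host institution: Northeastern University
Institution country: United States
Award type: Continuing Grant
Fiscal year: 2018
Funding country: United States
Period: 2018-07-01 to 2024-06-30
Status: Completed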

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1750978&HistoricalAwards=false
Keywords: CAREER Structured Scientific Evidence Extraction

Project abstract

Scientific evidence is primarily disseminated in free-text journal articles. Drawing upon this evidence to make decisions or inform policies therefore requires perusing relevant articles and manually extracting the findings of interest. Unfortunately, this process is time-consuming and has not scaled to meet the demands imposed by the torrential expansion of the scientific evidence base. This work seeks to design novel Natural Language Processing (NLP) methods that can automatically "read" and make sense of unstructured published scientific evidence. This is critically important because decisions by policy-makers, caregivers and individuals should be informed by the entirety of the relevant published scientific evidence; but because evidence is predominantly unstructured, and hence not directly actionable, this is currently impossible in practice. Consider clinical medicine, an important example that serves as the target domain of this proposal (although the framework and models will generalize to other scientific areas). Roughly 100 articles describing trials were published every single day in 2015. Healthcare professionals cannot possibly keep up with this volume, and thus treatment decisions must be made without full consideration of the available evidence. Methods that can automatically infer, from this torrential mass of unstructured literature, which treatments are actually supported by the evidence would facilitate better, evidence-based decisions. Toward this end, this research seeks to design NLP models capable of mapping natural language scientific articles describing studies or trials to structured "evidence frames" that codify the interventions and outcomes studied and the findings reported concerning them. NLP technology is not presently up to this task.
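As a rough illustration of the "evidence frame" target described above, the sketch below models one frame as a small, immutable Python record. All names (`EvidenceFrame`, `Finding`, `supporting_text`) are hypothetical. The `comparator` field is an assumption (a comparative effect needs a reference arm), the three finding labels are an assumed label set rather than one given in this abstract, and the character-offset spans are included so that each extracted claim can be traced back to the text it came from.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Finding(Enum):
    # Assumed three-way label set for the reported comparative effect.
    SIG_INCREASED = "significantly increased"
    SIG_DECREASED = "significantly decreased"
    NO_SIG_DIFFERENCE = "no significant difference"


@dataclass(frozen=True)
class EvidenceFrame:
    """Hypothetical structured record for one reported trial finding."""

    intervention: str
    comparator: str
    outcome: str
    finding: Finding
    # (start, end) character offsets of the supporting text in the article.
    evidence_spans: Tuple[Tuple[int, int], ...] = ()

    def supporting_text(self, article: str) -> List[str]:
        """Return the article snippets this frame cites as evidence."""
        return [article[start:end] for start, end in self.evidence_spans]


if __name__ == "__main__":
    # Invented toy sentence, not taken from any real trial report.
    article = "Drug A lowered systolic blood pressure more than placebo (p < 0.01)."
    frame = EvidenceFrame(
        intervention="Drug A",
        comparator="placebo",
        outcome="systolic blood pressure",
        finding=Finding.SIG_DECREASED,
        evidence_spans=((0, len(article)),),
    )
    print(frame.finding.value)
    print(frame.supporting_text(article))
```

Freezing the dataclass keeps extracted frames immutable and hashable, so duplicate frames extracted from different passages can be collapsed with a plain `set`.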
Therefore, this project will support core methodological contributions that will advance systems for data extraction and machine reading of lengthy articles; these contributions will have impact beyond the present motivating application. From a technical perspective, the focus of this work is developing novel, interpretable (transparent) neural network models for extraction from, and inference over, lengthy articles. Specifically, this project aims to design models that can automatically identify treatments and associated outcomes in free text, and then infer the reported comparative effects of the former with respect to the latter. This pushes against the limits of existing language technology. In particular, it necessitates models that perform deep analysis of individual, potentially lengthy, technical documents. Furthermore, model transparency is critical here, as domain experts must be able to recover where in a document each evidential claim was inferred from. New corpora curated for this project (to be shared with the broader community) will facilitate core NLP research on such models. To realize these methodological aims, the researchers leading this project will develop conditional and dynamic "attentive" neural models. Specific lines of methodological research to be explored include: (i) models equipped with conditional, sparse attention mechanisms over textual units that reflect scientific discourse structure, to achieve accurate and transparent extraction of, and inference concerning, reported evidence; and (ii) neural sequence tagging models that take multiple 'reads' of a text, exploiting iteratively adjusted conditional document representations as global context to inform local predictions.
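Item (i) can be illustrated with a minimal, dependency-free sketch of sparse attention over sentences. It assumes sentences have already been encoded as vectors and that the query vector stands for an intervention/outcome prompt; the function names and toy vectors are hypothetical. Sparsemax (Martins and Astudillo, 2016) is one well-known sparse attention transform, swapped in here for illustration rather than as the mechanism this project necessarily uses. Because sparsemax can assign exactly zero weight, the sentences left with nonzero weight directly show where in the document the evidence was drawn from.

```python
from typing import List, Sequence, Tuple


def sparsemax(scores: Sequence[float]) -> List[float]:
    """Euclidean projection of `scores` onto the probability simplex.

    Unlike softmax, the output can contain exact zeros.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    ranked = sorted(scores, reverse=True)
    running = 0.0
    support_size = 0
    support_sum = 0.0
    for k, z_k in enumerate(ranked, start=1):
        running += z_k
        # z_(k) stays in the support while 1 + k * z_(k) > sum of the top-k scores.
        if 1.0 + k * z_k > running:
            support_size = k
            support_sum = running
    tau = (support_sum - 1.0) / support_size  # threshold subtracted from every score
    return [max(z - tau, 0.0) for z in scores]


def attend_sentences(
    query: Sequence[float], sentence_vecs: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float], List[int]]:
    """Hypothetical sparse attention over sentence vectors.

    Returns the pooled document vector, the attention weights, and the
    indices of sentences with nonzero weight (the "evidence" sentences).
    """
    # Dot-product relevance of each sentence to the query.
    scores = [sum(q * s for q, s in zip(query, vec)) for vec in sentence_vecs]
    weights = sparsemax(scores)
    dim = len(query)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, sentence_vecs)) for d in range(dim)
    ]
    support = [i for i, w in enumerate(weights) if w > 0.0]
    return pooled, weights, support


if __name__ == "__main__":
    # Toy 2-d "sentence embeddings" for a three-sentence document.
    sentences = [[2.0, 0.0], [1.0, 0.5], [0.1, 1.0]]
    pooled, weights, support = attend_sentences([1.0, 0.0], sentences)
    print(weights)  # [1.0, 0.0, 0.0]
    print(support)  # [0]
```

With softmax, every sentence would receive some weight and the "evidence" would be smeared across the whole article; the zeros produced by sparsemax are what make the rationale directly readable.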
A project website (http://www.byronwallace.com/evidence-extraction) provides access to papers, datasets and other project outputs. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (12)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Learning to Faithfully Rationalize by Construction

DOI: 10.18653/v1/2020.acl-main.409
Published: 2020-04
Impact factor: 0
Authors: Sarthak Jain; Sarah Wiegreffe; Yuval Pinter; Byron C. Wallace
Corresponding author(s): Sarthak Jain; Sarah Wiegreffe; Yuval Pinter; Byron C. Wallace

Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations

Published: 2020-10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Impact factor: 0
Authors: Benjamin E. Nye; Jay DeYoung; Eric P. Lehman; A. Nenkova; I. Marshall; Byron C. Wallace
Corresponding author(s): Benjamin E. Nye; Jay DeYoung; Eric P. Lehman; A. Nenkova; I. Marshall; Byron C. Wallace

Biomedical Interpretable Entity Representations

Published: 2021
Journal: Proceedings of the Association for Computational Linguistics (ACL)
Impact factor: 0
Authors: Diego Garcia-Olano; Yasumasa Onoe; Ioana Baldini; Joydeep Ghosh; Byron C. Wallace; Kush Varshney
Corresponding author: Kush Varshney

Evidence Inference 2.0: More Data, Better Models

DOI: 10.18653/v1/2020.bionlp-1.13
Published: 2020-05
Journal: arXiv
Impact factor: 0
Authors: Jay DeYoung; Eric P. Lehman; Benjamin E. Nye; I. Marshall; Byron C. Wallace
Corresponding author(s): Jay DeYoung; Eric P. Lehman; Benjamin E. Nye; I. Marshall; Byron C. Wallace

Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time

DOI: 10.18653/v1/2020.acl-demos.9
Published: 2020-07
Journal: Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting
Impact factor: 0
Authors: B. E. Nye; A. Nenkova; I. J. Marshall; B. C. Wallace
Corresponding author: B. C. Wallace

Other publications by Byron Wallace

Living systematic reviews (Edinburgh Research Explorer record)

Impact factor: 0
Authors: James Thomas; Anna Noel; Iain J. Marshall; Byron Wallace; Steven McDonald; Chris Mavergames; Paul Glasziou; I. Shemilt; Anneliese J. Synnot; Tari Turner; Julian H. Elliott
Corresponding author: Julian H. Elliott