Pretrained Transformers for Effective and Efficient Information Access: BERT and Beyond
Basic Information
- Grant number: RGPIN-2021-02490
- Principal investigator: Lin, Jimmy
- Amount: $61,200
- Host institution:
- Host institution country: Canada
- Category: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Duration: 2022-01-01 to 2023-12-31
- Status: Completed
- Source:
- Keywords:
Project Abstract
Users expect search systems that are fast (i.e., efficient) and return good results (i.e., effective), but these features are often in tension: effective in-depth content analysis can be slow, and fast systems often sacrifice quality. Building on a quarter of a century of experience developing techniques and building systems that connect users to relevant information, this proposal will lead to groundbreaking techniques for search and question answering (QA) that are effective as well as efficient. These capabilities will be demonstrated in high-impact applications on scientific texts such as the literature on coronaviruses or hydrology. Overall, my research will advance the scientific frontiers of both natural language processing (NLP) and information retrieval (IR).

This proposal formulates search and question answering as ranking problems, and adopts an approach based on deep learning using a class of neural network architectures known as transformers. I will pursue the overall research vision from two complementary perspectives: (A) From the perspective of effectiveness, my research group will develop retrieval-specific self-supervision techniques and gain a better understanding of why transformers work as the basis for building improved ranking models. (B) From the perspective of efficiency, this research will build learned representations for ranking that are amenable to simple vector comparisons and develop model distillation techniques for accelerated inference. These separate threads will come together (C) in a principled framework for reasoning about effectiveness/efficiency tradeoffs, (D) demonstrated in applications that provide search and QA capabilities to literature in three scientific domains: biomedicine, hydrology, and artificial intelligence. Deployed prototypes will serve as a testbed for research and provide useful tools for stakeholders.

Efforts along these lines have already begun: shortly after the start of the global COVID-19 pandemic, I led the development of Covidex (covidex.ai), an online and publicly accessible search engine for a collection of scientific articles related to coronaviruses. Such a system could be valuable, for example, to public health officials assessing the efficacy of different interventions and clinicians conducting meta-analyses.

This work will achieve impact in four ways: (1) scientific innovations in the form of breakthroughs in deep learning methods that tackle search and QA, (2) computational artifacts in the form of open-source code, data, and models that will help foster adoption of the innovations arising from this research, (3) real-world applications for searching scientific literature in three domains, and (4) high-quality training opportunities. These efforts will contribute to the Canadian economy by enriching the ecosystem of ideas and talent around artificial intelligence and data science, complementing investments in these fields at the federal and provincial levels.
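Thread (B) above, learned representations for ranking that reduce scoring to simple vector comparisons, corresponds to the bi-encoder ("dense retrieval") pattern. The sketch below is a hypothetical illustration of that pattern, not code from the project: the Hugging Face transformers library, the off-the-shelf bert-base-uncased checkpoint, and mean pooling are all stand-in assumptions rather than the proposal's actual choices.

```python
# Minimal dense-retrieval sketch: encode queries and documents into vectors
# with a pretrained transformer, then rank documents by dot product.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # illustrative checkpoint; a retrieval-tuned encoder would score better
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()


@torch.no_grad()
def encode(texts):
    """Mean-pool the last hidden layer into one fixed-size vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # zero out padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (batch, dim)


docs = [
    "Masks reduce transmission of respiratory viruses.",
    "Streamflow prediction benefits from longer training records.",
]
query_vecs = encode(["Which interventions reduce coronavirus spread?"])
doc_vecs = encode(docs)

# Ranking is now a simple vector comparison: one matrix-vector product.
scores = (doc_vecs @ query_vecs.T).squeeze(1)             # (num_docs,)
for rank, idx in enumerate(scores.argsort(descending=True).tolist()):
    print(f"{rank + 1}. score={scores[idx].item():.3f}  {docs[idx]}")
```

In a deployed system the document vectors would be precomputed and stored in a (possibly approximate) nearest-neighbor index, so query-time cost reduces to one encoder pass plus the vector comparison, which is what makes the efficiency thread tractable.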
Project Results
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Lin, Jimmy
The proper care and feeding of CAMELS: How limited training data affects streamflow prediction
- DOI: 10.1016/j.envsoft.2020.104926
- Published: 2021-01-01
- Journal:
- Impact factor: 4.9
- Authors: Gauch, Martin; Mai, Juliane; Lin, Jimmy
- Corresponding author: Lin, Jimmy
Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors
- DOI: 10.1186/1471-2105-8-437
- Published: 2007-11-09
- Journal:
- Impact factor: 3
- Authors: Yu, Xueping; Lin, Jimmy; Qian, Jiang
- Corresponding author: Qian, Jiang
Design and analysis issues in genome-wide somatic mutation studies of cancer.
- DOI: 10.1016/j.ygeno.2008.07.005
- Published: 2009-01
- Journal:
- Impact factor: 4.4
- Authors: Parmigiani, Giovanni; Boca, Simina; Lin, Jimmy; Kinzler, Kenneth W.; Velculescu, Victor; Vogelstein, Bert
- Corresponding author: Vogelstein, Bert
Approach Zero and Anserini at the CLEF-2021 ARQMath Track: Applying Substructure Search and BM25 on Operator Tree Path Tokens
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Zhong, Wei; Zhang, Xinyu; Xin, Ji; Zanibbi, Richard; Lin, Jimmy
- Corresponding author: Lin, Jimmy
Precise Zero-Shot Dense Retrieval without Relevance Labels
- DOI: 10.18653/v1/2023.acl-long.99
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Gao, Luyu; Ma, Xueguang; Lin, Jimmy; Callan, Jamie
- Corresponding author: Callan, Jamie
Other Grants by Lin, Jimmy
Pretrained Transformers for Effective and Efficient Information Access: BERT and Beyond
- Grant number: RGPIN-2021-02490
- Fiscal year: 2021
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2020
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2019
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: 492965-2016
- Fiscal year: 2018
- Amount: $61,200
- Category: Discovery Grants Program - Accelerator Supplements
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2018
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: 492965-2016
- Fiscal year: 2017
- Amount: $61,200
- Category: Discovery Grants Program - Accelerator Supplements
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2017
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2016
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: 492965-2016
- Fiscal year: 2016
- Amount: $61,200
- Category: Discovery Grants Program - Accelerator Supplements
Similar Overseas Grants
University of Nottingham and Transformers & Rectifiers Limited KTP 22_23 R4
- Grant number: 10056201
- Fiscal year: 2024
- Amount: $61,200
- Category: Knowledge Transfer Partnership
Development of magnetic coated rectangular wire for reducing AC copper loss and reducing size and weight of transformers and inductors
- Grant number: 23H01397
- Fiscal year: 2023
- Amount: $61,200
- Category: Grant-in-Aid for Scientific Research (B)
Deep transformers for integrating protein sequence, structure and interaction data to predict function
- Grant number: 2308699
- Fiscal year: 2023
- Amount: $61,200
- Category: Continuing Grant
CAREER: Multi-level Bridge Tapped Resonant (MBTR) Solid-State Transformers (SSTs)
- Grant number: 2238472
- Fiscal year: 2023
- Amount: $61,200
- Category: Continuing Grant
Collaborative Research: Digital Twin Predictive Reliability Modeling of Solid-State Transformers
- Grant number: 2228873
- Fiscal year: 2023
- Amount: $61,200
- Category: Standard Grant
SHF: Small: Improving Efficiency of Vision Transformers via Software-Hardware Co-Design and Acceleration
- Grant number: 2233893
- Fiscal year: 2023
- Amount: $61,200
- Category: Standard Grant
Advanced Electromagnetic Analysis and High-frequency Impedance Design for Magnetic Ferrite Inductors and Transformers
- Grant number: 2322529
- Fiscal year: 2023
- Amount: $61,200
- Category: Standard Grant
Collaborative Research: Digital Twin Predictive Reliability Modeling of Solid-State Transformers
- Grant number: 2228872
- Fiscal year: 2023
- Amount: $61,200
- Category: Standard Grant
Inherently-Safer Hybrid Power Electronics Transformers (INSPIRE)
- Grant number: 10045685
- Fiscal year: 2022
- Amount: $61,200
- Category: Grant for R&D
The Operation and Control of Solid-State Transformers for Smart Grid Applications
- Grant number: RGPIN-2018-03870
- Fiscal year: 2022
- Amount: $61,200
- Category: Discovery Grants Program - Individual