Pretrained Transformers for Effective and Efficient Information Access: BERT and Beyond
Basic Information
- Grant number: RGPIN-2021-02490
- Principal investigator: Lin, Jimmy
- Amount: $61,200
- Host institution:
- Host institution country: Canada
- Category: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Duration: 2022-01-01 to 2023-12-31
- Status: Completed
- Source:
- Keywords:
Project Abstract
Users expect search systems that are fast (i.e., efficient) and return good results (i.e., effective), but these features are often in tension: effective in-depth content analysis can be slow, and fast systems often sacrifice quality. Building on a quarter of a century of experience developing techniques and building systems that connect users to relevant information, this proposal will lead to groundbreaking techniques for search and question answering (QA) that are effective as well as efficient. These capabilities will be demonstrated in high-impact applications on scientific texts such as the literature on coronaviruses or hydrology. Overall, my research will advance the scientific frontiers of both natural language processing (NLP) and information retrieval (IR).

This proposal formulates search and question answering as ranking problems, and adopts an approach based on deep learning using a class of neural network architectures known as transformers. I will pursue the overall research vision from two complementary perspectives: (A) From the perspective of effectiveness, my research group will develop retrieval-specific self-supervision techniques and gain a better understanding of why transformers work as the basis for building improved ranking models. (B) From the perspective of efficiency, this research will build learned representations for ranking that are amenable to simple vector comparisons and develop model distillation techniques for accelerated inference. These separate threads will come together (C) in a principled framework for reasoning about effectiveness/efficiency tradeoffs, (D) demonstrated in applications that provide search and QA capabilities to literature in three scientific domains: biomedicine, hydrology, and artificial intelligence. Deployed prototypes will serve as a testbed for research and provide useful tools for stakeholders.

Efforts along these lines have already begun: shortly after the start of the global COVID-19 pandemic, I led the development of Covidex (covidex.ai), an online and publicly accessible search engine for a collection of scientific articles related to coronaviruses. Such a system could be valuable, for example, to public health officials assessing the efficacy of different interventions and clinicians conducting meta-analyses.

This work will achieve impact in four ways: (1) scientific innovations in the form of breakthroughs in deep learning methods that tackle search and QA, (2) computational artifacts in the form of open-source code, data, and models that will help foster adoption of the innovations arising from this research, (3) real-world applications for searching scientific literature in three domains, and (4) high-quality training opportunities. These efforts will contribute to the Canadian economy by enriching the ecosystem of ideas and talent around artificial intelligence and data science, complementing investments in these fields at the federal and provincial levels.
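Thread (B) above, learned representations for ranking that reduce scoring to simple vector comparisons, corresponds to the bi-encoder ("dense retrieval") pattern. The sketch below is a hypothetical illustration of that pattern, not code from the project: the Hugging Face transformers library, the off-the-shelf bert-base-uncased checkpoint, and mean pooling are all stand-in assumptions rather than the proposal's actual choices.

```python
# Minimal dense-retrieval sketch: encode queries and documents into vectors
# with a pretrained transformer, then rank documents by dot product.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # illustrative checkpoint; a retrieval-tuned encoder would score better
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()


@torch.no_grad()
def encode(texts):
    """Mean-pool the last hidden layer into one fixed-size vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # zero out padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (batch, dim)


docs = [
    "Masks reduce transmission of respiratory viruses.",
    "Streamflow prediction benefits from longer training records.",
]
query_vecs = encode(["Which interventions reduce coronavirus spread?"])
doc_vecs = encode(docs)

# Ranking is now a simple vector comparison: one matrix-vector product.
scores = (doc_vecs @ query_vecs.T).squeeze(1)             # (num_docs,)
for rank, idx in enumerate(scores.argsort(descending=True).tolist()):
    print(f"{rank + 1}. score={scores[idx].item():.3f}  {docs[idx]}")
```

In a deployed system the document vectors would be precomputed and stored in a (possibly approximate) nearest-neighbor index, so query-time cost reduces to one encoder pass plus the vector comparison, which is what makes the efficiency thread tractable.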
Project Results
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Lin, Jimmy
The proper care and feeding of CAMELS: How limited training data affects streamflow prediction
- DOI: 10.1016/j.envsoft.2020.104926
- Published: 2021-01-01
- Journal:
- Impact factor: 4.9
- Authors: Gauch, Martin; Mai, Juliane; Lin, Jimmy
- Corresponding author: Lin, Jimmy
Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors
- DOI: 10.1186/1471-2105-8-437
- Published: 2007-11-09
- Journal:
- Impact factor: 3
- Authors: Yu, Xueping; Lin, Jimmy; Qian, Jiang
- Corresponding author: Qian, Jiang
Design and analysis issues in genome-wide somatic mutation studies of cancer.
- DOI: 10.1016/j.ygeno.2008.07.005
- Published: 2009-01
- Journal:
- Impact factor: 4.4
- Authors: Parmigiani, Giovanni; Boca, Simina; Lin, Jimmy; Kinzler, Kenneth W.; Velculescu, Victor; Vogelstein, Bert
- Corresponding author: Vogelstein, Bert
Approach Zero and Anserini at the CLEF-2021 ARQMath Track: Applying Substructure Search and BM25 on Operator Tree Path Tokens
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Zhong, Wei; Zhang, Xinyu; Xin, Ji; Zanibbi, Richard; Lin, Jimmy
- Corresponding author: Lin, Jimmy
Precise Zero-Shot Dense Retrieval without Relevance Labels
- DOI: 10.18653/v1/2023.acl-long.99
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Gao, Luyu; Ma, Xueguang; Lin, Jimmy; Callan, Jamie
- Corresponding author: Callan, Jamie
Other Grants by Lin, Jimmy
Pretrained Transformers for Effective and Efficient Information Access: BERT and Beyond
- Grant number: RGPIN-2021-02490
- Fiscal year: 2021
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2020
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2019
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: 492965-2016
- Fiscal year: 2018
- Amount: $61,200
- Category: Discovery Grants Program - Accelerator Supplements
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2018
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: 492965-2016
- Fiscal year: 2017
- Amount: $61,200
- Category: Discovery Grants Program - Accelerator Supplements
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2017
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: RGPIN-2016-04138
- Fiscal year: 2016
- Amount: $61,200
- Category: Discovery Grants Program - Individual
Modeling Time, Space, and Networks for Effective and Efficient Information Retrieval
- Grant number: 492965-2016
- Fiscal year: 2016
- Amount: $61,200
- Category: Discovery Grants Program - Accelerator Supplements
Similar Overseas Grants
University of Nottingham and Transformers & Rectifiers Limited KTP 22_23 R4
- Grant number: 10056201
- Fiscal year: 2024
- Amount: $61,200
- Category: Knowledge Transfer Partnership
Development of magnetic coated rectangular wire for reducing AC copper loss and reducing size and weight of transformers and inductors
- Grant number: 23H01397
- Fiscal year: 2023
- Amount: $61,200
- Category: Grant-in-Aid for Scientific Research (B)
Deep transformers for integrating protein sequence, structure and interaction data to predict function
- Grant number: 2308699
- Fiscal year: 2023
- Amount: $61,200
- Category: Continuing Grant
CAREER: Multi-level Bridge Tapped Resonant (MBTR) Solid-State Transformers (SSTs)
- Grant number: 2238472
- Fiscal year: 2023
- Amount: $61,200
- Category: Continuing Grant
Collaborative Research: Digital Twin Predictive Reliability Modeling of Solid-State Transformers
- Grant number: 2228873
- Fiscal year: 2023
- Amount: $61,200
- Category: Standard Grant
SHF: Small: Improving Efficiency of Vision Transformers via Software-Hardware Co-Design and Acceleration
- Grant number: 2233893
- Fiscal year: 2023
- Amount: $61,200
- Category: Standard Grant
Advanced Electromagnetic Analysis and High-frequency Impedance Design for Magnetic Ferrite Inductors and Transformers
- Grant number: 2322529
- Fiscal year: 2023
- Amount: $61,200
- Category: Standard Grant
Collaborative Research: Digital Twin Predictive Reliability Modeling of Solid-State Transformers
- Grant number: 2228872
- Fiscal year: 2023
- Amount: $61,200
- Category: Standard Grant
Inherently-Safer Hybrid Power Electronics Transformers (INSPIRE)
- Grant number: 10045685
- Fiscal year: 2022
- Amount: $61,200
- Category: Grant for R&D
The Operation and Control of Solid-State Transformers for Smart Grid Applications
- Grant number: RGPIN-2018-03870
- Fiscal year: 2022
- Amount: $61,200
- Category: Discovery Grants Program - Individual