
CRII: RI: Can Low-Bias Machine Learners Acquire English Grammar? Deep Learning and Linguistic Acceptability

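Basic information

Award number:
1850208
Principal investigator:
Samuel Bowman
Amount:
$174,900 (approx.)
Host institution:
New York University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Project period:
2019-03-15 to 2021-02-28
Project status:
Completed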

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1850208&HistoricalAwards=false
Keywords:
CRII RI Low Bias Machine

Project abstract

Widely deployed applications of language technology, such as translation systems and smart assistants, rely heavily on machine learning models for sentence understanding. These models learn to understand language from data, which can often be as simple as a collection of published books or a download of Wikipedia, rather than through any kind of manual engineering or hands-on guidance by linguistic experts. While modern machine learning methods are quite effective, they are not perfect. When they fail to understand some text, it can be difficult to discover why, and even more difficult to craft interventions to address those failures. This CISE Research Initiation Initiative (CRII) project develops tools to help use methods and insights from research in linguistic science to analyze and refine machine learning systems for sentence understanding. The project should have a practical impact in making it easier to develop effective language technologies, a scientific impact in helping linguists use machine learning as a proxy to study human language learning, and a training impact in supporting several PhD students, through both research seminars and direct research collaborations, as they develop into experts in the interaction between linguistic science and language technology.

The methods used in this project rely on the human ability to judge the grammatical acceptability of a sentence, i.e., to decide whether someone could ever use a given sequence of words to say something. The project has three parts: (1) to build a large acceptability-based dataset for English that evaluates machine learning systems on their linguistic knowledge; (2) to use this data to evaluate widely used standard approaches to machine learning for language, with a focus on promising recent approaches that use artificial neural networks to learn from plain text; and (3) to develop methods for using small custom datasets to directly repair any gaps in the knowledge that these machine learning models acquire. Analyzing and improving artificial neural networks is difficult, since their internal representations of language are continuous and, at least superficially, do not at all resemble the kinds of representations that linguists use to analyze language. The investigators' methods are designed to minimize this difficulty by relying on converging evidence from multiple ways of using the same data in their experiments.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (9)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

CAN NEURAL NETWORKS ACQUIRE A STRUCTURAL BIAS FROM RAW LINGUISTIC DATA?

DOI:
Publication date:
2020
Journal:
Proceedings of the Annual Meeting of the Cognitive Science Society
Impact factor:
0
Authors:
Warstadt, Alex;Bowman, Samuel R.
Corresponding author:
Bowman, Samuel R.

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

DOI:
10.18653/v1/2020.emnlp-main.16
Publication date:
2020
Journal:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Impact factor:
0
Authors:
Warstadt, Alex;Zhang, Yian;Li, Xiaocheng;Liu, Haokun;Bowman, Samuel R.
Corresponding author:
Bowman, Samuel R.

Neural Network Acceptability Judgments

DOI:
10.1162/tacl_a_00290
Publication date:
2019-01-01
Journal:
Transactions of the Association for Computational Linguistics
Impact factor:
10.9
Authors:
Warstadt, Alex;Singh, Amanpreet;Bowman, Samuel R.
Corresponding author:
Bowman, Samuel R.

Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs

DOI:
10.18653/v1/d19-1286
Publication date:
2019
Journal:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Impact factor:
0
Authors:
Warstadt, Alex;Cao, Yu;Grosu, Ioana;Peng, Wei;Blix, Hagen;Nie, Yining;Alsop, Anna;Bordia, Shikha;Liu, Haokun;Parrish, Alicia
Corresponding author:
Parrish, Alicia

What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?

DOI:
10.18653/v1/2021.acl-long.98
Publication date:
2021-06
Journal:
Impact factor:
0
Authors:
Nikita Nangia;Saku Sugawara;H. Trivedi;Alex Warstadt;Clara Vania;Sam Bowman
Corresponding author:
Nikita Nangia;Saku Sugawara;H. Trivedi;Alex Warstadt;Clara Vania;Sam Bowman

Other publications by Samuel Bowman

FarFetched: An Entity-centric Approach for Reasoning on Textually Represented Environments

DOI:
Publication date:
2021
Journal:
Impact factor:
0
Authors:
Colin Raffel;Noam M. Shazeer;A. Roberts;K. Lee;Sharan Narang;Michael Matena;Yanqi;Wei Zhou;J. LiPeter;Liu. 2020;Exploring;Jim Webber;A. programmatic;Adina Williams;Nikita Nangia;Samuel Bowman
Corresponding author:
Samuel Bowman

Reactive Transport and Péclet Number Analysis of Hydrogen Flux Pathways in Uniform Clay Matrix: Implications for Underground Storage

DOI:
10.1007/s11242-025-02200-5
Publication date:
2025-07-03
Journal:
Transport in Porous Media
Impact factor:
2.600
Authors:
Samuel Bowman;Arkajyoti Pathak;Shikha Sharma
Corresponding author:
Shikha Sharma

Effect of Ionic Strength on H2O and Si-Species Stability Field Geometry in pH-Eh Space

DOI:
10.1007/s10498-023-09417-0
Publication date:
2023
Journal:
Aquatic Geochemistry
Impact factor:
1.6
Authors:
Samuel Bowman;Arkajyoti Pathak;V. Agrawal;Shikha Sharma
Corresponding author:
Shikha Sharma

What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types

DOI:
Publication date:
2021
Journal:
Impact factor:
0
Authors:
Susan Bartlett;Grzegorz Kondrak;Max Bartolo;Alastair Roberts;Johannes Welbl;Steven Bird;Ewan Klein;E. Loper;Samuel Bowman;George Dahl. 2021;What;Chao Pang;Junyuan Shang;Jiaxiang Liu;Xuyi Chen;Yanbin Zhao;Yuxiang Lu;Weixin Liu;Z. Wu;Weibao Gong;Jianzhong Liang;Zhizhou Shang;Peng Sun;Ouyang Xuan;Dianhai;Hao Tian;Hua Wu;Haifeng Wang;Adam Trischler;Tong Wang;Xingdi Yuan;Justin Har;Alessandro Sordoni;Philip Bachman;Adina Williams;Nikita Nangia;Zhilin Yang;Peng Qi;Saizheng Zhang;Y. Bengio;ing. In
Corresponding author:
ing. In