Doctoral Dissertation Research: Evaluating the Promise and Pitfalls of Benchmarking in Machine Learning Research

Basic information

  • Award number:
    2124685
  • Principal investigator:
  • Amount:
    $20,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-08-01 to 2023-07-31
  • Status:
    Completed

Project abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2). The scientific and commercial success of machine learning (ML) has spurred government and corporate sponsors to invest billions of dollars in machine learning research. Despite this massive investment, there is limited quantitative research on how the ML field measures progress: a process called “benchmarking.” Benchmarking is the act of comparing algorithms on a quantitative metric after training them on the same benchmark dataset. Benchmarks organize ML researchers around common tasks. Achieving “state of the art” performance on an important benchmark can spark new research trajectories and advance careers: consider the 2012 success of “AlexNet” in a prominent computer vision task, which helped to launch current interest in deep learning. However, the practice of benchmarking has already engendered criticism that this near-ubiquitous research culture does not push the field towards socially beneficial outcomes, and leads to overinvestment in methods that maximize performance on academic datasets but are environmentally unsustainable or harm the public when used in the real world. This dissertation research will provide a comprehensive analysis of the strengths and weaknesses of benchmarking practices with respect to several public aims: accelerating innovation in science, increasing equity within the field, and promoting ethical research (i.e., an orientation toward research that benefits society and avoids harms). By blending sociological analysis, computational methods for extracting and analyzing benchmarking data from thousands of papers, and in-depth qualitative interviews, this research will produce an understanding of benchmarking culture in ML research that combines breadth and quantitative rigor with depth and interpretive nuance. This project has significant implications for government and corporate funders, researchers, and society more broadly.
The dissertation consists of three subprojects. The first subproject explores evidence that benchmarking culture has stymied innovation by favoring utilization of the same datasets across multiple tasks and by incentivizing researchers to underinvest in nascent benchmarks and overinvest in mature ones. The second subproject explores how patterns in the adoption of benchmarks and rewards for state-of-the-art performance interact with status and resources to create inequities in the field. It tests the hypothesis that high-status researchers and institutions have disproportionate power to set the field’s research agenda by introducing benchmarks, while garnering disproportionate citations for state-of-the-art achievements. Both of these phenomena have the potential to create a “Matthew Effect” that disadvantages under-represented and under-resourced researchers/institutions. These subprojects use network science, natural language processing, and manual coding to create a large dataset of benchmarks and progress on those benchmarks across multiple ML task communities. The third subproject consists of qualitative interviews with ML researchers across career stages and expertise to gain first-hand perspectives on benchmarking culture and assess reforms to improve research ethics and societal outcomes. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
  • DOI:
  • Published:
    2021-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bernard Koch;Emily L. Denton;A. Hanna;J. Foster
  • Corresponding author:
    Bernard Koch;Emily L. Denton;A. Hanna;J. Foster

Other publications by Jacob Foster

A Decade of Police Use of Deadly Force Research (2011–2020)
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    1.5
  • Authors:
    Daniela Oramas Mora;William Terrill;Jacob Foster
  • Corresponding author:
    Jacob Foster
Businesses, Places, and Homicide: A Preliminary Empirical Examination
  • DOI:
  • Published:
    2020
  • Journal:
  • Impact factor:
    1.6
  • Authors:
    David C. Lane;K. Williams;Jacob Foster
  • Corresponding author:
    Jacob Foster
Histiocytoid cardiomyopathy presenting as sudden death in an 18-month-old infant.
The "autopsy" enigma: etymology, related terms and unambiguous alternatives.
Going above and beyond: assessing the characteristics of officers who complete additional in-service training
  • DOI:
    10.1080/15614263.2022.2152028
  • Published:
    2022
  • Journal:
  • Impact factor:
    1.8
  • Authors:
    Logan J. Somers;Jacob Foster
  • Corresponding author:
    Jacob Foster

Similar overseas grants

Doctoral Dissertation Research: How New Legal Doctrine Shapes Human-Environment Relations
  • Award number:
    2315219
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research: Determinants of social meaning
  • Award number:
    2336572
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research: Assessing the chewing function of the hyoid bone and the suprahyoid muscles in primates
  • Award number:
    2337428
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research: Aspect and Event Cognition in the Acquisition and Processing of a Second Language
  • Award number:
    2337763
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research: Renewable Energy Transition and Economic Growth
  • Award number:
    2342813
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research: Do social environments influence the timing of male maturation in a close human relative?
  • Award number:
    2341354
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research Improvement Grant: Biobanking, Epistemic Infrastructure, and the Lifecycle of Genomic Data
  • Award number:
    2341622
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research: Obstetric constraints on neurocranial shape in nonhuman primates
  • Award number:
    2341137
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research: Human mobility and infectious disease transmission in the context of market integration
  • Award number:
    2341234
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant
Doctoral Dissertation Research: Assessing the physiological consequences of diet and environment for gorillas in zoological settings
  • Award number:
    2341433
  • Fiscal year:
    2024
  • Amount:
    $20,000
  • Award type:
    Standard Grant