A framework for evaluating and explaining the robustness of NLP models

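Basic information

  • Grant number:
    EP/X04162X/1
  • Principal investigator:
  • Amount:
    $405,500
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2024
  • Funding country:
    United Kingdom
  • Duration:
    2024 to (no data)
  • Project status:
    Ongoing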

Project abstract

The standard practice for evaluating the generalisation of supervised machine learning models in NLP tasks is to use previously unseen (i.e. held-out) data and report performance on it using metrics such as accuracy. Whilst metrics reported on held-out data summarise a model's performance, these results are ultimately aggregate statistics on benchmarks and do not reflect the nuances of model behaviour and robustness in real-world systems.

We propose a robustness evaluation framework for NLP models concerned with arguments and facts, which incorporates explanations for robustness failures to support systematic and efficient evaluation. We will develop novel methods for simulating real-world texts from existing datasets, to help evaluate the stability and consistency of models when deployed in the wild. The simulation methods will challenge NLP models through text-based transformations and distribution shifts, applied to whole datasets as well as to data sub-sets that capture linguistic patterns, to provide systematic coverage of real-world linguistic phenomena. Furthermore, our framework will provide insights into a model's robustness by generating explanations for robustness failures along the lexical, morphological, and syntactic dimensions, extracted from the various dataset simulations and data sub-sets, thus departing from current approaches that provide only a metric to quantify robustness. We will focus on two NLP research areas, argument mining and fact verification; however, several of the simulation methods and the robustness explanations can also be extended to other NLP tasks.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Oana Cocarascu

Argumentative review aggregation and dialogical explanations
  • DOI:
    10.1016/j.artint.2025.104291
  • Publication date:
    2025-03-01
  • Journal:
  • Impact factor:
    4.600
  • Authors:
    Antonio Rago; Oana Cocarascu; Joel Oksanen; Francesca Toni
  • Corresponding author:
    Francesca Toni

Similar international grants

Collaborative Research: Bubble Trouble - Re-evaluating olivine melt inclusion barometry and trace-element geochemistry in the Cascades
  • Grant number:
    2342155
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Standard Grant
Collaborative Research: Bubble Trouble - Re-evaluating olivine melt inclusion barometry and trace-element geochemistry in the Cascades
  • Grant number:
    2342156
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Standard Grant
CAREER: Evaluating Cooperative Intelligence in Connected Communities
  • Grant number:
    2339497
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Continuing Grant
A Novel Surrogate Framework for evaluating THM Properties of Bentonite
  • Grant number:
    DP240102053
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Discovery Projects
Evaluating the Impact and Efficiency of Engineering the Ocean to Remove CO2
  • Grant number:
    DE240100115
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Discovery Early Career Researcher Award
From lymphatics to evaluating resolution therapeutics in clinical trials
  • Grant number:
    MR/Y013050/1
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Fellowship
Should infant formula be available at UK food banks? Evaluating different pathways to ensuring parents in financial crisis can access infant formula.
  • Grant number:
    MR/Z503575/1
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Research Grant
Evaluating the effectiveness and sustainability of integrating helminth control with seasonal malaria chemoprevention in West African children
  • Grant number:
    MR/X023133/1
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Fellowship
Towards Evaluating and Managing Risks Associated with Legacy Wells and Offshore Gas Storage in Scotland
  • Grant number:
    2902920
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Studentship
Evaluating the delivery of whole exome sequencing for patients with muscle diseases in Latin America. Learning from collaborative experiences-Lat SEQ+
  • Grant number:
    MR/X030911/1
  • Fiscal year:
    2024
  • Funding amount:
    $405,500
  • Project category:
    Research Grant