III: EAGER: Automatically Building Test Collections Using Implicit Relevance Signals from the Web

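Basic Information

Award number:
1147810
Principal investigator:
Eduard Hovy
Amount:
$150,000
Host institution:
University of Southern California
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2011
Funding country:
United States
Start and end dates:
2011-09-01 to 2013-02-28
Project status:
Completed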

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1147810&HistoricalAwards=false
Keywords:
III EAGER Automatically Building Test

Project Abstract

Helping users find relevant information is undeniably an important problem vital to the functioning of today's information-based societies. It is therefore no surprise that millions of people worldwide make use of search engine technologies each and every day. Although existing search technologies work well, there is still considerable room for improvement. Search engine innovation is driven by the ability to rapidly, and repeatedly, measure the quality of the results produced by a given system. This type of measurement typically requires some form of human input. For example, a human expert may be hired to assess the relevance of search results, or the search engine may log user interactions, such as the queries entered and the results clicked. After a sufficiently large amount of data has been collected, it can then be used to accurately measure search engine quality. It can also be used to improve the quality of existing search engines via a process known as "tuning" or "training". However, gathering large amounts of this information typically requires a significant amount of human effort or computational resources. Therefore, sustained innovation is only possible at a very steep cost.

Techniques for constructing large information retrieval test collections that require no human effort are the primary focus of this research study. Rather than relying on human-curated information, implicit relevance signals from the Web are mined to automatically construct large, reusable test collections for a variety of search tasks, including Web search, news search, and enterprise search. The observation that the Web contains a large number of implicit relevance signals is the starting point of the research. The simplest example of an implicit relevance signal is the hyperlink, which can be interpreted as a signal acknowledging the relevance of the target page by the source author. The hypothesis that such implicit relevance signals can be effectively mined and aggregated in a completely unsupervised manner to create test collections without any human effort is investigated in this research. Automatically generated test collections are evaluated in two different ways. First, the test collections are evaluated according to their ability to accurately measure the quality of search systems compared to human-generated test collections. Second, the quality of search engines tuned using the automated test collections is compared against that of engines tuned using manual test collections.

The broader impact of this project is derived from automatically constructed test collections that are freely distributed to the broader research community. Advances in search engine technologies are expected as the result of increased availability of training data to systematically evaluate and tune search engines, both in industrial and academic settings. Additional broader impact is expected from the integration of research and education at both the graduate and undergraduate levels and from engaging women and underrepresented students through various outreach programs.
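To make the abstract's idea concrete, the following is a minimal sketch, not the project's actual method. It assumes that a hyperlink's anchor text can stand in for a query, that each distinct source page linking to a target counts as one relevance vote, and that agreement with a human-built collection is measured by Kendall's tau over per-system scores. The function names, URLs, and all scores are hypothetical placeholders, not results from the award.

```python
from collections import defaultdict
from itertools import combinations


def build_qrels(links):
    """Aggregate (source, anchor_text, target) links into pseudo-judgments.

    Assumption: anchor text acts as the query, and the number of distinct
    source pages linking to a target is its relevance vote count.
    """
    votes = defaultdict(set)
    for source, anchor, target in links:
        votes[(anchor.strip().lower(), target)].add(source)
    qrels = defaultdict(dict)
    for (query, target), sources in votes.items():
        qrels[query][target] = len(sources)
    return dict(qrels)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts keyed by the same systems."""
    pairs = list(combinations(sorted(scores_a), 2))
    concordant = discordant = 0
    for x, y in pairs:
        product = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical links: (source page, anchor text, target page).
links = [
    ("a.example", "Python tutorial", "docs.example/tutorial"),
    ("b.example", "python tutorial", "docs.example/tutorial"),
    ("c.example", "Python tutorial", "blog.example/intro"),
]
qrels = build_qrels(links)

# Placeholder effectiveness scores for three systems, as measured with an
# automatic collection and with a human-judged collection (made-up values).
auto_scores = {"sysA": 0.6, "sysB": 0.4, "sysC": 0.2}
human_scores = {"sysA": 0.7, "sysB": 0.3, "sysC": 0.5}
tau = kendall_tau(auto_scores, human_scores)
```

A tau near 1 would suggest that the automatic collection ranks systems much as the human-judged one does; the abstract's second evaluation (tuning engines on each collection and comparing the tuned engines) is not covered by this sketch.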

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Eduard Hovy

A Framework for Effective Annotation of Information from Closed Captions Using Ontologies

DOI:
10.1007/s10844-005-0188-9
Published:
2005-09-01
Journal:
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
Impact factor:
3.400
Authors:
Latifur Khan;Dennis McLeod;Eduard Hovy
Corresponding author:
Eduard Hovy

A Sentiment Consolidation Framework for Meta-Review Generation

DOI:
Published:
2024
Journal:
Impact factor:
0
Authors:
Miao Li;Jey Han Lau;Eduard Hovy
Corresponding author:
Eduard Hovy

ezCoref : A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution

DOI:
Published:
2022
Journal:
Impact factor:
0
Authors:
A. Crowdsourced;David Bamman;Olivia Lewke;Rachel Bawden;Rico Sennrich;Alexandra Birch;Ari Bornstein;Arie Cattan;Ido Dagan;Hong Chen;Zhenhua Fan;Hao Lu;Alan Yuille;Eduard Hovy;Mitch Marcus;M. Palmer;Lance;Rodney Huddleston. 2002;Frédéric Landragin;T. Poibeau;Bernard Vic;Belinda Z. Li;Gabriel Stanovsky;Robert L Logan;Andrew McCallum;Sameer Singh
Corresponding author:
Sameer Singh

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

DOI:
Published:
2024
Journal:
arXiv.org
Impact factor:
0
Authors:
Sang Keun Choe;Hwijeen Ahn;Juhan Bae;Kewen Zhao;Minsoo Kang;Youngseog Chung;Adithya Pratapa;W. Neiswanger;Emma Strubell;Teruko Mitamura;Jeff Schneider;Eduard Hovy;Roger Grosse;Eric Xing
Corresponding author:
Eric Xing

Cooperative Semi-Supervised Transfer Learning of Machine Reading Comprehension

DOI:
Published:
2021
Journal:
Impact factor:
0
Authors:
Oliver Bender;F. Och;Y. Bengio;Réjean Ducharme;Pascal Vincent;Kevin Clark;Quoc Minh;V. Le;J. Devlin;Ming;Kenton Lee;Adam Fisch;Alon Talmor;Robin Jia;Minjoon Seo;Michael R. Glass;A. Gliozzo;Rishav Chakravarti;Ian Goodfellow;Jean Pouget;Mehdi Mirza;Serhii Havrylov;Ivan Titov. 2017;Emergence;Jun;Jiatao Gu;Jiajun Shen;Marc’Aurelio;Matthew Henderson;I. Casanueva;Nikola Mrkˇsi´c;Pei;Tsung;Ivan Vuli´c;Yikang Shen;Yi Tay;Che Zheng;Dara Bahri;Donald;Metzler Aaron;Courville;Structformer;Ashish Vaswani;Noam M. Shazeer;Niki Parmar;Thomas Wolf;Lysandre Debut;Julien Victor Sanh;Clement Chaumond;Anthony Delangue;Pier;Tim ric Cistac;Rémi Rault;Morgan Louf;Qizhe Xie;Eduard Hovy;Silei Xu;Sina J. Semnani;Giovanni Campagna
Corresponding author:
Giovanni Campagna