III: EAGER: Automatically Building Test Collections Using Implicit Relevance Signals from the Web
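Basic information
- Award number: 1147810
- Principal investigator:
- Amount: $150,000
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2011
- Funding country: United States
- Period: 2011-09-01 to 2013-02-28
- Project status: Completed
- Source:
- Keywords: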
Project abstract
Helping users find relevant information is undeniably an important problem vital to the functioning of today's information-based societies. It is therefore no surprise that millions of people worldwide make use of search engine technologies each and every day. Although existing search technologies work well, there is still considerable room for improvement. Search engine innovation is driven by the ability to rapidly, and repeatedly, measure the quality of the results produced by a given system. This type of measurement typically requires some form of human input. For example, a human expert may be hired to assess the relevance of search results, or the search engine may log user interactions, such as the queries entered and the results clicked. After a sufficiently large amount of data has been collected, it can then be used to accurately measure search engine quality. It can also be used to improve the quality of existing search engines via a process known as "tuning" or "training". However, gathering large amounts of this information typically requires a significant amount of human effort or computational resources. Therefore, sustained innovation is only possible at a very steep cost.
Techniques for constructing large information retrieval test collections that require no human effort are the primary focus of this research study. Rather than relying on human-curated information, implicit relevance signals from the Web are mined to automatically construct large, reusable test collections for a variety of search tasks, including Web search, news search, and enterprise search. The observation that the Web contains a large number of implicit relevance signals is the starting point of the research. The simplest example of an implicit relevance signal is the hyperlink, which can be interpreted as a signal acknowledging the relevance of the target page by the source author. The hypothesis that such implicit relevance signals can be effectively mined and aggregated in a completely unsupervised manner to create test collections without any human effort is investigated in this research. Automatically generated test collections are evaluated in two different ways. First, the test collections are evaluated according to their ability to accurately measure the quality of search systems compared to human-generated test collections. Second, the quality of search engines tuned using the automated test collections is compared against that of engines tuned using manual test collections.
The broader impact of this project is derived from automatically constructed test collections that are freely distributed to the broader research community. Advances in search engine technologies are expected as the result of increased availability of training data to systematically evaluate and tune search engines, both in industrial and academic settings. Additional broader impact is expected from the integration of research and education at both the graduate and undergraduate levels and from engaging women and underrepresented students through various outreach programs.
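As a rough illustration of the idea in the abstract, the sketch below treats the anchor text of a hyperlink as a pseudo-query and the link target as a page the source author implicitly judged relevant, then aggregates these votes into relevance judgments. It also shows one possible check for the first evaluation criterion: correlating the system scores obtained on an automatic collection with those obtained on a human-judged one. The function names, the `min_votes` threshold, and the use of Kendall's tau are assumptions made for illustration only; the abstract does not specify the project's actual mining or comparison method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_pseudo_qrels(links, min_votes=2):
    """Aggregate (anchor_text, target_url) pairs into pseudo relevance judgments.

    Hypothetical helper: a target counts as relevant to a pseudo-query only
    if at least `min_votes` links with that anchor text point to it.
    """
    votes = defaultdict(Counter)
    for anchor, target in links:
        votes[anchor.strip().lower()][target] += 1

    qrels = {}
    for query, counter in votes.items():
        relevant = {doc for doc, n in counter.items() if n >= min_votes}
        if relevant:  # drop pseudo-queries with no sufficiently supported target
            qrels[query] = relevant
    return qrels


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} dicts over the same systems."""
    systems = sorted(scores_a)
    if len(systems) < 2:
        raise ValueError("need at least two systems to compare rankings")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / pairs
```

A tau close to 1 would suggest that the automatic collection orders systems much as the human-judged collection does. The threshold on votes is one simple way to reduce noise from stray links, at the cost of discarding rare pseudo-queries.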
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eduard Hovy
A Framework for Effective Annotation of Information from Closed Captions Using Ontologies
- DOI: 10.1007/s10844-005-0188-9
- Published: 2005-09-01
- Journal:
- Impact factor: 3.400
- Authors: Latifur Khan; Dennis McLeod; Eduard Hovy
- Corresponding author: Eduard Hovy
A Sentiment Consolidation Framework for Meta-Review Generation
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Miao Li; Jey Han Lau; Eduard Hovy
- Corresponding author: Eduard Hovy
ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: A. Crowdsourced; David Bamman; Olivia Lewke; Rachel Bawden; Rico Sennrich; Alexandra Birch; Ari Bornstein; Arie Cattan; Ido Dagan; Hong Chen; Zhenhua Fan; Hao Lu; Alan Yuille; Eduard Hovy; Mitch Marcus; M. Palmer; Lance; Rodney Huddleston. 2002; Frédéric Landragin; T. Poibeau; Bernard Vic; Belinda Z. Li; Gabriel Stanovsky; Robert L Logan; Andrew McCallum; Sameer Singh
- Corresponding author: Sameer Singh
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Sang Keun Choe; Hwijeen Ahn; Juhan Bae; Kewen Zhao; Minsoo Kang; Youngseog Chung; Adithya Pratapa; W. Neiswanger; Emma Strubell; Teruko Mitamura; Jeff Schneider; Eduard Hovy; Roger Grosse; Eric Xing
- Corresponding author: Eric Xing
Cooperative Semi-Supervised Transfer Learning of Machine Reading Comprehension
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Oliver Bender; F. Och; Y. Bengio; Réjean Ducharme; Pascal Vincent; Kevin Clark; Quoc Minh; V. Le; J. Devlin; Ming; Kenton Lee; Adam Fisch; Alon Talmor; Robin Jia; Minjoon Seo; Michael R. Glass; A. Gliozzo; Rishav Chakravarti; Ian Goodfellow; Jean Pouget; Mehdi Mirza; Serhii Havrylov; Ivan Titov. 2017; Emergence; Jun; Jiatao Gu; Jiajun Shen; Marc’Aurelio; Matthew Henderson; I. Casanueva; Nikola Mrkšić; Pei; Tsung; Ivan Vulić; Yikang Shen; Yi Tay; Che Zheng; Dara Bahri; Donald; Metzler Aaron; Courville; Structformer; Ashish Vaswani; Noam M. Shazeer; Niki Parmar; Thomas Wolf; Lysandre Debut; Julien Victor Sanh; Clement Chaumond; Anthony Delangue; Pier; Tim ric Cistac; Rémi Rault; Morgan Louf; Qizhe Xie; Eduard Hovy; Silei Xu; Sina J. Semnani; Giovanni Campagna
- Corresponding author: Giovanni Campagna
Other grants held by Eduard Hovy
EAGER: A Method to Retrieve Non-Textual Data from Widespread Repositories
- Award number: 1450545
- Fiscal year: 2014
- Amount: $150,000
- Project category: Standard Grant
III: EAGER: Automatically Building Test Collections Using Implicit Relevance Signals from the Web
- Award number: 1304939
- Fiscal year: 2012
- Amount: $150,000
- Project category: Standard Grant
EAGER: Constructing, Indexing, and Searching Super-Enriched Document Representations in the Cloud
- Award number: 1265301
- Fiscal year: 2012
- Amount: $150,000
- Project category: Standard Grant
EAGER: Constructing, Indexing, and Searching Super-Enriched Document Representations in the Cloud
- Award number: 1143703
- Fiscal year: 2011
- Amount: $150,000
- Project category: Standard Grant
Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis
- Award number: 0705091
- Fiscal year: 2007
- Amount: $150,000
- Project category: Standard Grant
Collaborative Research: Language Processing Technology for Electronic Rulemaking
- Award number: 0429360
- Fiscal year: 2004
- Amount: $150,000
- Project category: Continuing Grant
Automating the Integration of EPA Databases
- Award number: 0306899
- Fiscal year: 2003
- Amount: $150,000
- Project category: Continuing Grant
SGER COLLABORATIVE: A Testbed for eRulemaking Data
- Award number: 0328175
- Fiscal year: 2003
- Amount: $150,000
- Project category: Standard Grant
Collaborative Research: Interlingual Annotation of Multilingual Text Corpora
- Award number: 0325021
- Fiscal year: 2003
- Amount: $150,000
- Project category: Standard Grant
ITR: Information Discovery in Digital Government: Self-extending Topic Maps and Ontologies (GrowOnto)
- Award number: 0205111
- Fiscal year: 2002
- Amount: $150,000
- Project category: Continuing Grant
Similar overseas grants
EAGER: A Genome Wide HDR Enhancement Screen in Maize
- Award number: 2409037
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
Collaborative Research: EAGER: IMPRESS-U: Groundwater Resilience Assessment through iNtegrated Data Exploration for Ukraine (GRANDE-U)
- Award number: 2409395
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
EAGER: Integrating Pathological Image and Biomedical Text Data for Clinical Outcome Prediction
- Award number: 2412195
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
EAGER: Generalizing Monin-Obukhov Similarity Theory (MOST)-based Surface Layer Parameterizations for Turbulence Resolving Earth System Models (ESMs)
- Award number: 2414424
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
EAGER: Creating a Composite EL Nino Record from the Lowland Neotropics
- Award number: 2417794
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
EAGER/Collaborative Research: An LLM-Powered Framework for G-Code Comprehension and Retrieval
- Award number: 2347624
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
EAGER: Innovation in Society Study Group
- Award number: 2348836
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
EAGER: Artificial Intelligence to Understand Engineering Cultural Norms
- Award number: 2342384
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
EAGER/Collaborative Research: Revealing the Physical Mechanisms Underlying the Extraordinary Stability of Flying Insects
- Award number: 2344215
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant
Collaborative Research: EAGER: Designing Nanomaterials to Reveal the Mechanism of Single Nanoparticle Photoemission Intermittency
- Award number: 2345581
- Fiscal year: 2024
- Amount: $150,000
- Project category: Standard Grant