EAGER: Constructing, Indexing, and Searching Super-Enriched Document Representations in the Cloud

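Basic Information

Award number: 1143703
Principal investigator: Eduard Hovy
Amount: $250,000
Institution: University of Southern California
Institution country: United States
Award type: Standard Grant
Fiscal year: 2011
Funding country: United States
Start/end dates: 2011-09-01 to 2012-12-31
Status: Completed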

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1143703&HistoricalAwards=false
Keywords: EAGER Constructing Indexing Searching Super

Abstract

Billions of new digital documents are created around the world every day, including emails, blog posts, legal documents, and news articles. To enable effective information management, many of these documents are processed by information retrieval systems such as desktop search tools and Web search engines. Most existing technologies represent documents as raw digital text. To a computer, these representations are nothing more than a sequence of bits, devoid of any explicit meaning. Because most modern search engines rely on such basic representations, they often fail to account for the meaning of the words in the documents, which lowers the quality of their results. Despite the importance of this fundamental problem, there have been surprisingly few attempts to build, and then search, document representations that encode the rich meaning of text, especially for data sets containing millions or billions of documents.

This research investigates how to automatically construct, index, and search next-generation super-enriched document representations. The approach carefully integrates traditional text representations with natural language processing (NLP) sources (e.g., named entities, synonyms, and paraphrases), rich knowledge sources (e.g., Wikipedia and Freebase), contextual sources, and other value-added sources of content. Constructing such representations for large document collections requires computationally intensive batch processing to mine, aggregate, and join data across disparate sources. To meet these demands, the project adopts a scalable, massively distributed cloud computing solution. The resulting enriched document representations can be applied to a wide variety of information retrieval, natural language processing, and data mining tasks.
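The abstract describes the approach only at a high level. Below is a minimal Python sketch of the core idea, not the project's actual system: each document is represented by its raw terms plus extra features from enrichment sources, and queries are enriched the same way, so a search can match on meaning rather than on exact words. The small `SYNSETS` and `ENTITIES` tables are hypothetical stand-ins for the paraphrase resources, named-entity recognizers, and knowledge bases (such as Wikipedia and Freebase) named above. `enrich` and `build_index` loosely mimic the map and reduce phases of a distributed batch job, but here everything runs in a single process.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical enrichment sources (assumptions, not real project data).
SYNSETS = {  # word -> shared synonym-group feature
    "car": "SYN:vehicle", "automobile": "SYN:vehicle",
    "film": "SYN:movie", "movie": "SYN:movie",
}
ENTITIES = {  # lowercase phrase -> knowledge-base entity feature
    "los angeles": "ENT:Los_Angeles",
    "usc": "ENT:University_of_Southern_California",
}


def tokenize(text):
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return [t.strip(".,!?").lower() for t in text.split() if t.strip(".,!?")]


def enrich(doc_id, text):
    """Map step: emit (feature, doc_id) pairs for one document."""
    tokens = tokenize(text)
    features = {f"TERM:{t}" for t in tokens}  # traditional text representation
    for t in tokens:  # synonym-group features
        if t in SYNSETS:
            features.add(SYNSETS[t])
    for n in (1, 2, 3):  # entity features from 1- to 3-word phrases
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ENTITIES:
                features.add(ENTITIES[phrase])
    return [(feature, doc_id) for feature in features]


def build_index(docs):
    """Reduce step: group emitted pairs into postings sets per feature."""
    index = defaultdict(set)
    pairs = chain.from_iterable(enrich(d, text) for d, text in docs.items())
    for feature, doc_id in pairs:
        index[feature].add(doc_id)
    return index


def search(index, query):
    """Enrich the query, then rank documents by number of shared features."""
    scores = defaultdict(int)
    for feature, _ in enrich("query", query):
        for doc_id in index.get(feature, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "d1": "USC researchers drove the car across Los Angeles.",
    "d2": "A new movie premiered in Los Angeles.",
    "d3": "The automobile industry is changing.",
}
index = build_index(docs)
print(search(index, "automobile Los Angeles"))
# → [('d1', 4), ('d2', 3), ('d3', 2)]
```

Document `d1` ranks first even though it says "car" rather than "automobile", because both words map to the same synonym-group feature, and the phrase "Los Angeles" also matches as a single entity. Scoring by raw feature overlap is deliberately naive; a production system would weight features (for example with BM25) and run the map and reduce steps on a cluster.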

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other Publications by Eduard Hovy

A Framework for Effective Annotation of Information from Closed Captions Using Ontologies

DOI: 10.1007/s10844-005-0188-9
Published: 2005-09-01
Journal: Journal of Intelligent Information Systems
Impact factor: 3.400
Authors: Latifur Khan; Dennis McLeod; Eduard Hovy
Corresponding author: Eduard Hovy

A Sentiment Consolidation Framework for Meta-Review Generation

Published: 2024
Impact factor: 0
Authors: Miao Li; Jey Han Lau; Eduard Hovy
Corresponding author: Eduard Hovy

ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution

Published: 2022
Impact factor: 0
Authors: A. Crowdsourced; David Bamman; Olivia Lewke; Rachel Bawden; Rico Sennrich; Alexandra Birch; Ari Bornstein; Arie Cattan; Ido Dagan; Hong Chen; Zhenhua Fan; Hao Lu; Alan Yuille; Eduard Hovy; Mitch Marcus; M. Palmer; Lance; Rodney Huddleston; Frédéric Landragin; T. Poibeau; Bernard Vic; Belinda Z. Li; Gabriel Stanovsky; Robert L Logan; Andrew McCallum; Sameer Singh
Corresponding author: Sameer Singh

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

Published: 2024
Journal: arXiv.org
Impact factor: 0
Authors: Sang Keun Choe; Hwijeen Ahn; Juhan Bae; Kewen Zhao; Minsoo Kang; Youngseog Chung; Adithya Pratapa; W. Neiswanger; Emma Strubell; Teruko Mitamura; Jeff Schneider; Eduard Hovy; Roger Grosse; Eric Xing
Corresponding author: Eric Xing

Cooperative Semi-Supervised Transfer Learning of Machine Reading Comprehension

Published: 2021
Impact factor: 0
Authors: Oliver Bender; F. Och; Y. Bengio; Réjean Ducharme; Pascal Vincent; Kevin Clark; Quoc Minh; V. Le; J. Devlin; Ming; Kenton Lee; Adam Fisch; Alon Talmor; Robin Jia; Minjoon Seo; Michael R. Glass; A. Gliozzo; Rishav Chakravarti; Ian Goodfellow; Jean Pouget; Mehdi Mirza; Serhii Havrylov; Ivan Titov; Emergence; Jun; Jiatao Gu; Jiajun Shen; Marc’Aurelio; Matthew Henderson; I. Casanueva; Nikola Mrkšić; Pei; Tsung; Ivan Vulić; Yikang Shen; Yi Tay; Che Zheng; Dara Bahri; Donald; Metzler Aaron; Courville; Structformer; Ashish Vaswani; Noam M. Shazeer; Niki Parmar; Thomas Wolf; Lysandre Debut; Julien Victor Sanh; Clement Chaumond; Anthony Delangue; Pier; Tim ric Cistac; Rémi Rault; Morgan Louf; Qizhe Xie; Eduard Hovy; Silei Xu; Sina J. Semnani; Giovanni Campagna
Corresponding author: Giovanni Campagna