EAGER: Constructing, Indexing, and Searching Super-Enriched Document Representations in the Cloud
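Basic information
- Award number: 1143703
- Principal investigator:
- Amount: $250,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2011
- Funding country: United States
- Duration: 2011-09-01 to 2012-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract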
There are billions of new digital documents created around the world every day. Examples include emails, blog posts, legal documents, and news articles. To enable effective information management, many of these documents are processed by information retrieval systems, such as desktop search tools or Web search engines. Most existing technologies represent documents digitally. To a computer, these representations are nothing more than a sequence of bits, completely devoid of any explicit meaning. Since most modern search engines utilize such basic representations, they often fail to properly account for the meaning of the words found in the documents, thereby diminishing the quality of their results. Despite the importance of this fundamental problem, there have been surprisingly few attempts to build, and subsequently search, document representations that encode the deeply rich meaning of text, especially for data sets that contain millions or billions of text documents.
This research investigates how to automatically construct, index, and search next-generation super-enriched document representations. The approach relies on the careful integration of traditional text representations with natural language processing-based sources (e.g., named entities, synonyms, and paraphrases), rich knowledge sources (e.g., Wikipedia and Freebase), contextual sources, and other value-added sources of content. Constructing such representations for large document collections requires computationally intensive batch processing to mine, aggregate, and join data across disparate sources. To overcome these challenges, a scalable, massively distributed cloud computing solution is adopted. The resulting enriched document representations can be effectively applied to a wide variety of information retrieval, natural language processing, and data mining tasks.
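The abstract describes batch processing that joins a document's text with annotations from disparate sources. A minimal single-machine sketch of that join, written in a map/reduce style, might look like the following. It is an illustration under stated assumptions, not the project's implementation: the class name EnrichedDocument, the functions map_phase and reduce_phase, the source names, and the sample records are all hypothetical, and a real deployment would run the two phases on a distributed framework.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EnrichedDocument:
    """Hypothetical container: raw text plus annotations grouped by source."""
    doc_id: str
    text: str
    annotations: dict = field(default_factory=dict)


def map_phase(documents, sources):
    """Emit (doc_id, (tag, payload)) pairs for the text and for every source."""
    for doc_id, text in documents.items():
        yield doc_id, ("text", text)
    for source_name, records in sources.items():
        for doc_id, payload in records:
            yield doc_id, (source_name, payload)


def reduce_phase(pairs):
    """Group pairs by doc_id and assemble one EnrichedDocument per key."""
    grouped = defaultdict(list)
    for doc_id, value in pairs:
        grouped[doc_id].append(value)

    enriched = {}
    for doc_id, values in grouped.items():
        text = next((p for tag, p in values if tag == "text"), None)
        if text is None:
            continue  # annotation for a document that is not in the collection
        doc = EnrichedDocument(doc_id, text)
        for tag, payload in values:
            if tag != "text":
                doc.annotations.setdefault(tag, []).append(payload)
        enriched[doc_id] = doc
    return enriched


# Toy inputs (assumed for illustration only).
documents = {"d1": "Jaguar unveils a new electric car."}
sources = {
    "named_entities": [("d1", "Jaguar/ORG")],
    "synonyms": [("d1", "car=automobile")],
    "knowledge_base": [("d1", "wiki:Jaguar_Cars"), ("d2", "wiki:Orphan")],
}
enriched = reduce_phase(map_phase(documents, sources))
```

Keying every record by document identifier is what lets the join scale out: each key can be reduced independently, so the grouping step can be partitioned across machines. The sketch reserves the tag "text" for the document body, so no source may use that name.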
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eduard Hovy
A Framework for Effective Annotation of Information from Closed Captions Using Ontologies
- DOI: 10.1007/s10844-005-0188-9
- Published: 2005-09-01
- Journal:
- Impact factor: 3.400
- Authors: Latifur Khan; Dennis McLeod; Eduard Hovy
- Corresponding author: Eduard Hovy
A Sentiment Consolidation Framework for Meta-Review Generation
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Miao Li; Jey Han Lau; Eduard Hovy
- Corresponding author: Eduard Hovy
ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: A. Crowdsourced;David Bamman;Olivia Lewke;Rachel Bawden;Rico Sennrich;Alexandra Birch;Ari Bornstein;Arie Cattan;Ido Dagan;Hong Chen;Zhenhua Fan;Hao Lu;Alan Yuille;Eduard Hovy;Mitch Marcus;M. Palmer;Lance;Rodney Huddleston. 2002;Frédéric Landragin;T. Poibeau;Bernard Vic;Belinda Z. Li;Gabriel Stanovsky;Robert L Logan;Andrew McCallum;Sameer Singh
- Corresponding author: Sameer Singh
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Sang Keun Choe; Hwijeen Ahn; Juhan Bae; Kewen Zhao; Minsoo Kang; Youngseog Chung; Adithya Pratapa; W. Neiswanger; Emma Strubell; Teruko Mitamura; Jeff Schneider; Eduard Hovy; Roger Grosse; Eric Xing
- Corresponding author: Eric Xing
Cooperative Semi-Supervised Transfer Learning of Machine Reading Comprehension
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Oliver Bender;F. Och;Y. Bengio;Réjean Ducharme;Pascal Vincent;Kevin Clark;Quoc Minh;V. Le;J. Devlin;Ming;Kenton Lee;Adam Fisch;Alon Talmor;Robin Jia;Minjoon Seo;Michael R. Glass;A. Gliozzo;Rishav Chakravarti;Ian Goodfellow;Jean Pouget;Mehdi Mirza;Serhii Havrylov;Ivan Titov. 2017;Emergence;Jun;Jiatao Gu;Jiajun Shen;Marc’Aurelio;Matthew Henderson;I. Casanueva;Nikola Mrkˇsi´c;Pei;Tsung;Ivan Vuli´c;Yikang Shen;Yi Tay;Che Zheng;Dara Bahri;Donald;Metzler Aaron;Courville;Structformer;Ashish Vaswani;Noam M. Shazeer;Niki Parmar;Thomas Wolf;Lysandre Debut;Julien Victor Sanh;Clement Chaumond;Anthony Delangue;Pier;Tim ric Cistac;Rémi Rault;Morgan Louf;Qizhe Xie;Eduard Hovy;Silei Xu;Sina J. Semnani;Giovanni Campagna
- Corresponding author: Giovanni Campagna
Other grants by Eduard Hovy
EAGER: A Method to Retrieve Non-Textual Data from Widespread Repositories
- Award number: 1450545
- Fiscal year: 2014
- Amount: $250,000
- Project type: Standard Grant
III: EAGER: Automatically Building Test Collections Using Implicit Relevance Signals from the Web
- Award number: 1304939
- Fiscal year: 2012
- Amount: $250,000
- Project type: Standard Grant
EAGER: Constructing, Indexing, and Searching Super-Enriched Document Representations in the Cloud
- Award number: 1265301
- Fiscal year: 2012
- Amount: $250,000
- Project type: Standard Grant
III: EAGER: Automatically Building Test Collections Using Implicit Relevance Signals from the Web
- Award number: 1147810
- Fiscal year: 2011
- Amount: $250,000
- Project type: Standard Grant
Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis
- Award number: 0705091
- Fiscal year: 2007
- Amount: $250,000
- Project type: Standard Grant
Collaborative Research: Language Processing Technology for Electronic Rulemaking
- Award number: 0429360
- Fiscal year: 2004
- Amount: $250,000
- Project type: Continuing Grant
Automating the Integration of EPA Databases
- Award number: 0306899
- Fiscal year: 2003
- Amount: $250,000
- Project type: Continuing Grant
SGER COLLABORATIVE: A Testbed for eRulemaking Data
- Award number: 0328175
- Fiscal year: 2003
- Amount: $250,000
- Project type: Standard Grant
Collaborative Research: Interlingual Annotation of Multilingual Text Corpora
- Award number: 0325021
- Fiscal year: 2003
- Amount: $250,000
- Project type: Standard Grant
ITR: Information Discovery in Digital Government: Self-extending Topic Maps and Ontologies (GrowOnto)
- Award number: 0205111
- Fiscal year: 2002
- Amount: $250,000
- Project type: Continuing Grant
Similar overseas grants
Constructing and Classifying Pre-Tannakian Categories
- Award number: 2401515
- Fiscal year: 2024
- Amount: $250,000
- Project type: Standard Grant
CBOIAO: Constructing the 'Barbarian Other' in Attic Oratory of the Fourth Century B.C.E.
- Award number: EP/Y025172/1
- Fiscal year: 2024
- Amount: $250,000
- Project type: Fellowship
Voices from the Periphery: (De-)Constructing and Contesting Public Narratives about Post-Industrial Marginalization (VOICES)
- Award number: AH/Y007603/1
- Fiscal year: 2024
- Amount: $250,000
- Project type: Research Grant
Constructing a 1.5-million-year time series of magmatic and hydrothermal activity at the Juan de Fuca ridge
- Award number: 2323102
- Fiscal year: 2024
- Amount: $250,000
- Project type: Continuing Grant
Constructing Valid, Equitable, and Flexible Kinematics and Dynamics Assessment Scales with Evidence-Centered Design
- Award number: 2235595
- Fiscal year: 2023
- Amount: $250,000
- Project type: Standard Grant
Constructing a framework for early childhood teachers' cultural wellbeing
- Award number: DE230100691
- Fiscal year: 2023
- Amount: $250,000
- Project type: Discovery Early Career Researcher Award
An investigation into the process of constructing the social problems surrounding tattoos in Japan, focusing on its discourse as a fashion
- Award number: 23K12596
- Fiscal year: 2023
- Amount: $250,000
- Project type: Grant-in-Aid for Early-Career Scientists
Constructing a causal model of ADHD from an embodied perspective
- Award number: 23KJ0064
- Fiscal year: 2023
- Amount: $250,000
- Project type: Grant-in-Aid for JSPS Fellows
Creating harmonised and scalable methods and tools for constructing households in large diverse administrative and health research datasets
- Award number: ES/X00046X/1
- Fiscal year: 2023
- Amount: $250,000
- Project type: Research Grant
Glass Beads in the Boundary Region of Japan -Fundamental Research for Constructing the History of Japanese Glass-
- Award number: 23K00955
- Fiscal year: 2023
- Amount: $250,000
- Project type: Grant-in-Aid for Scientific Research (C)