From Text Corpora to Text Databases: Research in Text Processing and Retrieval
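Basic information
- Award number: 9302615
- Principal investigator:
- Amount: $203,500
- Host institution:
- Host institution country: United States
- Project category: Continuing Grant
- Fiscal year: 1993
- Funding country: United States
- Duration: 1993-08-01 to 1997-01-31
- Project status: Completed
- Source:
- Keywords:
Project abstract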
9302615 Strzalkowski. From Text Corpora to Text Databases: Research in Text Processing and Retrieval. This is the first-year funding of a three-year continuing award. The goal of this research is to explore the potential of natural language processing in automated information retrieval from large, minimally structured text libraries. This effort includes the development of more effective techniques for indexing, routing, content approximation, abstracting, and the creation of hierarchical domain maps from textual data. Both linguistic and statistical methods are used. The main thrust is to find satisfactory solutions to the following problems: (1) obtaining an accurate and versatile representation of database contents for search purposes; and (2) devising algorithms that can accomplish this task with the speed and robustness to match or exceed that of statistical systems. To create an accurate representation of database contents that can support various types of search, an extensive natural language processing component is created. Linguistic processing includes stochastic part-of-speech tagging, dictionary-assisted stemming, syntactic parsing, phrase extraction and disambiguation, and semantic correlation of concepts underlying the database domain. This research is based extensively on empirical experiments with large text collections. It is expected to produce technologies that will significantly improve the expected performance levels of top-of-the-line full-text information retrieval systems.
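As an illustration of the linguistic processing steps named in the abstract, the following is a minimal sketch, not the project's actual system. It assumes a tiny hand-written lexicon in place of a stochastic part-of-speech tagger, reads "dictionary-assisted stemming" as suffix stripping that is accepted only when the result is a dictionary word, and reduces phrase extraction to collecting adjective/noun runs. All names (`LEXICON`, `DICTIONARY`, `stem`, `tag`, `extract_phrases`, `index_terms`) are hypothetical.

```python
import re

# Hypothetical toy resources; a real system would use a trained stochastic
# tagger and a full machine-readable dictionary.
LEXICON = {
    "retrieval": "NOUN", "systems": "NOUN", "information": "NOUN",
    "index": "VERB", "large": "ADJ", "text": "NOUN",
    "collections": "NOUN", "the": "DET", "of": "PREP",
}
DICTIONARY = {"retrieval", "system", "information", "index", "large",
              "text", "collection"}
SUFFIXES = ("ions", "ion", "es", "s", "ing", "ed")


def stem(word):
    """Strip a suffix only if the result is a known dictionary word."""
    if word in DICTIONARY:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if candidate in DICTIONARY:
                return candidate
    return word  # leave unknown words untouched


def tag(tokens):
    """Look up each token's part of speech; unknown words default to NOUN."""
    return [(t, LEXICON.get(t, "NOUN")) for t in tokens]


def extract_phrases(tagged):
    """Collect maximal adjective/noun runs of two or more words ending in a noun."""
    phrases, run = [], []
    for word, pos in tagged + [("", "END")]:  # sentinel flushes the last run
        if pos in ("ADJ", "NOUN"):
            run.append((word, pos))
            continue
        while run and run[-1][1] != "NOUN":  # drop trailing adjectives
            run.pop()
        if len(run) >= 2:
            phrases.append(" ".join(stem(w) for w, _ in run))
        run = []
    return phrases


def index_terms(text):
    """Return single-word stems and multi-word phrases as candidate index terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tagged = tag(tokens)
    stems = [stem(w) for w, pos in tagged if pos not in ("DET", "PREP")]
    return {"stems": stems, "phrases": extract_phrases(tagged)}
```

Calling `index_terms("Index the large text collections of information retrieval systems")` yields both stemmed single terms and normalized phrases such as `large text collection`, which is the kind of richer content representation the abstract contrasts with purely statistical indexing.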
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Ralph Grishman
Distributed Representation Learning for Knowledge Bases with Entity Descriptions
- DOI:
- Published: 2017
- Journal:
- Impact factor: 5.1
- Authors: Fan Miao; Zhou Qiang; Thomas Fang Zheng; Ralph Grishman
- Corresponding author: Ralph Grishman
Viterbi Algorithm
- DOI: 10.1007/978-0-387-30164-8_878
- Published: 2010
- Journal:
- Impact factor: 0
- Authors: Ralph Grishman
- Corresponding author: Ralph Grishman
COMLEX Syntax – A Large Syntactic Dictionary for Natural Language Processing
- DOI: 10.1023/a:1001142417369
- Published: 1997-11-01
- Journal:
- Impact factor: 1.800
- Authors: Catherine MacLeod; Ralph Grishman; Adam Meyers
- Corresponding author: Adam Meyers
Other grants held by Ralph Grishman
ITR: Automated Structuring of Text Information
- Award number: 0081962
- Fiscal year: 2000
- Award amount: $203,500
- Project category: Standard Grant
Collaborative Research on Knowledge Acquisition for Japanese-English Machine Translation
- Award number: 9303013
- Fiscal year: 1993
- Award amount: $203,500
- Project category: Continuing Grant
A Sublanguage Approach to Japanese-English Machine Translation
- Award number: 8902304
- Fiscal year: 1989
- Award amount: $203,500
- Project category: Continuing Grant
Industry-University Co-operative Research Program: Acquisition and Use of Semantic Information for Natural Language Processing
- Award number: 8501843
- Fiscal year: 1985
- Award amount: $203,500
- Project category: Continuing Grant
Conference on Sublanguage Description and Processing (Computer Research) - New York University, New York, NY, January 1984
- Award number: 8301197
- Fiscal year: 1983
- Award amount: $203,500
- Project category: Standard Grant
Industry/University Cooperative Research Activity: Robust Natural Language Parsing Using Graded Acceptability (Computer Research)
- Award number: 8202373
- Fiscal year: 1982
- Award amount: $203,500
- Project category: Continuing Grant
Natural Language Interfaces Using Limited Semantic Information
- Award number: 8002453
- Fiscal year: 1980
- Award amount: $203,500
- Project category: Continuing Grant
Research Into Natural Language Interfaces For Data Base Retrieval
- Award number: 7803118
- Fiscal year: 1978
- Award amount: $203,500
- Project category: Standard Grant
Similar overseas grants
Collaborative Research: Syntactically-annotated corpora for endangered languages in areal contact
- Award number: 2319247
- Fiscal year: 2023
- Award amount: $203,500
- Project category: Standard Grant
Creation of naturalistic child language corpora of Tongan as a mother tongue and Tongan as a heritage language
- Award number: 23K17507
- Fiscal year: 2023
- Award amount: $203,500
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
RII Track-1: Harnessing the Data Revolution for Vermont: The Science of Online Corpora, Knowledge, and Stories (SOCKS)
- Award number: 2242829
- Fiscal year: 2023
- Award amount: $203,500
- Project category: Cooperative Agreement
Collaborative Research: Syntactically-annotated corpora for endangered languages in areal contact
- Award number: 2319246
- Fiscal year: 2023
- Award amount: $203,500
- Project category: Standard Grant
CAREER: Knowledge Extraction and Discovery from Massive Text Corpora via Extremely Weak Supervision
- Award number: 2239440
- Fiscal year: 2023
- Award amount: $203,500
- Project category: Continuing Grant
Enhancing Automated Software Evolution via Building and Utilizing Large-Scale Software Evolution Corpora
- Award number: 22H03567
- Fiscal year: 2022
- Award amount: $203,500
- Project category: Grant-in-Aid for Scientific Research (B)
EAGER: DCL: SaTC: Enabling Interdisciplinary Collaboration: Efficient Human-in-the-Loop Redaction of Language Development Corpora
- Award number: 2210193
- Fiscal year: 2022
- Award amount: $203,500
- Project category: Standard Grant
New approaches to computational natural language processing using ancient Near Eastern text corpora
- Award number: 547773-2020
- Fiscal year: 2022
- Award amount: $203,500
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Beyond parallel corpora: Enriching low-resource machine translation by leveraging language documentation data
- Award number: 570119-2022
- Fiscal year: 2022
- Award amount: $203,500
- Project category: Postgraduate Scholarships - Doctoral
Learning and inference with large image corpora
- Award number: RGPIN-2020-06848
- Fiscal year: 2022
- Award amount: $203,500
- Project category: Discovery Grants Program - Individual