
From Text Corpora to Text Databases: Research in Text Processing and Retrieval

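Basic Information

Award Number:
9302615
Principal Investigator:
Ralph Grishman
Amount:
$203,500
Institution:
New York University
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
1993
Funding Country:
United States
Start and End Dates:
1993-08-01 to 1997-01-31
Status:
Completed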

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=9302615&HistoricalAwards=false
Keywords:
Text Corpora Databases Research Processing

Project Abstract

9302615 Strzalkowski

This is the first year of funding of a three-year continuing award. The goal of this research is to explore the potential of natural language processing for automated information retrieval from large, minimally structured text libraries. The effort includes developing more effective techniques for indexing, routing, content approximation, abstracting, and the creation of hierarchical domain maps from textual data. Both linguistic and statistical methods are used. The main thrust is to find satisfactory solutions to two problems: (1) obtaining an accurate and versatile representation of database contents for search purposes; and (2) devising algorithms that can accomplish this task with speed and robustness matching or exceeding those of statistical systems. To create an accurate representation of database contents that can support various types of search, an extensive natural language processing component is being built. Linguistic processing includes stochastic part-of-speech tagging, dictionary-assisted stemming, syntactic parsing, phrase extraction and disambiguation, and semantic correlation of the concepts underlying the database domain. The research relies extensively on empirical experiments with large text collections. It is expected to produce technologies that significantly improve the performance levels expected of top-of-the-line full-text information retrieval systems.
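The abstract names stemming and phrase extraction as inputs to indexing but does not describe the project's actual algorithms. As a rough illustration only, the Python sketch below builds an inverted index whose terms are stemmed single words plus adjacent-word pairs. The suffix-stripping `stem` function and the bigram "phrases" are hypothetical stand-ins for the dictionary-assisted stemmer and parser-based phrase extraction mentioned above; they are assumptions for this example, not the project's method.

```python
import re
from collections import defaultdict

# Hypothetical suffix list: a crude stand-in for dictionary-assisted stemming.
SUFFIXES = ("ing", "ies", "es", "s", "ed")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    if word.endswith("ss"):  # avoid turning "process" into "proces"
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> set[str]:
    """Return stemmed single words plus adjacent-word pairs (hypothetical 'phrases')."""
    words = [stem(w) for w in re.findall(r"[a-z]+", text.lower())]
    phrases = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return set(words) | phrases


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each index term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term, phrases included (AND)."""
    result = None
    for term in index_terms(query):
        postings = index.get(term, set())
        result = set(postings) if result is None else result & postings
    return result or set()


if __name__ == "__main__":
    docs = {
        "d1": "Information retrieval from large text libraries",
        "d2": "Retrieval of information about text databases",
    }
    idx = build_index(docs)
    print(search(idx, "information retrieval"))  # prints {'d1'}
```

Both sample documents contain the words *information* and *retrieval*, but only `d1` has them adjacent, so the phrase term `information retrieval` narrows the match to `d1`. This is the kind of precision gain that phrase-level index terms are generally meant to provide over single words alone.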
