
Adding Domain Knowledge to Inductive Learning Methods for Classifying Texts

Basic Information

Award Number:
9987869
Principal Investigator:
Kevin Ashley
Amount:
$200,000
Host Institution:
University of Pittsburgh
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2000
Funding Country:
United States
Project Period:
2000-09-01 to 2004-08-31
Project Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=9987869&HistoricalAwards=false
Keywords:
Adding Domain Knowledge Inductive Learning

Project Abstract

The objective of this research is to investigate the integration of background knowledge into a machine learning approach for automatically indexing text documents. Case-based reasoning models for utilizing past experiences have been developed for domains where the cases are text. The prohibitive cost of manually indexing cases has hindered the development and maintenance of large systems for applications in the law, ethics, or help-desk settings. New methods that learn a text classifier from a small collection of annotated case summaries, and that can then classify large numbers of cases automatically, can help overcome this knowledge-acquisition bottleneck. Text learning algorithms used elsewhere are not applicable because they require large training sets. Here, background knowledge about the domain and a linguistic analysis of the examples are employed to develop a better representation of the examples, which will allow learning algorithms to generalize better from small collections of text cases. The project will also yield a better understanding of what makes a good text representation for learning and classification, and of the effects of adding background knowledge and natural language processing tools. The experiments are based on a relatively small collection in a well-defined domain, in which the PI and his group have accumulated significant expertise. This unique background allows a more thorough analysis of the experimental results than is generally performed. The classifier is evaluated on both a set of marked-up summaries and the corresponding full-length documents. Further experiments explore the use of unseen and unlabeled cases, and explain the observed behavior. The results and the analysis of the experiments will enable researchers in other domains to improve the representation of text cases. Thus, the research results will be relevant not only for case-based reasoning and machine learning, but also for information retrieval and other text-based applications.
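
The idea of enriching a text representation with background knowledge so that a learner can generalize from very few examples can be illustrated with a minimal sketch. This is not the project's method: the concept table `DOMAIN_CONCEPTS`, the training texts, the labels, and the nearest-neighbor overlap classifier are all invented for illustration. The sketch only shows how mapping surface words to shared domain concepts can let two texts with no words in common still match.

```python
from collections import Counter

# Hypothetical background knowledge: surface words mapped to abstract
# domain concepts. These entries are invented for illustration only.
DOMAIN_CONCEPTS = {
    "employee": "PERSON_WITH_ACCESS",
    "contractor": "PERSON_WITH_ACCESS",
    "formula": "CONFIDENTIAL_INFO",
    "blueprint": "CONFIDENTIAL_INFO",
    "nondisclosure": "AGREEMENT",
    "contract": "AGREEMENT",
}

# Tiny, made-up training set of (text, label) pairs.
TRAINING = [
    ("employee signed nondisclosure", "agreed-not-to-disclose"),
    ("formula was published openly", "info-publicly-known"),
]


def featurize(text, use_background=True):
    """Bag of words, optionally extended with domain-concept features."""
    tokens = text.lower().split()
    feats = Counter(tokens)
    if use_background:
        for tok in tokens:
            concept = DOMAIN_CONCEPTS.get(tok)
            if concept is not None:
                feats[concept] += 1
    return feats


def overlap(a, b):
    """Count the features two bags share (sum of minimum counts)."""
    return sum(min(a[k], b[k]) for k in a.keys() & b.keys())


def classify(text, training, use_background=True):
    """Return the label of the most similar training text, or None if
    no training text shares any feature with the query."""
    query = featurize(text, use_background)
    best_label, best_score = None, 0
    for example_text, label in training:
        score = overlap(query, featurize(example_text, use_background))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    query = "contractor accepted contract"
    print(classify(query, TRAINING, use_background=True))
    print(classify(query, TRAINING, use_background=False))
```

In this toy setting the query shares no words with either training text, so the plain bag of words finds no match, while the concept-augmented representation links "contractor" and "contract" to the same concepts as "employee" and "nondisclosure".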
