DATA MINING AND MODEL BUILDING IN MEDICAL INFORMATICS

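Basic information

Grant number:
6391275
Principal investigator:
BRUCE G. BUCHANAN
Amount:
$215,500
Host institution:
UNIVERSITY OF PITTSBURGH AT PITTSBURGH
Host institution country:
United States
Project category:
Fiscal year:
1999
Funding country:
United States
Project period:
1999-05-01 to 2003-04-30
Project status:
Completed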

Source:
https://reporter.nih.gov/project-details/6391275
Keywords:
artificial intelligence; classification; computer assisted instruction; computer simulation; computer system design/evaluation; human data; informatics; information retrieval; model design/development; pneumonia

Project abstract

Our long-term goal is to assist biomedical scientists by routinely extracting and codifying new knowledge from large biomedical databases by computer. As large collections of data become more readily accessible, the opportunities for discovering new information increase. We propose here to work toward this goal by extending our prior research on machine learning in two important directions: (1) codification of disparate pieces of knowledge into a coherent model (model building), and (2) discovery of new information in medical databases (data mining). Machine learning programs find classification rules (or decision trees or networks) that separate members of a target class from other individuals. They have emphasized predictive accuracy, with some attention to tradeoffs between accuracy and cost of errors or between accuracy and simplicity. We propose a framework in which these and other tradeoffs are explicit and the criteria by which tradeoffs are made are available for modification. We also include semantic considerations among the criteria to control the internal coherence of models. "Data mining" is a recently coined term for using computers to explore large databases, with a goal of discovering new relationships but usually with no specific target defined at the outset. In addition to accuracy, simplicity, coherence, and cost, a program that purports to discover new relationships must be able to assess novelty. We propose to measure the extent to which proposed relationships are novel by comparing them against existing knowledge in the domain of discourse, and to look for unusual rules (and other relations) that would be very interesting if true. The computer program we are primarily building on, RL, is a knowledge-based learning program that learns classification rules from a collection of data.
RL has been demonstrated to be flexible enough to allow guidance from prior knowledge, and powerful enough to learn publishable information for scientists working in several different domains. Both parts of the research will require extending the RL system in new ways, detailed in the research plan, that are consistent with the overall design philosophy of the present system. We will primarily work with data already collected on pneumonia patients, with which we have considerable experience. We will test the generality of the criteria used to evaluate models and discoveries with a Bayesian network learning system, K2.
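
The idea of explicit, modifiable tradeoffs plus a novelty check against existing knowledge can be illustrated with a small Python sketch. This is not the RL program and does not reproduce its criteria: the `Rule` class, the three scoring functions, the Jaccard-based novelty measure, the weights, and the toy cases are all assumptions made up for illustration. Cost of errors and semantic coherence are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical classification rule: IF all conditions hold THEN predicted."""
    conditions: frozenset  # set of (attribute, value) pairs
    predicted: str


def covers(rule, case):
    """True when the case satisfies every condition of the rule."""
    return all(case.get(attr) == val for attr, val in rule.conditions)


def accuracy(rule, cases):
    """Fraction of covered cases whose label matches the rule's prediction."""
    covered = [c for c in cases if covers(rule, c)]
    if not covered:
        return 0.0
    return sum(c["label"] == rule.predicted for c in covered) / len(covered)


def simplicity(rule):
    """Shorter rules score higher (assumed measure)."""
    return 1.0 / (1 + len(rule.conditions))


def novelty(rule, known_rules):
    """1 minus the highest Jaccard overlap with a known rule for the same class.

    A stand-in for comparing a proposed relationship against existing
    domain knowledge; the real comparison would be far richer.
    """
    best = 0.0
    for known in known_rules:
        if known.predicted != rule.predicted:
            continue
        union = rule.conditions | known.conditions
        if union:
            overlap = len(rule.conditions & known.conditions) / len(union)
            best = max(best, overlap)
    return 1.0 - best


# The tradeoff is explicit: change these weights to change the ranking.
DEFAULT_WEIGHTS = {"accuracy": 0.6, "simplicity": 0.2, "novelty": 0.2}


def score(rule, cases, known_rules, weights=DEFAULT_WEIGHTS):
    """Weighted sum of whichever criteria appear in `weights`."""
    parts = {
        "accuracy": accuracy(rule, cases),
        "simplicity": simplicity(rule),
        "novelty": novelty(rule, known_rules),
    }
    return sum(weight * parts[name] for name, weight in weights.items())


# Invented toy cases, not data from the project.
TOY_CASES = [
    {"age": "old", "fever": "yes", "label": "high_risk"},
    {"age": "old", "fever": "no", "label": "high_risk"},
    {"age": "young", "fever": "yes", "label": "low_risk"},
    {"age": "old", "fever": "yes", "label": "low_risk"},
]
KNOWN = [Rule(frozenset({("age", "old")}), "high_risk")]
CANDIDATES = [
    Rule(frozenset({("age", "old")}), "high_risk"),
    Rule(frozenset({("fever", "no")}), "high_risk"),
]

for candidate in CANDIDATES:
    print(sorted(candidate.conditions), round(score(candidate, TOY_CASES, KNOWN), 3))
```

Passing a different `weights` dictionary, for example `{"accuracy": 1.0}`, ranks the same candidates on accuracy alone, which is the sense in which the criteria are "available for modification" in this sketch.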
