ITR: Optimal Support Set Selection in Data Analysis with Applications to Bioinformatics

Basic information

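  • Award number:
    0312953
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2003
  • Funding country:
    United States
  • Project period:
    2003-08-01 to 2007-07-31
  • Project status:
    Completed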

Project abstract

This research will develop systematic procedures that take advantage of computer-related developments and advanced combinatorial optimization techniques to build on previously successful ad-hoc methods for optimizing feature selection in data analysis, with special attention to bioinformatics. Knowledge extraction from data represents a fundamental challenge in information technology research. A very frequent type of knowledge extraction problem is that of analyzing archives of records or observations in order to discover hidden structural relationships. Problems of this type appear in numerous areas of science, technology, medicine, and management, and in countless other areas of activity. The advent of the computer and of the Internet has radically increased the role of data analysis, not only by allowing the creation of large, meaningful datasets, but also by making them accessible to researchers all over the globe.
One of the most prominent areas of application of data analysis is systems biology and bioinformatics. In contrast to molecular biology investigations, which typically focus on single molecules, systems biology pays attention to tens or even hundreds of thousands of biological attributes at the same time. Moreover, the number of attributes included in a dataset is predicted to increase dramatically in the very near future. The new global approach to systems biology has been enabled by new technologies that allow the simultaneous measurement of large numbers of attributes and the generation of large multiparameter datasets. These biological datasets represent the domain of bioinformatics. Besides the classic methods of statistics, new approaches to the analysis of data are required. Among other things, the need for such methodologies has prompted the development of entirely new research areas, including machine learning, data mining, neural networks, and support vector machines.
In all these areas of data analysis, the knowledge of a known (finite) subset of observations is used to derive conclusions about the entire set of possible observations.
While powerful analytic tools have been developed within the framework of statistics and of newer research areas based heavily on combinatorics, logic, and optimization, the size of the problems in bioinformatics (as well as in some other areas) leads to major computational difficulties. It raises the challenge of developing new approaches in which systematic heuristic procedures are integrated into solution algorithms in order to intelligently reduce the size of the problems to be solved. By using ad-hoc combinations of heuristics and of combinatorics-, logic-, and optimization-based algorithms, this project aims to achieve spectacular reductions of problem size without significant loss in the accuracy of the resulting models.
This project will introduce new concepts for evaluating the role and the impact of features in data analysis problems. These concepts combine elements of statistics, combinatorics, information theory, the theory of Boolean functions, and the theories of games and of voting. On the algorithmic side, they open possibilities for systematic heuristic approaches to the elimination of redundant features. The project will also introduce new concepts for the comparative evaluation of pairs of features, including those of similarity and domination. These concepts combine elements from statistics, the theory of partially ordered sets, and Boolean algebra, and can add to the arsenal of tools available for the elimination of unnecessary features. As opposed to previous studies, which view sets of attributes merely as collections of individual attributes, the project proposes a study of the combined effects of attributes.
The research plan includes the development of a local optimality criterion for a set of attributes that combines the two desired characteristics of a "support set": it should allow the construction of accurate models, and it should, at the same time, be of a computationally manageable size. In addition, the project will introduce new, synthetic "logical" attributes that allow, on the one hand, the compression of the dataset and, on the other hand, the discovery of clearly understandable and practically usable "logical" discriminants, which distinguish the positive observations from the negative ones. A very significant application of these ideas is the proposed algorithmic framework for optimizing feature selection. In the field of biological research and bioinformatics, the project introduces: (i) the concept of "groups of biomarkers" (obtainable through combinatorial optimization techniques), (ii) an algorithmic hypothesis generator for biological research, and (iii) a new approach for discovering new classes of observations with highly similar characteristics. A publicly available software package will result from the work and is expected to stimulate substantial research in the computational analysis of data and in bioinformatics.
By combining elements of statistics, combinatorial optimization, information theory, and the theories of Boolean functions, games, voting, and partially ordered sets, the project is expected to attract researchers from a variety of areas to collaborative studies. The publicly available software for the analysis of bioinformatics data, as well as the proposed hypothesis generation system, will provide major tools for stimulating research in biology and bioinformatics.
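
The two requirements placed on a "support set" above (accurate models, manageable size) can be illustrated with a small sketch. The abstract does not give a formal definition or an algorithm, so everything here is an assumption: observations are taken to be binary tuples, a support set is read as a set of attributes on which every positive observation differs from every negative one, and a simple greedy set-cover heuristic stands in for the project's actual optimization procedures. The function name `greedy_support_set` and the sample data are hypothetical.

```python
from itertools import product


def greedy_support_set(positives, negatives):
    """Greedily pick attribute indices until every (positive, negative)
    pair of observations differs on at least one chosen attribute.

    Assumption: observations are equal-length tuples of 0/1 values.
    This is an illustrative heuristic, not the project's method.
    """
    n_features = len(positives[0])
    # Every (positive, negative) index pair that still needs a separating attribute.
    unseparated = set(product(range(len(positives)), range(len(negatives))))
    chosen = []
    while unseparated:
        best, best_pairs = None, set()
        for f in range(n_features):
            if f in chosen:
                continue
            # Remaining pairs on which attribute f takes different values.
            pairs = {(i, j) for (i, j) in unseparated
                     if positives[i][f] != negatives[j][f]}
            if len(pairs) > len(best_pairs):
                best, best_pairs = f, pairs
        if best is None:
            # No unused attribute separates any remaining pair.
            raise ValueError("a positive and a negative observation are identical")
        chosen.append(best)
        unseparated -= best_pairs
    return sorted(chosen)


# Hypothetical toy data: four binary attributes per observation.
positives = [(1, 0, 1, 0), (1, 1, 1, 0)]
negatives = [(0, 0, 1, 0), (1, 0, 0, 1)]
print(greedy_support_set(positives, negatives))  # prints [0, 2]
```

On this toy data, attributes 0 and 2 are enough to tell every positive observation from every negative one, so the other two attributes could be dropped without losing separability. A greedy choice of this kind gives no guarantee of a minimum-size set; it only shows the trade-off between separating power and size that the abstract describes.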

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Peter Hammer

Amine functionalization of carbon nanotubes with solid urea using different plasma treatments
  • DOI:
    10.1016/j.apsusc.2022.152493
  • Publication date:
    2022-05-01
  • Journal:
  • Impact factor:
  • Authors:
    Teresa Tromm Steffen;Luis César Fontana;Peter Hammer;Daniela Becker
  • Corresponding author:
    Daniela Becker
Addition of molybdate ions to the anodizing bath to improve the corrosion resistance of clad 2024-T3 alloy anodized in tartaric-sulfuric acid
  • DOI:
    10.1016/j.surfcoat.2024.130682
  • Publication date:
    2024-04-30
  • Journal:
  • Impact factor:
  • Authors:
    Thassia Felix de Almeida;Oscar Mauricio Prada Ramirez;Alex Lanzutti;Cleber Lima Rodrigues;Manfred Brabetz;Thomas M. Kremmer;Peter Hammer;Hercilio Gomes de Melo
  • Corresponding author:
    Hercilio Gomes de Melo
COMPUTATIONAL FLUID DYNAMICS MODELING OF HEPATIC-PULMONARY BLOOD FLOW FOR FONTAN PLANNING IN PATIENTS WITH INTERRUPTED INFERIOR VENA CAVA
  • DOI:
    10.1016/s0735-1097(21)01809-x
  • Publication date:
    2021-05-11
  • Journal:
  • Impact factor:
  • Authors:
    David Monroe Hoganson;Vijay Govindarajan;Noah E. Schulz;Emily R. Eickhoff;Roger Breitbart;Gerald Marx;Pedro del Nido;Peter Hammer
  • Corresponding author:
    Peter Hammer
Removal of metal ions from aqueous solution by chelating polymeric hydrogel
  • DOI:
    10.1007/s10311-009-0231-0
  • Publication date:
    2009-07-25
  • Journal:
  • Impact factor:
    20.400
  • Authors:
    Hudson Wallace Pereira Carvalho;Ana P. L. Batista;Peter Hammer;Gustavo H. P. Luz;Teodorico C. Ramalho
  • Corresponding author:
    Teodorico C. Ramalho
ZnO surface modification with maleic anhydride using plasma treatment
  • DOI:
    10.1002/ppap.202300165
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    3.5
  • Authors:
    Larissa A. Klok;T. T. Steffen;Henrique R. Sabedra;Luis C. Fontana;Peter Hammer;Felippe M. Marega;Lidiane C. Costa;L. Pessan;Daniela Becker
  • Corresponding author:
    Daniela Becker

Other grants by Peter Hammer

Workshop on Discrete Optimization '99
  • Award number:
    9976754
  • Fiscal year:
    1999
  • Award amount:
    --
  • Award type:
    Standard Grant
Pseudo-Boolean Functions: Representations and Optimization
  • Award number:
    9806389
  • Fiscal year:
    1998
  • Award amount:
    --
  • Award type:
    Standard Grant
U.S.-Belgium Cooperative Research: Nonlinear 0-1 Optimization
  • Award number:
    9321811
  • Fiscal year:
    1995
  • Award amount:
    --
  • Award type:
    Standard Grant
Mathematical Sciences Computing Research Environments
  • Award number:
    9406327
  • Fiscal year:
    1994
  • Award amount:
    --
  • Award type:
    Standard Grant
Mathematical Sciences: Functions of Binary Variables
  • Award number:
    8906870
  • Fiscal year:
    1989
  • Award amount:
    --
  • Award type:
    Continuing Grant
Nonlinear Binary Optimization
  • Award number:
    8503212
  • Fiscal year:
    1985
  • Award amount:
    --
  • Award type:
    Continuing Grant
Mathematical Sciences: Structural and Algorithmic Aspects of Nonlinear Discrete Optimization
  • Award number:
    8305569
  • Fiscal year:
    1984
  • Award amount:
    --
  • Award type:
    Standard Grant

Similar overseas grants

Decision support systems based on heterogeneous data driven models for a safe and optimal operation of industrial process systems
  • Award number:
    RGPIN-2021-02929
  • Fiscal year:
    2022
  • Award amount:
    --
  • Award type:
    Discovery Grants Program - Individual
Decision support systems based on heterogeneous data driven models for a safe and optimal operation of industrial process systems
  • Award number:
    DGDND-2021-02929
  • Fiscal year:
    2022
  • Award amount:
    --
  • Award type:
    DND/NSERC Discovery Grant Supplement
Development of a novel computational framework to support therapeutic-planning in selecting the optimal thromboembolic prevention treatment for AF
  • Award number:
    2719105
  • Fiscal year:
    2022
  • Award amount:
    --
  • Award type:
    Studentship
Collaborative Research: SCH: Optimal Desensitization Protocol in Support of a Kidney Paired Donation (KPD) System
  • Award number:
    2123685
  • Fiscal year:
    2021
  • Award amount:
    --
  • Award type:
    Standard Grant
Decision support systems based on heterogeneous data driven models for a safe and optimal operation of industrial process systems
  • Award number:
    DGDND-2021-02929
  • Fiscal year:
    2021
  • Award amount:
    --
  • Award type:
    DND/NSERC Discovery Grant Supplement
Decision support systems based on heterogeneous data driven models for a safe and optimal operation of industrial process systems
  • Award number:
    RGPIN-2021-02929
  • Fiscal year:
    2021
  • Award amount:
    --
  • Award type:
    Discovery Grants Program - Individual
Collaborative Research: SCH: Optimal Desensitization Protocol in Support of a Kidney Paired Donation (KPD) System
  • Award number:
    2123684
  • Fiscal year:
    2021
  • Award amount:
    --
  • Award type:
    Standard Grant
Collaborative Research: SCH: Optimal Desensitization Protocol in Support of a Kidney Paired Donation (KPD) System
  • Award number:
    2123683
  • Fiscal year:
    2021
  • Award amount:
    --
  • Award type:
    Standard Grant
Decision support systems based on heterogeneous data driven models for a safe and optimal operation of industrial process systems
  • Award number:
    DGECR-2021-00180
  • Fiscal year:
    2021
  • Award amount:
    --
  • Award type:
    Discovery Launch Supplement
Evaluation of optimal pharmacologic haemodynamic support strategies in patients presenting with shock
  • Award number:
    nhmrc : 2003944
  • Fiscal year:
    2021
  • Award amount:
    --
  • Award type:
    Postgraduate Scholarships