ITR Collaborative Research: Combinatorial Algorithms for Biological Data Clustering

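Basic Information

  • Award number:
    0325386
  • Principal investigator:
  • Amount:
    $1.295 million
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2003
  • Funding country:
    United States
  • Duration:
    2003-09-15 to 2004-05-31
  • Project status:
    Completed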

Project Summary

The Human Genome Project has opened the floodgates of biological data, generating enormous amounts of sequence, structure, expression, and interaction data at rates that far exceed our current capability to analyze and interpret them. New ideas and approaches are urgently needed to establish greatly improved capabilities for biological data analysis. Data clustering is fundamental to mining a large quantity of biological data.

The goals of this project are (a) to develop a highly effective and general framework for biological data clustering, applicable to a large class of biological data analysis problems; (b) to demonstrate the effectiveness of this framework as a general-purpose clustering tool through application to four challenging biological data analysis problems; (c) to implement this clustering framework as a set of library functions, in a similar fashion to LINPACK/LAPACK, with which other researchers can build their own clustering capabilities more efficiently; (d) to provide insight into several biological problems through clustering analysis; and (e) to train students and postdocs to build biological data analysis tools, using our clustering framework as a training ground.

The foundation of our framework is a minimum spanning tree (MST) representation of a data set and its relationship to clustering. Our preliminary studies have revealed that (i) there is a natural connection between MSTs and the concept of clustering, which can help reduce a multi-dimensional data clustering problem to a tree-partitioning problem; (ii) clustering problems with general objective functions, defined on (minimum spanning) trees, can be solved optimally and efficiently; and (iii) MSTs provide a natural framework for solving a more general class of clustering problems, i.e., extracting data clusters from a noisy background. Additional preliminary studies have also revealed that MSTs have such rich properties related to clustering that further investigation could lead to significantly more effective ways of clustering and analyzing biological data.

Our research will be organized and carried out in five tasks.

  • Investigation of fundamental properties of MSTs versus clustering: We will investigate fundamental relationships between MSTs and clustering. New insights and discoveries about these relationships will lay the foundation for developing more effective ways of clustering.
  • Investigation and development of MST-based clustering algorithms and statistical analysis methods: We will investigate and develop a large class of MST-based algorithms for several clustering-related problems. In addition, we will investigate and develop effective statistical analysis tools for assessing the statistical significance and robustness of clustering results.
  • Development of improved analysis capabilities for four selected application problems: We will apply our clustering framework to four biological data analysis problems: (1) gene expression data analysis, (2) regulatory binding site identification, (3) two-hybrid data analysis, and (4) phylogenetic tree clustering analysis.
  • Implementation of our MST-based clustering framework as library functions: We will implement our MST-based clustering-related algorithms as application programming interfaces (APIs), which other researchers can easily use in their own data analysis software. In addition, we will implement our clustering tools as a Web server for community service.
  • Training and education: As MSTs provide such a rich set of attractive properties relevant to clustering, we will use our MST-based clustering framework as a training platform to teach students and postdocs how to develop biological data analysis tools.

Our proposed study and development directly address the research challenges of the ITR program in the following areas:

  • providing new computational, simulation, and data-analysis methods and tools to model physical, biological, social, behavioral, and mathematical phenomena, and
  • improving our ability to understand, model, and control the behavior of complex systems.
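The reduction from multi-dimensional clustering to tree partitioning described in the summary can be illustrated with a minimal sketch. It is not the project's own library: the function names `mst_edges` and `mst_clusters` are hypothetical, the distance is assumed to be Euclidean, and the partitioning rule (delete the k-1 longest MST edges) is only the simplest of the objective functions such a framework could support.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (weight, i, j) tuples (hypothetical helper).
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known cost of attaching each point
    parent = [-1] * n       # tree endpoint that realizes that cost
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax attachment costs of the remaining points through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Partition points into k clusters by deleting the k-1 longest MST edges."""
    edges = sorted(mst_edges(points))
    kept = edges[: max(len(edges) - (k - 1), 0)]

    # Union-find over the kept edges recovers the connected subtrees.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)

    # Relabel subtrees 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(pts, 2))
```

On the six sample points, the single long edge bridging the two groups is the one removed, so each group ends up in its own subtree. Other objective functions would replace the "drop the longest edges" rule with a different choice of which tree edges to cut.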

Project Outcomes

Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Ying Xu

Synergistic inhibitory effects of transplatin and beta-hydroxyisovalerylshikon on carcinoma A431 cells involve epidermal growth factor receptor.
  • DOI:
    10.1016/s0304-3835(02)00457-3
  • Year published:
    2002
  • Journal:
  • Impact factor:
    9.7
  • Authors:
    Ying Xu;S. Nakajo;K. Nakaya
  • Corresponding author:
    K. Nakaya
Mesh convergence for turbulent combustion
  • DOI:
    10.3934/dcds.2016.36.4383
  • Year published:
    2016
  • Journal:
  • Impact factor:
    1.1
  • Authors:
    Xiaoxue Gong;Ying Xu;Vinay Mahadeo;T. Kaman;J. Larsson;J. Glimm
  • Corresponding author:
    J. Glimm
Elucidation of Cancer Drivers Through Comparative Omic Data Analyses
  • DOI:
  • Year published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ying Xu;J. Cui;D. Puett
  • Corresponding author:
    D. Puett
Improving Lithium‐Ion Diffusion Kinetics in Nano‐Si@C Anode Materials with Hierarchical MoS2 Decoration for High‐Performance Lithium‐Ion Batteries
  • DOI:
    10.1002/celc.202100263
  • Year published:
    2021
  • Journal:
  • Impact factor:
    4
  • Authors:
    Xiongbiao Ye;Chuanhai Gan;Liuqing Huang;Yiwei Qiu;Ying Xu;Liuying Huang;Xuetao Luo
  • Corresponding author:
    Xuetao Luo
Evolution of Arginine Biosynthesis in the Bacterial Domain: Novel Gene-Enzyme Relationships from Psychrophilic Moritella Strains (Vibrionaceae) and Evolutionary Significance of N-α-Acetyl Ornithinase
  • DOI:
    10.1128/jb.182.6.1609-1615.2000
  • Year published:
    2000
  • Journal:
  • Impact factor:
    3.2
  • Authors:
    Ying Xu;Ziyuan Liang;C. Legrain;H. Rüger;N. Glansdorff
  • Corresponding author:
    N. Glansdorff

Other grants by Ying Xu

Building A Teacher-AI Collaborative System for Personalized Instruction and Assessment of Comprehension Skills
  • Award number:
    2302730
  • Fiscal year:
    2023
  • Funding amount:
    $1.295 million
  • Project type:
    Standard Grant
UNS: Organophosphates and Phthalates in Sleep Microenvironments: Emission, Transport, and Infants' Exposure
  • Award number:
    1512610
  • Fiscal year:
    2015
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
CAREER: Emission and Transport of PBDEs in Indoor Environments
  • Award number:
    1150713
  • Fiscal year:
    2012
  • Funding amount:
    $1.295 million
  • Project type:
    Standard Grant
Collaborative Research: Phthalate Plasticizers: Temperature Dependence of Material/Air Equilibria and Consequences for Emissions, Exposure and Risk
  • Award number:
    1066642
  • Fiscal year:
    2011
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
MRI: Acquisition of a Computer Cluster for Bioinformatics Research at UGA
  • Award number:
    0821263
  • Fiscal year:
    2008
  • Funding amount:
    $1.295 million
  • Project type:
    Standard Grant
Computational Prediction of Biological Networks in Microbes and Applications to Cyanobacteria
  • Award number:
    0542119
  • Fiscal year:
    2006
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
CompBio: A New Paradigm of Protein Threading: simultaneous backbone threading and side-chain packing prediction.
  • Award number:
    0621700
  • Fiscal year:
    2006
  • Funding amount:
    $1.295 million
  • Project type:
    Standard Grant
A Computational Capability for Fast and Reliable Characterization of Protein Complexes
  • Award number:
    0354771
  • Fiscal year:
    2003
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR Collaborative Research: Combinatorial Algorithms for Biological Data Clustering
  • Award number:
    0407204
  • Fiscal year:
    2003
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
A Computational Capability for Fast and Reliable Characterization of Protein Complexes
  • Award number:
    0213840
  • Fiscal year:
    2002
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant

Similar overseas grants

ITR Collaborative Research: Pervasively Secure Infrastructures (PSI): Integrating Smart Sensing, Data Mining, Pervasive Networking, and Community Computing
  • Award number:
    1404694
  • Fiscal year:
    2013
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR-SCOTUS: A Resource for Collaborative Research in Speech Technology, Linguistics, Decision Processes, and the Law
  • Award number:
    1139735
  • Fiscal year:
    2011
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR/NGS: Collaborative Research: DDDAS: Data Dynamic Simulation for Disaster Management
  • Award number:
    0963973
  • Fiscal year:
    2009
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR/NGS: Collaborative Research: DDDAS: Data Dynamic Simulation for Disaster Management
  • Award number:
    1018072
  • Fiscal year:
    2009
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR Collaborative Research: A Reusable, Extensible, Optimizing Back End
  • Award number:
    0838899
  • Fiscal year:
    2008
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR Collaborative Research: Pervasively Secure Infrastructures (PSI): Integrating Smart Sensing, Data Mining, Pervasive Networking, and Community Computing
  • Award number:
    0833849
  • Fiscal year:
    2008
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR/NGS: Collaborative Research: DDDAS: Data Dynamic Simulation for Disaster Management
  • Award number:
    0808419
  • Fiscal year:
    2007
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR: Collaborative Research - ASE - (sim+dmc): Image-based Biophysical Modeling: Scalable Registration and Inversion Algorithms and Distributed Computing
  • Award number:
    0849301
  • Fiscal year:
    2007
  • Funding amount:
    $1.295 million
  • Project type:
    Continuing Grant
ITR: Collaborative Research: Modeling and Display of Haptic Information for Enhanced Performance of Computer-Integrated Surgery
  • Award number:
    0711040
  • Fiscal year:
    2007
  • Funding amount:
    $1.295 million
  • Project type:
    Standard Grant
Collaborative Research: ITR-(ASE)-(dmc): Overcoming Fractionation Errors in Cancer Treatment Planning
  • Award number:
    0749671
  • Fiscal year:
    2006
  • Funding amount:
    $1.295 million
  • Project type:
    Standard Grant