Development of a graph-theoretic approach to predict protein function by integrating large scale heterogeneous data

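Basic information

  • Grant number:
    BB/F00964X/1
  • Principal investigator:
  • Amount:
    $534,900
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2008
  • Funding country:
    United Kingdom
  • Period:
    2008 to (no data)
  • Project status:
    Completed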

Project abstract

The list of organisms with a completed genome sequence is growing continuously, and this has led to the identification of thousands of genes whose function is still unknown. These genes could be involved in important cellular functions, could represent important targets for diagnostic and pharmacogenomic studies, and could be of industrial and agronomic importance. A major undertaking for biology is therefore to identify the function of these uncharacterized genes on a genomic scale. The challenge for bioinformatics is then to devise algorithmic methods that, given a gene, can propose a hypothesis for its function that can then be validated by wet-lab assays. Fortunately, new experimental techniques have become available that produce data offering clues about protein function, such as protein interaction data and gene expression data, and these can therefore be employed for function prediction. Some experimental and computational data have a natural representation as networks (e.g. protein interaction data), while others are inherently 'one-dimensional' (e.g. sequence patterns). Three facts have recently become clear: (1) while each data type contains important information that can help in determining the function of a protein, no single data type suffices by itself; (2) large-scale functional inference improves greatly when evidence from different sources is integrated; (3) for those data types that can be represented as networks, the best results are obtained by algorithms that take advantage of the networks' topologies. So far, methods that make functional inferences on networks are very limited in the types of data they can integrate, while methods that can integrate a greater variety of data do not take advantage of the networks' topologies.
I intend to investigate a general method that can integrate essentially any data type currently available while taking into account its intrinsic structure: the method takes advantage of the graph topology for network data, and it can integrate this evidence with one-dimensional information. I shall develop graph-theoretical methods that use the diffusion of information over graphs to generate functional evidence from network data. This evidence is then combined with other one-dimensional information using machine learning techniques. The strength of the methodology lies in its ability to use diverse sets of noisy data and to combine them to obtain sound statistical inferences; the weak signals contained in each dataset are enhanced by integrating the data. The methodology will first be developed on yeast, and I shall then transfer this approach to higher organisms such as C. elegans, D. melanogaster, A. thaliana, and H. sapiens. For all these organisms the performance of the algorithms will then be evaluated 'in silico' by means of test sets; that is, I shall verify the accuracy of the methods at predicting the function of genes whose annotation is known. The approach will then be tested 'in vivo' on a sub-network of genes that form signalling pathways (MAPK signalling) and function to transmit information from receptors to gene expression. MAPK pathway components are highly diversified in the model plant Arabidopsis thaliana, which has 123 of them. For many of these we do not know how they connect up or what their biological functions are. These will be predicted by the algorithms and then functionally tested by silencing their expression using RNA interference and by using mutant lines. I shall also design and implement stand-alone and web-based software tools incorporating the algorithms developed.
The applications will enable biologists to apply the algorithms easily through a user-friendly interface and to visualize the relevant biological networks, thus making the inference process transparent and providing an explanation for the functional annotation predicted by the system. A web tool will also be created. All these tools will be made freely available to the scientific community.
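
As a rough illustration of the diffusion idea in the abstract, the following Python sketch spreads functional evidence from annotated genes over an interaction network using a random walk with restart. This is only one common diffusion scheme and is not necessarily the one developed in the project; the function name `diffuse_scores`, the `restart` parameter, and the toy four-gene network are all assumptions made for this example.

```python
import numpy as np


def diffuse_scores(adjacency, seed_scores, restart=0.5, tol=1e-10, max_iter=10_000):
    """Diffuse seed evidence over a graph by random walk with restart.

    adjacency   : symmetric (n, n) array of non-negative edge weights
    seed_scores : length-n array, > 0 for genes already annotated with
                  the function of interest (hypothetical input format)
    restart     : probability of jumping back to the seed genes
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=0)
    # Avoid division by zero; isolated nodes keep an all-zero column.
    degree[degree == 0] = 1.0
    W = A / degree  # column-normalised transition matrix

    p0 = np.asarray(seed_scores, dtype=float)
    if p0.sum() <= 0:
        raise ValueError("at least one seed gene is required")
    p0 = p0 / p0.sum()

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy network (assumed for illustration): a chain of genes 0 - 1 - 2 - 3,
# where only gene 0 is annotated with the function of interest.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
seeds = np.array([1.0, 0.0, 0.0, 0.0])
scores = diffuse_scores(adjacency, seeds)
print(scores)  # genes closer to the annotated gene receive higher scores
```

The resulting per-gene scores could then serve as one feature among others (for example, features derived from sequence patterns) in a standard classifier. That integration step is described in the abstract but is not implemented in this sketch.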

Project outcomes

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A method for comparing multiple imputation techniques: A case study on the U.S. national COVID cohort collaborative.
  • DOI:
    10.1016/j.jbi.2023.104295
  • Publication date:
    2023-03
  • Journal:
  • Impact factor:
    4.5
  • Authors:
    Casiraghi, Elena; Wong, Rachel; Hall, Margaret; Coleman, Ben; Notaro, Marco; Evans, Michael D.; Tronieri, Jena S.; Blau, Hannah; Laraway, Bryan; Callahan, Tiffany J.; Chan, Lauren E.; Bramante, Carolyn T.; Buse, John B.; Moffitt, Richard A.; Sturmer, Til; Johnson, Steven G.; Shao, Yu Raymond; Reese, Justin; Robinson, Peter N.; Paccanaro, Alberto; Valentini, Giorgio; Huling, Jared D.; Wilkins, Kenneth J.
  • Corresponding author:
    Wilkins, Kenneth J.
Combining interactomes from multiple organisms: A case study on human-mouse
  • DOI:
    10.1109/clei.2016.7833324
  • Publication date:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Caceres J
  • Corresponding author:
    Caceres J
Additional file 1 of LUMI-PCR: an Illumina platform ligation-mediated PCR protocol for integration site cloning, provides molecular quantitation of integration sites
  • DOI:
    10.6084/m9.figshare.11805027
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Dawes J
  • Corresponding author:
    Dawes J
LanDis: the disease landscape explorer
  • DOI:
    10.1038/s41431-023-01511-9
  • Publication date:
    2024-01-10
  • Journal:
  • Impact factor:
    5.2
  • Authors:
    Caniza, Horacio; Caceres, Juan J.; Paccanaro, Alberto
  • Corresponding author:
    Paccanaro, Alberto
A network medicine approach to quantify distance between hereditary disease modules on the interactome.
  • DOI:
    10.1038/srep17658
  • Publication date:
    2015-12-03
  • Journal:
  • Impact factor:
    4.6
  • Authors:
    Caniza H; Romero AE; Paccanaro A
  • Corresponding author:
    Paccanaro A

Other publications by Alberto Paccanaro

Spectral clustering of protein sequences
Subclonal mutation selection in mouse lymphomagenesis identifies known cancer loci and suggests novel candidates
  • DOI:
  • Publication date:
    2018
  • Journal:
  • Impact factor:
    16.6
  • Authors:
    Philip Webster; Joanna C. Dawes; H. Dewchand; K. Takács; Barbara Iadarola; B. J. Bolt; Juan J. Caceres; Jakub Kaczor; G. Dharmalingam; Marian H. Dore; L. Game; Thomas Adejumo; James Elliott; K. Naresh; Mohammad M. Karimi; Katerina Rekopoulou; Ge Tan; Alberto Paccanaro; A. Uren
  • Corresponding author:
    A. Uren
Inferring protein-protein interactions using interaction network topologies

Other grants held by Alberto Paccanaro

A GPU-based high performance system for discovering consensus domain architecture and functional annotation of protein families
  • Grant number:
    BB/K004131/1
  • Fiscal year:
    2012
  • Funding amount:
    $534,900
  • Project type:
    Research Grant

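Similar grants from the National Natural Science Foundation of China (NSFC)

Graph-PINN-based parameterized modeling of stratification stability and simulation of cross-medium coupled dust transport …
  • Grant number:
  • Approval year:
    2025
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal-level project
Strong convexity of flip graphs of planar triangulations
  • Grant number:
    12301432
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Graph-based reconstruction methods for multi-contrast magnetic resonance images
  • Grant number:
    61901188
  • Approval year:
    2019
  • Funding amount:
    CNY 245,000
  • Project type:
    Young Scientists Fund
Development of a metagenome assembly algorithm based on de Bruijn graph untangling
  • Grant number:
    61771009
  • Approval year:
    2017
  • Funding amount:
    CNY 500,000
  • Project type:
    General Program
Infrared target segmentation and recognition methods based on Graph and ISA
  • Grant number:
    61101246
  • Approval year:
    2011
  • Funding amount:
    CNY 220,000
  • Project type:
    Young Scientists Fund
Applications of fixed-parameter tractable algorithms to planar graph problems and their relationship to integer linear programming
  • Grant number:
    60973026
  • Approval year:
    2009
  • Funding amount:
    CNY 320,000
  • Project type:
    General Program
General chromatic numbers and game chromatic numbers of graphs
  • Grant number:
    10771035
  • Approval year:
    2007
  • Funding amount:
    CNY 180,000
  • Project type:
    General Program
Mining and applications of the Chinese Web Graph
  • Grant number:
    60473122
  • Approval year:
    2004
  • Funding amount:
    CNY 230,000
  • Project type:
    General Program
Combinatorial designs and their large sets
  • Grant number:
    10371031
  • Approval year:
    2003
  • Funding amount:
    CNY 200,000
  • Project type:
    General Program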

Similar overseas grants

Support for Safe Driving Using Graph Theoretic Functional Brain Network Analysis and Non-invasive Brain Stimulation
  • Grant number:
    23KJ1643
  • Fiscal year:
    2023
  • Funding amount:
    $534,900
  • Project type:
    Grant-in-Aid for JSPS Fellows
Artificial Intelligence and Network Science: Solution Concepts, Graph-Theoretic Characterizations, and Their Societal Aspects
  • Grant number:
    RGPIN-2019-04904
  • Fiscal year:
    2022
  • Funding amount:
    $534,900
  • Project type:
    Discovery Grants Program - Individual
Collaborative Research: Towards a Theoretic Foundation for Optimal Deep Graph Learning
  • Grant number:
    2134081
  • Fiscal year:
    2022
  • Funding amount:
    $534,900
  • Project type:
    Continuing Grant
Collaborative Research: Towards a Theoretic Foundation for Optimal Deep Graph Learning
  • Grant number:
    2134079
  • Fiscal year:
    2022
  • Funding amount:
    $534,900
  • Project type:
    Continuing Grant
Collaborative Research: Towards a Theoretic Foundation for Optimal Deep Graph Learning
  • Grant number:
    2134080
  • Fiscal year:
    2022
  • Funding amount:
    $534,900
  • Project type:
    Continuing Grant
Artificial Intelligence and Network Science: Solution Concepts, Graph-Theoretic Characterizations, and Their Societal Aspects
  • Grant number:
    RGPIN-2019-04904
  • Fiscal year:
    2021
  • Funding amount:
    $534,900
  • Project type:
    Discovery Grants Program - Individual
Artificial Intelligence and Network Science: Solution Concepts, Graph-Theoretic Characterizations, and Their Societal Aspects
  • Grant number:
    RGPIN-2019-04904
  • Fiscal year:
    2020
  • Funding amount:
    $534,900
  • Project type:
    Discovery Grants Program - Individual
Artificial Intelligence and Network Science: Solution Concepts, Graph-Theoretic Characterizations, and Their Societal Aspects
  • Grant number:
    RGPIN-2019-04904
  • Fiscal year:
    2019
  • Funding amount:
    $534,900
  • Project type:
    Discovery Grants Program - Individual
Estimation of neural interactions by information-geometric and graph-theoretic approaches
  • Grant number:
    541558-2019
  • Fiscal year:
    2019
  • Funding amount:
    $534,900
  • Project type:
    University Undergraduate Student Research Awards
An operator-theoretic approach to graph rigidity
  • Grant number:
    EP/S00940X/1
  • Fiscal year:
    2019
  • Funding amount:
    $534,900
  • Project type:
    Research Grant