Efficient Algorithms for Motif Search

Project Abstract

DESCRIPTION (provided by applicant): Multiple genome projects have generated large volumes of DNA, RNA and protein sequence data. While computational pattern-searching techniques such as BLAST (a program for measuring sequence similarities) have enabled major discoveries such as the modularity of proteins, much more information can be gained from biological sequence data. However, because of the length and complexity of the patterns, we are limited by the computational algorithms used for motif discovery. Simple Motif Search (SMS), Planted Motif Search (PMS) and Edit-distance Motif Search (EMS) are the three principal paradigms that have previously been used for identifying short functional peptide motifs, transcriptional regulatory elements, composite regulatory patterns, DNA motifs, similarity between families of proteins, etc. Our group has been instrumental in developing algorithms for these problems. Existing motif search algorithms have two major shortcomings: 1) Approximate algorithms do not always identify the correct pattern, but have the advantage that they can be used to look for both short and relatively large patterns in large datasets such as genomes. 2) In contrast, an exact algorithm always identifies the correct pattern, but cannot be used to identify complex patterns in large datasets. To extract more sophisticated patterns from genomic data, we need exact algorithms that can analyze genomes for complex patterns with reasonable computational resources. Exact algorithms are currently limited because their run times depend exponentially on the parameters involved. For example, the best currently known algorithms for PMS and EMS are expected to take (on a PC) more than a month for patterns of length 27 and more than 5.67 years for patterns of length 31.
In this project, we propose to develop next-generation SMS, PMS and EMS algorithms that identify more complex patterns in larger datasets using less computation time and memory. We also propose to develop a web-based system that will incorporate the PMS and EMS algorithms. Open source versions of all the algorithms and data developed will be made available to users. In addition, the web system will support online processing of queries that involve solving PMS and EMS instances.
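To make the exponential dependence mentioned in the abstract concrete, here is a minimal brute-force sketch of exact Planted Motif Search under the usual (l, d) formulation: find every string of length l that occurs in each input sequence with at most d mismatches. This is an illustration only, not the applicants' algorithm; the function names (`neighbors`, `occurs_within`, `planted_motifs`) and the DNA alphabet are assumptions made for the example.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: DNA sequences


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(kmer, d):
    """All strings within Hamming distance d of kmer.

    The size of this set is sum over k <= d of C(l, k) * 3**k for DNA,
    which is the source of the exponential growth in l and d.
    """
    result = {kmer}
    for k in range(1, d + 1):
        for positions in combinations(range(len(kmer)), k):
            for letters in product(ALPHABET, repeat=k):
                candidate = list(kmer)
                for pos, ch in zip(positions, letters):
                    candidate[pos] = ch
                result.add("".join(candidate))
    return result


def occurs_within(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d):
    """Exact (l, d) motif search by exhaustive candidate enumeration."""
    first, rest = seqs[0], seqs[1:]
    # Any valid motif must lie in the d-neighborhood of some l-mer
    # of the first sequence, so that set is a complete candidate pool.
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(m for m in candidates
                  if all(occurs_within(m, s, d) for s in rest))
```

For instance, `planted_motifs(["AAACGT", "CGTAAA", "TTCGTT"], 3, 0)` returns `["CGT"]`, the only 3-mer shared exactly by all three sequences. The candidate pool grows so quickly with l and d that this approach is only practical for very small parameters, which is the limitation the proposal sets out to address.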

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by SANGUTHEVAR RAJASEKARAN

Efficient Algorithms for Motif Search
  • Grant number:
    8142235
  • Fiscal year:
    2010
  • Funding amount:
    $347,600
  • Project category:
Efficient Algorithms for Motif Search
  • Grant number:
    7878215
  • Fiscal year:
    2010
  • Funding amount:
    $347,600
  • Project category:

Similar Overseas Grants

CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
  • Grant number:
    2337776
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
  • Grant number:
    2338816
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
  • Grant number:
    2338846
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
  • Grant number:
    2348261
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
  • Grant number:
    2348346
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
  • Grant number:
    2348457
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
  • Grant number:
    2404989
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
  • Grant number:
    2339310
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
  • Grant number:
    2339669
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Grant number:
    EP/Y029089/1
  • Fiscal year:
    2024
  • Funding amount:
    $347,600
  • Project category:
    Research Grant