AF: Medium: Collaborative Research: Sequential and Parallel Algorithms for Approximate Sequence Matching with Applications to Computational Biology
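Basic information
- Award number: 1703489
- Principal investigator:
- Amount: $290,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Duration: 2017-07-01 to 2021-06-30
- Project status: Completed
- Source:
- Keywords:
Project abstract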
Sequence matching problems are central to the field of genomics, both in analyzing naturally occurring sequences such as genomes and in analyzing data from sequencing instruments. Often, methods that can accommodate a small number of differences within the matching regions suffice in practice. Such methods, described as alignment-free or approximate sequence matching methods, have typically relied on heuristics. This project advances the field by creating a mathematical framework and solving multiple approximate sequence matching problems with provably efficient run-time guarantees. Project work also supports the development of practical heuristics inspired and supported by the mathematical framework, the development of parallel methods for solving large-scale problems on high-performance parallel computers, and the study of the impact of these methods on important applications. Project results are made available through open source software for use by practitioners. Results from this research will be incorporated into courses taught by the PIs and disseminated more broadly through book chapters, tutorials, and accompanying slides. The project will support a research scientist and Ph.D. students in interdisciplinary training, launching them into productive careers focused on important problems of current relevance. Undergraduate participation is planned through course projects.
Project work builds upon recent progress in alignment-free genome comparison methods and exploits the controlled error characteristics of data generated by high-throughput sequencers, along with the many bioinformatics applications enabled by them. Project objectives include developing a robust algorithmic framework for designing newer alignment-free methods based on approximate substring composition, and developing sequential and parallel algorithms for pairwise approximate sequence matching among large sequence data sets. The goal is to develop algorithms that are asymptotically superior to quadratic alignment-based approaches and that achieve good practical performance, either directly or through further development of practical heuristics that rely on the underlying theory. The developed techniques will be further investigated in the context of important applications such as read error correction, genome mapping, and assembly. Though conducted in the context of computational biology, some of the methods are potentially applicable to other areas such as text processing and information retrieval. The broader research community will be impacted through the release of software modules and project work in important application areas.
Project outcomes
Journal articles (17)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Heaviest Induced Ancestors Problem: Better Data Structures and Applications
- DOI: 10.1007/s00453-022-00955-7
- Published: 2022
- Journal:
- Impact factor: 1.1
- Authors: Abedin, Paniz; Hooshmand, Sahar; Ganguly, Arnab; Thankachan, Sharma V.
- Corresponding author: Thankachan, Sharma V.
On the Complexity of Recognizing Wheeler Graphs
- DOI: 10.1007/s00453-021-00917-5
- Published: 2022
- Journal:
- Impact factor: 1.1
- Authors: Gibney, Daniel; Thankachan, Sharma V.
- Corresponding author: Thankachan, Sharma V.
A Linear-Space Data Structure for Range-LCP Queries in Poly-Logarithmic Time
- DOI: 10.1007/978-3-319-94776
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Abedin, P.; Ganguly, A.; Hon, W. K.; Nekrich, Y.; Sadakane, K.; Shah, R.; Thankachan, S. V.
- Corresponding author: Thankachan, S. V.
The Heaviest Induced Ancestors Problem Revisited
- DOI: 10.4230/lipics.cpm.2018.20
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Abedin, P.; Hooshmand, S.; Ganguly, A.; Thankachan, S.V.
- Corresponding author: Thankachan, S.V.
On the Complexity of BWT-runs Minimization via Alphabet Reordering
- DOI: 10.4230/lipics.esa.2020.15
- Published: 2019-11
- Journal:
- Impact factor: 0
- Authors: Jason Bentley; Daniel Gibney; Sharma V. Thankachan
- Corresponding author: Jason Bentley; Daniel Gibney; Sharma V. Thankachan
Other grants by Sharma Thankachan
REU Site: Algorithm Design --- Theory and Engineering
- Award number: 2349179
- Fiscal year: 2024
- Amount: $290,000
- Award type: Standard Grant
AF: Small: Theoretical Aspects of Repetition-Aware Text Compression and Indexing
- Award number: 2315822
- Fiscal year: 2023
- Amount: $290,000
- Award type: Standard Grant
CAREER: Algorithmic Aspects of Pan-genomic Data Modeling, Indexing and Querying
- Award number: 2316691
- Fiscal year: 2023
- Amount: $290,000
- Award type: Continuing Grant
CAREER: Algorithmic Aspects of Pan-genomic Data Modeling, Indexing and Querying
- Award number: 2146003
- Fiscal year: 2022
- Amount: $290,000
- Award type: Continuing Grant
AF: Small: Theoretical Aspects of Repetition-Aware Text Compression and Indexing
- Award number: 2112643
- Fiscal year: 2021
- Amount: $290,000
- Award type: Standard Grant
NSF Student Travel Grant for Workshop on String Algorithms in Bioinformatics (StringBio), 2019
- Award number: 1946289
- Fiscal year: 2019
- Amount: $290,000
- Award type: Standard Grant
NSF Student Travel Grant for 2018 International Workshop on String Algorithms in Bioinformatics (StringBio)
- Award number: 1849136
- Fiscal year: 2018
- Amount: $290,000
- Award type: Standard Grant
Similar overseas grants
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402836
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
- Award number: 2402851
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Algorithms Meet Machine Learning: Mitigating Uncertainty in Optimization
- Award number: 2422926
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Fast Combinatorial Algorithms for (Dynamic) Matchings and Shortest Paths
- Award number: 2402283
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
- Award number: 2402852
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Fast Combinatorial Algorithms for (Dynamic) Matchings and Shortest Paths
- Award number: 2402284
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402837
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402835
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Adventures in Flatland: Algorithms for Modern Memories
- Award number: 2423105
- Fiscal year: 2024
- Amount: $290,000
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Sketching for privacy and privacy for sketching
- Award number: 2311649
- Fiscal year: 2023
- Amount: $290,000
- Award type: Continuing Grant