Novel domain-specific languages and compiler optimization methods for computational biology
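Basic information
- Grant number: RGPIN-2019-04973
- Principal investigator:
- Amount: $20,400
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2019
- Funding country: Canada
- Duration: 2019-01-01 to 2020-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract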
Motivation: The vast scale of data generated by next-generation sequencing (NGS) experiments necessitates the development of efficient computational methods, as the computational aspect is currently the biggest bottleneck of NGS pipelines. However, many promising methods cannot handle the scale of current and emerging NGS technologies and are too hard to use and replicate. The root cause of this problem lies in widely used general-purpose development environments that cannot efficiently express and optimize biological data workflows. Users are forced to use either high-level but slow languages such as Python, or low-level languages such as C that produce efficient tools but at significant time and maintainability costs.

Approach: A domain-specific language (DSL) and an associated suite of compiler optimization techniques specifically tailored for biological and sequencing data would provide flexibility, simplicity and modularity for experimenting with new computational algorithms, while generating high-performance code and making the programs portable from resource-constrained architectures to the biggest supercomputers.

Objectives: We propose a novel DSL and associated compiler named Seq that enables rapid and easy development of high-performance sequencing pipelines. To achieve this, we will:
(i) design a programming language and a compiler that offer the ease of development of high-level languages such as Python, while providing the raw performance of low-level languages such as C (Objective 1);
(ii) explore data access patterns in genomic workflows and devise methods that can exploit these patterns at the compiler level for automatic low-level optimizations across various computational environments, such as multicore CPUs, GPUs and handheld devices (Objective 2); and
(iii) provide means to easily integrate Seq into popular bioinformatics and scientific environments and develop a curated library of algorithmic primitives for NGS data (Objective 3).

The short-term goal is to develop a DSL that can efficiently handle various kinds of NGS data on common architectures. The long-term goal is to build a comprehensive and widely used infrastructure that allows rapid and easy method development for biological data. As computational biology HQP are in high demand in Canada, one of the key goals of this proposal is to train HQP over the course of five years.

Impact: We envision that our DSL will significantly boost Canadian genomics and health research by enabling researchers to express their ideas in a more natural way and by allowing them to use the best algorithmic methods for the job. Furthermore, we expect our DSL to aid large-scale Canadian scientific health projects by providing huge time and cost savings. We also anticipate that Seq will become a key building block in the broad spectrum of widely used bioinformatics tools. Finally, we expect that HQP trained by this program will contribute to the Canadian knowledge-based economy.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Numanagic, Ibrahim
ORMAN: Optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms
- DOI: 10.1093/bioinformatics/btt591
- Publication date: 2014-03-01
- Journal:
- Impact factor: 5.8
- Authors: Dao, Phuong; Numanagic, Ibrahim; Sahinalp, S. Cenk
- Corresponding author: Sahinalp, S. Cenk
Seq: A High-Performance Language for Bioinformatics
- DOI: 10.1145/3360551
- Publication date: 2019-10-01
- Journal:
- Impact factor: 1.8
- Authors: Shajii, Ariya; Numanagic, Ibrahim; Amarasinghe, Saman
- Corresponding author: Amarasinghe, Saman
Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes
- DOI: 10.1038/s41467-018-03273-1
- Publication date: 2018-02-26
- Journal:
- Impact factor: 16.6
- Authors: Numanagic, Ibrahim; Malikic, Salem; Sahinalp, S. Cenk
- Corresponding author: Sahinalp, S. Cenk
SCALCE: boosting sequence compression algorithms using locally consistent encoding
- DOI: 10.1093/bioinformatics/bts593
- Publication date: 2012-12-01
- Journal:
- Impact factor: 5.8
- Authors: Hach, Faraz; Numanagic, Ibrahim; Sahinalp, S. Cenk
- Corresponding author: Sahinalp, S. Cenk
Cypiripi: exact genotyping of CYP2D6 using high-throughput sequencing data
- DOI: 10.1093/bioinformatics/btv232
- Publication date: 2015-06-15
- Journal:
- Impact factor: 5.8
- Authors: Numanagic, Ibrahim; Malikic, Salem; Sahinalp, S. Cenk
- Corresponding author: Sahinalp, S. Cenk
Other grants by Numanagic, Ibrahim
Novel domain-specific languages and compiler optimization methods for computational biology
- Grant number: RGPIN-2019-04973
- Fiscal year: 2022
- Funding amount: $20,400
- Project category: Discovery Grants Program - Individual
Novel domain-specific languages and compiler optimization methods for computational biology
- Grant number: RGPIN-2019-04973
- Fiscal year: 2021
- Funding amount: $20,400
- Project category: Discovery Grants Program - Individual
Novel domain-specific languages and compiler optimization methods for computational biology
- Grant number: RGPIN-2019-04973
- Fiscal year: 2020
- Funding amount: $20,400
- Project category: Discovery Grants Program - Individual
Novel domain-specific languages and compiler optimization methods for computational biology
- Grant number: DGECR-2019-00329
- Fiscal year: 2019
- Funding amount: $20,400
- Project category: Discovery Launch Supplement
Boosting compression of sequencing data using reordering
- Grant number: 452424-2013
- Fiscal year: 2015
- Funding amount: $20,400
- Project category: Vanier Canada Graduate Scholarship Tri-Council - Doctoral 3 years
Boosting compression of sequencing data using reordering
- Grant number: 452424-2013
- Fiscal year: 2014
- Funding amount: $20,400
- Project category: Vanier Canada Graduate Scholarship Tri-Council - Doctoral 3 years
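Similar National Natural Science Foundation of China (NSFC) grants
Interpretability analysis metrics for deep neural networks and their application in high-risk vision domains
- Grant number: 62372215
- Approval year: 2023
- Funding amount: 500,000 CNY
- Project category: General Program
Single-source domain generalization methods with joint style-content-model augmentation
- Grant number: 62306008
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Cross-device plasma disruption prediction based on physics-inspired domain generalization
- Grant number: 12375219
- Approval year: 2023
- Funding amount: 530,000 CNY
- Project category: General Program
Intelligent optimization methods for proton exchange membrane fuel cells targeting multiple key areas
- Grant number: 52367024
- Approval year: 2023
- Funding amount: 310,000 CNY
- Project category: Regional Science Fund
Strategy and management research: development strategy for a common industrial software platform in engineering and materials
- Grant number: 52342301
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Special Program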
Similar overseas grants
Popliteal Pterygium syndrome, IRf6, and the periderm
- Grant number: 10727050
- Fiscal year: 2023
- Funding amount: $20,400
- Project category:
Evaluation of a specific LXR/PPAR agonist for treatment of Alzheimer's disease
- Grant number: 10578068
- Fiscal year: 2023
- Funding amount: $20,400
- Project category:
Chromatin regulators of stemness and therapy resistance in rhabdomyosarcoma
- Grant number: 10622041
- Fiscal year: 2023
- Funding amount: $20,400
- Project category:
H2AJ as a regulator of placental senescence and genome organization
- Grant number: 10677156
- Fiscal year: 2023
- Funding amount: $20,400
- Project category:
Engineered T cell-based imaging for glioblastoma and CAR-T cell tracking
- Grant number: 10826124
- Fiscal year: 2023
- Funding amount: $20,400
- Project category: