CAREER: Algorithms and Tools for Allele-Specific Transcript Assembly
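Basic information
- Award number: 2145171
- Principal investigator:
- Amount: $749,900
- Host institution:
- Host institution country: United States
- Grant type: Continuing Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-07-01 to 2027-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract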
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Many organisms, including humans, have two sets of chromosomes, one from the mother and one from the father, that exist as homologous pairs. It is commonly observed that the two distinct genes (or alleles) at the same location on two homologous chromosomes may produce imbalanced gene products (i.e., mRNAs). This phenomenon is called allele-specific expression (ASE). ASE is known to be closely related to multiple phenotypes and can contribute to cancer susceptibility. ASE offers an important source of biomarkers that could potentially be used as phenotypic biomarkers or for disease diagnosis. Additionally, ASE analysis serves as a powerful tool for determining expression quantitative trait loci (eQTLs) and for studying a variety of biological processes such as imprinting, protein-truncating variants, and X-chromosome inactivation. The recently established RNA-sequencing technology (RNA-seq) provides an accurate and efficient way to measure ASE quantitatively. However, the sequencing reads generated by current technologies are not full-length. Hence, computational methods are needed to reconstruct the full-length mRNAs expressed from the two different alleles on homologous chromosomes, a problem referred to as allele-specific transcript assembly. Allele-specific transcript assembly is exceedingly difficult because it requires simultaneously threading mutations and splice junctions while inferring an unknown number of full-length transcripts and their abundances.

This project aims to develop accurate allele-specific transcript assembly methods that are applicable to short-read, long-read, and single-cell RNA-seq data. Specifically, the investigators first tackle how to use phased SNPs (i.e., mutations) to improve allele-specific assembly. They show that phased SNPs can be equivalently represented as incompatible pairs of vertices in a so-called variant splice graph. Heuristics are then proposed to solve a formulation that includes the incompatible pairs. Long-range information in paired-/multi-end RNA-seq data will be used to improve allele-specific assembly, with an algorithm that decomposes the variant splice graph into paths while fully preserving the paired-/multi-end constraints. Allele-specific assembly will also be solved in the presence of structural variations, a crucial scenario in studying cancer. A new data structure will model SNPs, alternative splicing, and structural variations together. A new algorithm is also proposed to identify allele-specific structural variations, which is of independent interest and also leads to a two-step algorithm for allele-specific assembly. Covariate-adaptive multiple hypothesis testing will control false positive rates. Implementing these algorithms will result in accurate allele-specific transcript assemblers for various types of data and a new ASE analysis pipeline for broader use.

The proposed research is well integrated with educational activities. High-school curricula will be developed that focus on using graph structures, a key abstraction in mathematics and computer science, to model biological data. High-school teachers will be provided opportunities to conduct interdisciplinary research related to transcript assembly. A new undergraduate course will be developed with a focus on enhancing students' ability to model and solve real-world problems. Efforts will also be made to engage undergraduate and graduate students from underrepresented groups in research opportunities. The results of the project can be found at the PI's website: https://sites.psu.edu/mxs2589.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
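The idea of representing phased SNPs as incompatible vertex pairs can be illustrated with a toy example. The sketch below is not the project's algorithm: it assumes a small acyclic graph whose vertices are exon segments carrying one allele each, treats two vertices as incompatible when their alleles are phased to different haplotypes, and enumerates paths by brute force (exponential in general). The function name `compatible_paths`, the vertex labels, and the graph itself are hypothetical.

```python
def compatible_paths(edges, source, sink, incompatible):
    """Enumerate source-to-sink paths in a DAG that never contain
    both vertices of any incompatible pair (brute-force DFS)."""
    banned = {frozenset(pair) for pair in incompatible}
    paths = []

    def extend(path):
        last = path[-1]
        if last == sink:
            paths.append(list(path))
            return
        for nxt in edges.get(last, []):
            # Skip nxt if it conflicts with any vertex already on the path.
            if any(frozenset((v, nxt)) in banned for v in path):
                continue
            path.append(nxt)
            extend(path)
            path.pop()

    extend([source])
    return paths


# Hypothetical toy graph: two SNP sites (exon 1 and exon 3), one shared exon.
toy_edges = {
    "s": ["e1_A", "e1_G"],
    "e1_A": ["e2"],
    "e1_G": ["e2"],
    "e2": ["e3_C", "e3_T"],
    "e3_C": ["t"],
    "e3_T": ["t"],
}
# Assumed phasing: A and C lie on one haplotype, G and T on the other,
# so the cross-haplotype combinations are incompatible.
phase_conflicts = [("e1_A", "e3_T"), ("e1_G", "e3_C")]

for path in compatible_paths(toy_edges, "s", "t", phase_conflicts):
    print(" -> ".join(path))
```

Without the incompatible pairs the toy graph has four source-to-sink paths; with them, only the two paths consistent with a single haplotype remain, which is the pruning effect the abstract attributes to phased SNPs.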
Project outputs
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other publications by Mingfu Shao

Anchorage accurately assembles anchor-flanked synthetic long reads
- DOI: 10.1186/s13015-025-00288-4
- Published: 2025-07-06
- Journal:
- Impact factor: 1.700
- Authors: Xiaofei Carl Zang; Xiang Li; Kyle Metcalfe; Tuval Ben-Yehezkel; Ryan Kelley; Mingfu Shao
- Corresponding author: Mingfu Shao

Context-aware seeds for read mapping
- DOI: 10.1186/s13015-020-00172-3
- Published: 2020-05-23
- Journal:
- Impact factor: 1.700
- Authors: Hongyi Xin; Mingfu Shao; Carl Kingsford
- Corresponding author: Carl Kingsford

Differentiation of the Seven Major Lyssavirus Species by Oligonucleotide Microarray
- DOI:
- Published: 2011
- Journal:
- Impact factor: 9.4
- Authors: J. Xi; Huancheng Guo; Ye Feng; Yunbin Xu; Mingfu Shao; N. Su; Jiayu Wan; Jiping Li; C. Tu
- Corresponding author: C. Tu

Using Female Alignment Features to Identify Reads from the Y Chromosome in Nanopore Whole Genome Sequencing Data (The Pennsylvania State University, The Graduate School)
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Natasha Stopa; Mingfu Shao
- Corresponding author: Mingfu Shao

An Exact Algorithm to Compute the DCJ Distance for Genomes with Duplicate Genes
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Mingfu Shao; Yu Lin; Bernard M. E. Moret
- Corresponding author: Bernard M. E. Moret

Other grants by Mingfu Shao

BBSRC-NSF/BIO: IIBR Informatics: Collaborative Research: Inference of isoform-level regulatory infrastructures with studies in steroid-producing cells
- Award number: 2019797
- Fiscal year: 2020
- Amount: $749,900
- Grant type: Standard Grant

Similar overseas grants

CAREER: Foundations, Algorithms, and Tools for Browser Invalidation
- Award number: 2340192
- Fiscal year: 2024
- Amount: $749,900
- Grant type: Continuing Grant

CAREER: Toward Real-Time, Constraint-Aware Control of Complex Dynamical Systems: from Theory and Algorithms to Software Tools
- Award number: 2238424
- Fiscal year: 2023
- Amount: $749,900
- Grant type: Standard Grant

CAREER: Exact Optimal and Data-Adaptive Algorithms and Tools for Differential Privacy
- Award number: 2048091
- Fiscal year: 2021
- Amount: $749,900
- Grant type: Continuing Grant

CAREER: Pursuing New Tools for Approximation Algorithms
- Award number: 1552097
- Fiscal year: 2016
- Amount: $749,900
- Grant type: Continuing Grant

CAREER: Robustness Analysis of Uncertain Programs: Theory, Algorithms, and Tools
- Award number: 1156059
- Fiscal year: 2011
- Amount: $749,900
- Grant type: Continuing Grant

CAREER: Robustness Analysis of Uncertain Programs: Theory, Algorithms, and Tools
- Award number: 0953507
- Fiscal year: 2010
- Amount: $749,900
- Grant type: Continuing Grant

CAREER: Multithreaded Algorithms, Models, and Runtime System Tools for Multimedia Applications
- Award number: 0196365
- Fiscal year: 2000
- Amount: $749,900
- Grant type: Continuing Grant

CAREER: Geometric Tools for Algorithms
- Award number: 9875024
- Fiscal year: 1999
- Amount: $749,900
- Grant type: Continuing Grant

CAREER: Multithreaded Algorithms, Models, and Runtime System Tools for Multimedia Applications
- Award number: 9875662
- Fiscal year: 1999
- Amount: $749,900
- Grant type: Continuing Grant

CAREER: Algorithms and Tools for Fault Tolerance and Migration in Distributed Computing Environments
- Award number: 9703390
- Fiscal year: 1997
- Amount: $749,900
- Grant type: Continuing Grant