
CAREER: Algorithms and Tools for Allele-Specific Transcript Assembly

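Basic information

Award number:
2145171
Principal investigator:
Mingfu Shao
Amount:
approximately $749,900
Host institution:
Pennsylvania State Univ University Park
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2022
Funding country:
United States
Start and end dates:
2022-07-01 to 2027-06-30
Project status:
Ongoing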

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2145171&HistoricalAwards=false
Keywords:
CAREER Algorithms Tools Allele Specific

Project abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Many organisms, including humans, have two sets of chromosomes, one from the mother and one from the father, that exist as homologous pairs. It is commonly observed that the two distinct genes (or alleles) at the same location on two homologous chromosomes may produce imbalanced gene products (i.e., mRNAs). This phenomenon is called allele-specific expression (ASE). ASE is known to be closely related to multiple phenotypes and can contribute to cancer susceptibility. ASE offers an important source of biomarkers that could potentially be used as phenotypic biomarkers or for disease diagnosis. Additionally, ASE analysis serves as a powerful tool to determine expression quantitative trait loci (eQTL) and to study a variety of biological processes such as imprinting, protein-truncating variants, and X-chromosome inactivation. The recently established RNA-sequencing technology (RNA-seq) provides an accurate and efficient way to quantitatively measure ASE. However, the sequencing reads generated by current technologies are not full-length. Hence, computational methods are needed to reconstruct the full-length mRNAs expressed from the two different alleles that exist on homologous chromosomes, a problem referred to as allele-specific transcript assembly. Allele-specific transcript assembly is exceedingly difficult because it requires simultaneously threading mutations and splice junctions while inferring an unknown number of full-length transcripts and their abundances.

This project aims to develop accurate allele-specific transcript assembly methods that are applicable to short-read, long-read, and single-cell RNA-seq data. Specifically, the investigators first tackle how to use phased SNPs (i.e., mutations) to improve allele-specific assembly. They show that phased SNPs can be equivalently represented as incompatible pairs of vertices in a so-called variant splice graph. Heuristics are then proposed to solve a formulation that includes incompatible pairs. Long-range information in paired-/multi-end RNA-seq data will be used to improve allele-specific assembly, with an algorithm that decomposes the variant splice graph into paths while fully preserving the paired-/multi-end constraints. Allele-specific assembly will also be solved in the presence of structural variations, a crucial scenario in studying cancer. A new data structure will model SNPs, alternative splicing, and structural variations together. A new algorithm is also proposed to identify allele-specific structural variations, which is of independent interest and also leads to a two-step algorithm for allele-specific assembly. Covariate-adaptive multiple hypothesis testing will control false positive rates. Implementing these algorithms will result in accurate allele-specific transcript assemblers for various types of data and a new ASE analysis pipeline for broader use.

The proposed research is well integrated with educational activities. High-school curricula will be developed that focus on using graph structures, a key abstraction in mathematics and computer science, to model biological data. High-school teachers will be provided opportunities to conduct interdisciplinary research related to transcript assembly. A new undergraduate course will be developed with a focus on enhancing students’ ability to model and solve real-world problems. Efforts will also be made to engage undergraduate and graduate students from underrepresented groups in research opportunities. The results of the project can be found at the PI’s website: https://sites.psu.edu/mxs2589.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
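The abstract's statement that phased SNPs can be represented as incompatible pairs of vertices in a variant splice graph can be illustrated with a toy sketch. The code below is not the project's algorithm: the abstract mentions heuristics for a formulation with incompatible pairs, whereas this sketch simply enumerates all source-to-sink paths of a tiny graph and discards those that contain an incompatible pair. The graph, the vertex names (`e1_A`, `e2_T`, and so on), and the function names are all hypothetical and chosen only for illustration.

```python
# Toy illustration (assumed example, not the project's method):
# a variant splice graph where each exon vertex carries one SNP allele,
# and phasing information forbids certain pairs of vertices on one path.

def enumerate_paths(graph, source, sink):
    """Yield every source-to-sink path in a DAG given as {vertex: [successors]}."""
    stack = [(source, [source])]
    while stack:
        vertex, path = stack.pop()
        if vertex == sink:
            yield path
            continue
        for successor in graph.get(vertex, []):
            stack.append((successor, path + [successor]))

def is_compatible(path, incompatible_pairs):
    """Return True if the path contains no incompatible pair of vertices."""
    on_path = set(path)
    return not any(a in on_path and b in on_path for a, b in incompatible_pairs)

def compatible_paths(graph, source, sink, incompatible_pairs):
    """Return, in sorted order, all paths that respect the incompatible pairs."""
    return sorted(
        path
        for path in enumerate_paths(graph, source, sink)
        if is_compatible(path, incompatible_pairs)
    )

# Hypothetical gene with two exons, each holding one SNP with two alleles.
graph = {
    "s": ["e1_A", "e1_G"],
    "e1_A": ["e2_C", "e2_T"],
    "e1_G": ["e2_C", "e2_T"],
    "e2_C": ["t"],
    "e2_T": ["t"],
}

# Assumed phasing: haplotype 1 carries (A, C), haplotype 2 carries (G, T),
# so mixing alleles across haplotypes is forbidden.
incompatible = [("e1_A", "e2_T"), ("e1_G", "e2_C")]

result = compatible_paths(graph, "s", "t", incompatible)
for path in result:
    print(" -> ".join(path))
```

Of the four possible paths in this toy graph, only the two that stay on a single haplotype survive the filter. Exhaustive enumeration grows exponentially with graph size, which is one reason a real assembler would need heuristics or a more careful formulation.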

Project outputs

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Mingfu Shao

Anchorage accurately assembles anchor-flanked synthetic long reads

DOI:
10.1186/s13015-025-00288-4
Publication date:
2025-07-06
Journal:
Algorithms for Molecular Biology
Impact factor:
1.700
Authors:
Xiaofei Carl Zang; Xiang Li; Kyle Metcalfe; Tuval Ben-Yehezkel; Ryan Kelley; Mingfu Shao
Corresponding author:
Mingfu Shao

Context-aware seeds for read mapping

DOI:
10.1186/s13015-020-00172-3
Publication date:
2020-05-23
Journal:
Algorithms for Molecular Biology
Impact factor:
1.700
Authors:
Hongyi Xin; Mingfu Shao; Carl Kingsford
Corresponding author:
Carl Kingsford

Differentiation of the Seven Major Lyssavirus Species by Oligonucleotide Microarray

DOI:
Publication date:
2011
Journal:
Journal of Clinical Microbiology
Impact factor:
9.4
Authors:
J. Xi; Huancheng Guo; Ye Feng; Yunbin Xu; Mingfu Shao; N. Su; Jiayu Wan; Jiping Li; C. Tu
Corresponding author:
C. Tu

Using Female Alignment Features to Identify Reads from the Y Chromosome in Nanopore Whole Genome Sequencing Data (The Pennsylvania State University, The Graduate School)

DOI:
Publication date:
2020
Journal:
Impact factor:
0
Authors:
Natasha Stopa; Mingfu Shao
Corresponding author:
Mingfu Shao

An Exact Algorithm to Compute the DCJ Distance for Genomes with Duplicate Genes

DOI:
Publication date:
2014
Journal:
Annual International Conference on Research in Computational Molecular Biology
Impact factor:
0
Authors:
Mingfu Shao; Yu Lin; Bernard M. E. Moret
Corresponding author:
Bernard M. E. Moret