CAREER: A Comprehensive and Lightweight Framework for Transcriptome Analysis

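Basic Information

  • Award number:
    2029424
  • Principal investigator:
  • Amount:
    $616,700
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Start and end dates:
    2020-04-13 to 2024-01-31
  • Project status:
    Completed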

Project Abstract

Over the past decade, sequencing technologies have been developed that enable the profiling of gene expression across a wide variety of organisms and tissue types. These technologies allow the investigation, on a transcriptome-wide scale, of how gene expression changes in different conditions, under various stimuli, and in different disease states. These technologies are transformative in progressing basic science (e.g., understanding cell biology) and applied science (e.g., approaches to drug development). However, the deluge of data produced by these technologies brings with it a host of computational challenges, such as discovering if samples contain genes previously not annotated, accurately determining the sequence of these genes, and quantifying the abundance of all the genes expressed in a sample. Much effort has been dedicated to developing reliable computational methods for processing this data. Yet, even the best existing solutions are sometimes unsatisfactory in terms of their accuracy, and are becoming computationally burdensome given the rapid rate at which new data is being produced. The goal of this project is to develop a new generation of accurate and lightweight methods for analyzing gene and transcript expression using sequencing data. These tools will apply new data structures and algorithmic ideas to the problems of mapping sequencing reads, discovering and assembling new transcripts, and accurately and robustly quantifying gene expression. Further, these methods will work in the context of both established technologies and the newly-emerging protocols that allow measuring cell-specific gene expression across thousands of individual cells. The methods and software produced as a result of this project will help enable new discoveries by being more sensitive and accurate than existing approaches, will reduce costs by decreasing computational demands, and will speed up analyses by producing results more quickly than existing approaches. 
The outreach goals of this project include the creation of educational media, including videos and a podcast series, that will help convey key insights and benefits of new computational genomics methods to both practicing biologists and the scientifically interested public at large.

Lightweight quantification methods streamline many common transcriptomic analyses, like differential expression testing in well-annotated organisms and common tissue types. Yet, substantial challenges remain that prevent the use of lightweight methods in many analysis tasks, e.g., when novel transcripts should be considered, or when events such as intron retention may play an important role. This work will advance the accuracy and fundamental capabilities of lightweight transcriptome analysis methods. Specifically, a new graph-based data structure will be developed for indexing a collection of reference sequences. A lightweight alignment tool will be built around this index that will incorporate a statistical model that allows sharing of splicing contexts across large collections of samples to guide and inform difficult alignment problems. A multi-sample methodology for joint transcript discovery and quantification will also be developed, based on new approaches to modeling the joint likelihood of transcript sequences and their abundances. Efficient likelihood factorizations will allow this approach to remain computationally tractable. Finally, a suite of tools for processing and quantifying high-throughput, single-cell RNA-seq data will be developed. These tools will adopt a novel approach for solving the cell barcoding, UMI deduplication, and gene expression estimation problems jointly, in a unified statistical framework. The underlying model will share statistical information between cells to improve clustering and quantification, and to analyze expression at the resolution supported by the data, i.e., as groups of distinguishable isoforms. All of these tools will be released as high-quality, open-source software.

Project Outcomes

Journal articles (20)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Compression of quantification uncertainty for scRNA-seq counts.
  • DOI:
    10.1093/bioinformatics/btab001
  • Publication date:
    2021-07-19
  • Journal:
  • Impact factor:
    0
  • Authors:
    Van Buren S;Sarkar H;Srivastava A;Rashid NU;Patro R;Love MI
  • Corresponding author:
    Love MI
An incrementally updatable and scalable system for large-scale sequence search using the Bentley–Saxe transformation
  • DOI:
    10.1093/bioinformatics/btac142
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    5.8
  • Authors:
    Almodaresi, Fatemeh;Khan, Jamshed;Madaminov, Sergey;Ferdman, Michael;Johnson, Rob;Pandey, Prashant;Patro, Rob;Boeva, ed., Valentina
  • Corresponding author:
    Boeva, ed., Valentina
SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty.
  • DOI:
    10.1186/s13059-023-03003-x
  • Publication date:
    2023-07-12
  • Journal:
  • Impact factor:
    12.3
  • Authors:
  • Corresponding author:
Airpart: interpretable statistical models for analyzing allelic imbalance in single-cell datasets.
Tximeta: Reference sequence checksums for provenance identification in RNA-seq
  • DOI:
    10.1371/journal.pcbi.1007664
  • Publication date:
    2020-02-01
  • Journal:
  • Impact factor:
    4.3
  • Authors:
    Love, Michael I.;Soneson, Charlotte;Patro, Rob
  • Corresponding author:
    Patro, Rob

Other publications by Robert Patro

Social Snapshot: A System for Temporally Coupled Social Photography
Detecting isoform-level allelic imbalance accounting for inferential uncertainty
  • DOI:
    10.1101/2022.08.12.503785
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Euphy Y. Wu;N. P. Singh;Kwangbom Choi;Mohsen Zakeri;Matt Vincent;G. Churchill;Cheryl L. Ackert;Robert Patro;M. Love
  • Corresponding author:
    M. Love
MDMap: A system for data-driven layout and exploration of molecular dynamics simulations
ChromoVis: Feature-Rich Layouts of Chromosome Conformation Graphs
  • DOI:
  • Publication date:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Darya Filippova;Geet Duggal;Robert Patro;Carl Kingsford
  • Corresponding author:
    Carl Kingsford
Modeling and Visualization of Human Activities for Multicamera Networks
  • DOI:
    10.1155/2009/259860
  • Publication date:
    2009-10-22
  • Journal:
  • Impact factor:
    1.800
  • Authors:
    Aswin C. Sankaranarayanan;Robert Patro;Pavan Turaga;Amitabh Varshney;Rama Chellappa
  • Corresponding author:
    Rama Chellappa

Other grants held by Robert Patro

CSR: Medium: Approximate Membership Query Data Structures in Computational Biology and Storage
  • Award number:
    2317838
  • Fiscal year:
    2022
  • Award amount:
    $616,700
  • Project type:
    Continuing Grant
CAREER: A Comprehensive and Lightweight Framework for Transcriptome Analysis
  • Award number:
    1750472
  • Fiscal year:
    2018
  • Award amount:
    $616,700
  • Project type:
    Continuing Grant
CSR: Medium: Approximate Membership Query Data Structures in Computational Biology and Storage
  • Award number:
    1763680
  • Fiscal year:
    2018
  • Award amount:
    $616,700
  • Project type:
    Continuing Grant
Bilateral BBSRC-NSF/BIO: ABI Innovation: Data-driven hierarchical analysis of de novo transcriptomes
  • Award number:
    1564917
  • Fiscal year:
    2016
  • Award amount:
    $616,700
  • Project type:
    Standard Grant

Similar overseas grants

CAREER: Towards a comprehensive model of seismicity throughout the seismic cycle
  • Award number:
    2339556
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Continuing Grant
C-NEWTRAL: smart CompreheNsive training to mainstrEam neW approaches for climaTe-neutRal cities through citizen engAgement and decision-making support
  • Award number:
    EP/Y032640/1
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Research Grant
RAPID: Enhancing WUI Fire Assessment through Comprehensive Data and High-Fidelity Simulation
  • Award number:
    2401876
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Standard Grant
NSF Convergence Accelerator Track K: COMPASS: Comprehensive Prediction, Assessment, and Equitable Solutions for Storm-Induced Contamination of Freshwater Systems
  • Award number:
    2344357
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Standard Grant
Development of a comprehensive microbial immunotherapy platform with immuno-transcriptomic monitoring for treatment of bladder cancer (DOCMI-BC)
  • Award number:
    10087336
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Collaborative R&D
Postdoctoral Fellowship: SPRF: A Comprehensive Modeling Framework for Semantic Memory Search
  • Award number:
    2313985
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Fellowship Award
A Longitudinal Study of the Relationship between Participation in a Comprehensive Exercise Program and Academic Achievement
  • Award number:
    24K14615
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Grant-in-Aid for Scientific Research (C)
SBIR Phase I: A web portal for artificial intelligence (AI)-based comprehensive discovery of repositioning drugs
  • Award number:
    2334510
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Standard Grant
Forest Conservation by Payment for Ecosystem Services (PES): A Comprehensive Analysis of the Policy Outcome to Subsidize the Cooking Fuel in Teknaf-Ukhia, Bangladesh
  • Award number:
    24K20975
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Comprehensive numerical analysis of ICRF heating with fast-ion-driven instabilities in toroidal plasmas
  • Award number:
    24K17032
  • Fiscal year:
    2024
  • Award amount:
    $616,700
  • Project type:
    Grant-in-Aid for Early-Career Scientists