CAREER: A Comprehensive and Lightweight Framework for Transcriptome Analysis

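Basic Information

  • Grant number:
    1750472
  • Principal investigator:
  • Amount:
    $625,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-02-01 to 2020-06-30
  • Project status:
    Completed

Project Abstract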

Over the past decade, sequencing technologies have been developed that enable the profiling of gene expression across a wide variety of organisms and tissue types. These technologies allow the investigation, on a transcriptome-wide scale, of how gene expression changes in different conditions, under various stimuli, and in different disease states. These technologies are transformative in progressing basic science (e.g., understanding cell biology) and applied science (e.g., approaches to drug development). However, the deluge of data produced by these technologies brings with it a host of computational challenges, such as discovering if samples contain genes previously not annotated, accurately determining the sequence of these genes, and quantifying the abundance of all the genes expressed in a sample. Much effort has been dedicated to developing reliable computational methods for processing this data. Yet, even the best existing solutions are sometimes unsatisfactory in terms of their accuracy, and are becoming computationally burdensome given the rapid rate at which new data is being produced. The goal of this project is to develop a new generation of accurate and lightweight methods for analyzing gene and transcript expression using sequencing data. These tools will apply new data structures and algorithmic ideas to the problems of mapping sequencing reads, discovering and assembling new transcripts, and accurately and robustly quantifying gene expression. Further, these methods will work in the context of both established technologies and the newly-emerging protocols that allow measuring cell-specific gene expression across thousands of individual cells. The methods and software produced as a result of this project will help enable new discoveries by being more sensitive and accurate than existing approaches, will reduce costs by decreasing computational demands, and will speed up analyses by producing results more quickly than existing approaches. 
The outreach goals of this project include the creation of educational media, including videos and a podcast series, that will help convey key insights and benefits of new computational genomics methods both to practicing biologists and to the scientifically interested public at large.
Lightweight quantification methods streamline many common transcriptomic analyses, such as differential expression testing in well-annotated organisms and common tissue types. Yet substantial challenges remain that prevent the use of lightweight methods in many analysis tasks, e.g., when novel transcripts should be considered, or when events such as intron retention may play an important role. This work will advance the accuracy and fundamental capabilities of lightweight transcriptome analysis methods. Specifically, a new graph-based data structure will be developed for indexing a collection of reference sequences. A lightweight alignment tool will be built around this index that will incorporate a statistical model that allows sharing of splicing contexts across large collections of samples to guide and inform difficult alignment problems. A multi-sample methodology for joint transcript discovery and quantification will also be developed, based on new approaches to modeling the joint likelihood of transcript sequences and their abundances. Efficient likelihood factorizations will allow this approach to remain computationally convenient. Finally, a suite of tools for processing and quantifying high-throughput, single-cell RNA-seq data will be developed. These tools will adopt a novel approach for solving the cell barcoding, UMI deduplication, and gene expression estimation problems jointly, in a unified statistical framework. The underlying model will share statistical information between cells to improve clustering and quantification, and to analyze expression at the resolution supported by the data, i.e., as groups of distinguishable isoforms. All of these tools will be released as high-quality, open-source software.
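To make the UMI deduplication problem named in the abstract concrete, the sketch below shows the simplest possible baseline: counting distinct UMIs for each (cell barcode, gene) pair. It is not the joint statistical model the project proposes, and it ignores sequencing errors in barcodes and UMIs as well as reads that map to several genes. The function name `count_unique_umis` and the `(cell_barcode, umi, gene)` record layout are assumptions made for illustration only.

```python
from collections import defaultdict


def count_unique_umis(records):
    """Count distinct UMIs for each (cell barcode, gene) pair.

    `records` is an iterable of (cell_barcode, umi, gene) tuples. This
    layout is hypothetical and chosen only for the example.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in records:
        # Reads sharing barcode, gene, and UMI collapse into one set entry.
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Toy input: the first two reads are duplicates of the same molecule.
reads = [
    ("AAAC", "TTGCA", "GeneA"),
    ("AAAC", "TTGCA", "GeneA"),
    ("AAAC", "GGACT", "GeneA"),
    ("CCGT", "TTGCA", "GeneB"),
]
counts = count_unique_umis(reads)
```

Exact-match counting like this treats every UMI as error-free and every read as uniquely assigned to one gene. Those are the simplifications that a joint treatment of barcoding, deduplication, and expression estimation would aim to relax.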

Project Outcomes

Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis
Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification.
  • DOI:
    10.12688/f1000research.15398.3
  • Publication date:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Love MI;Soneson C;Patro R
  • Corresponding author:
    Patro R
A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs
  • DOI:
    10.26508/lsa.201800175
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    4.4
  • Authors:
    Soneson, Charlotte;Love, Michael I;Patro, Rob;Hussain, Shobbir;Malhotra, Dheeraj;Robinson, Mark D
  • Corresponding author:
    Robinson, Mark D
An Efficient, Scalable, and Exact Representation of High-Dimensional Color Information Enabled Using de Bruijn Graph Search
  • DOI:
    10.1089/cmb.2019.0322
  • Publication date:
    2020-03-16
  • Journal:
  • Impact factor:
    1.7
  • Authors:
    Almodaresi, Fatemeh;Pandey, Prashant;Patro, Rob
  • Corresponding author:
    Patro, Rob
Sketching and Sublinear Data Structures in Genomics

Other Publications by Robert Patro

Social Snapshot: A System for Temporally Coupled Social Photography
Detecting isoform-level allelic imbalance accounting for inferential uncertainty
  • DOI:
    10.1101/2022.08.12.503785
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Euphy Y. Wu;N. P. Singh;Kwangbom Choi;Mohsen Zakeri;Matt Vincent;G. Churchill;Cheryl L. Ackert;Robert Patro;M. Love
  • Corresponding author:
    M. Love
MDMap: A system for data-driven layout and exploration of molecular dynamics simulations
ChromoVis: Feature-Rich Layouts of Chromosome Conformation Graphs
  • DOI:
  • Publication date:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Darya Filippova;Geet Duggal;Robert Patro;Carl Kingsford
  • Corresponding author:
    Carl Kingsford
Modeling and Visualization of Human Activities for Multicamera Networks
  • DOI:
    10.1155/2009/259860
  • Publication date:
    2009-10-22
  • Journal:
  • Impact factor:
    1.800
  • Authors:
    Aswin C. Sankaranarayanan;Robert Patro;Pavan Turaga;Amitabh Varshney;Rama Chellappa
  • Corresponding author:
    Rama Chellappa

Other Grants by Robert Patro

CSR: Medium: Approximate Membership Query Data Structures in Computational Biology and Storage
  • Grant number:
    2317838
  • Fiscal year:
    2022
  • Award amount:
    $625,000
  • Award type:
    Continuing Grant
CAREER: A Comprehensive and Lightweight Framework for Transcriptome Analysis
  • Grant number:
    2029424
  • Fiscal year:
    2020
  • Award amount:
    $625,000
  • Award type:
    Continuing Grant
CSR: Medium: Approximate Membership Query Data Structures in Computational Biology and Storage
  • Grant number:
    1763680
  • Fiscal year:
    2018
  • Award amount:
    $625,000
  • Award type:
    Continuing Grant
Bilateral BBSRC-NSF/BIO: ABI Innovation: Data-driven hierarchical analysis of de novo transcriptomes
  • Grant number:
    1564917
  • Fiscal year:
    2016
  • Award amount:
    $625,000
  • Award type:
    Standard Grant

Similar Overseas Grants

CAREER: Towards a comprehensive model of seismicity throughout the seismic cycle
  • Grant number:
    2339556
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Continuing Grant
C-NEWTRAL: smart CompreheNsive training to mainstrEam neW approaches for climaTe-neutRal cities through citizen engAgement and decision-making support
  • Grant number:
    EP/Y032640/1
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Research Grant
RAPID: Enhancing WUI Fire Assessment through Comprehensive Data and High-Fidelity Simulation
  • Grant number:
    2401876
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Standard Grant
NSF Convergence Accelerator Track K: COMPASS: Comprehensive Prediction, Assessment, and Equitable Solutions for Storm-Induced Contamination of Freshwater Systems
  • Grant number:
    2344357
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Standard Grant
Development of a comprehensive microbial immunotherapy platform with immuno-transcriptomic monitoring for treatment of bladder cancer (DOCMI-BC)
  • Grant number:
    10087336
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Collaborative R&D
Postdoctoral Fellowship: SPRF: A Comprehensive Modeling Framework for Semantic Memory Search
  • Grant number:
    2313985
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Fellowship Award
A Longitudinal Study of the Relationship between Participation in a Comprehensive Exercise Program and Academic Achievement
  • Grant number:
    24K14615
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
SBIR Phase I: A web portal for artificial intelligence (AI)-based comprehensive discovery of repositioning drugs
  • Grant number:
    2334510
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Standard Grant
Forest Conservation by Payment for Ecosystem Services (PES): A Comprehensive Analysis of the Policy Outcome to Subsidize the Cooking Fuel in Teknaf-Ukhia, Bangladesh
  • Grant number:
    24K20975
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Grant-in-Aid for Early-Career Scientists
Comprehensive numerical analysis of ICRF heating with fast-ion-driven instabilities in toroidal plasmas
  • Grant number:
    24K17032
  • Fiscal year:
    2024
  • Award amount:
    $625,000
  • Award type:
    Grant-in-Aid for Early-Career Scientists