A Modular Framework for Accurate, Efficient, and Reproducible Analysis of RNA-Seq Data
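Basic information
- Grant number: 10238765
- Principal investigator:
- Amount: $295,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2020
- Funding country: United States
- Project period: 2020-03-12 to 2023-06-30
- Project status: Completed
- Source: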
- Keywords: Address; Adopted; Adoption; Algorithms; Alleles; Archives; Area; Attention; Biological; Biological Assay; Biomedical Research; Characteristics; Communities; Data; Data Set; Databases; Development; Disease; Event; Follow-Up Studies; Gene Expression Profiling; Generations; Genes; Genetic; Genome; Genomics; Goals; Health; Human; Hybrids; Infrastructure; Knowledge; Lead; Location; Measurement; Metadata; Methods; Modeling; Nucleotides; Organism; Phenotype; Process; Protein Isoforms; RNA; RNA Editing; RNA analysis; Reporting; Reproducibility; Reproducibility of Results; Research Personnel; Resources; Salmon; Sampling; Science; Sequence Alignment; Source; Speed; Statistical Data Interpretation; Testing; Time; Transcript; Uncertainty; Variant; Vision; Visualization; Visualization software; analysis pipeline; computational pipelines; cryptography; design; differential expression; experimental study; human error; improved; light weight; task analysis; tool; transcriptome; transcriptome sequencing; transcriptomics; wasting
Project abstract
PROJECT SUMMARY / ABSTRACT
We propose to develop improved, modular pipelines for more accurate and reproducible RNA-seq analyses. RNA-seq experiments are widely used in biological and biomedical sciences to determine the expression level of all genes and isoforms across multiple samples. Raw RNA-seq data must be pre-processed to determine abundances of RNA molecules. State-of-the-art tools for quantifying RNA abundances are fast and efficient, model and correct for common technical biases, and provide estimates of the uncertainty of the abundances. Downstream tools for visualization and statistical testing of abundance ideally should incorporate uncertainty of abundance estimates from the quantification step, take into account the sampling variability inherent in observations in all sequencing experiments, and estimate, for each transcript, the underlying biological variation in abundances across samples. While isolated tools fulfill a subset of the above characteristics, we propose to develop a pipeline which addresses all of these, while at the same time leveraging the powerful existing infrastructure for gene expression analysis. Our modular approach to improving the current RNA-seq analysis pipelines will also seek to make use of the best downstream tools for gene set analysis and dynamic report generation. Current RNA-seq computational pipelines do not keep track of critical pieces of metadata throughout the analysis, including genome and transcriptome version, such that final results cannot reliably be reproduced or put in the correct genomic context as the information about annotation provenance may be lost. While fast and lightweight tools have been quickly adopted for gene- and transcript-level quantification, they are not yet optimized for certain RNA-seq analysis tasks such as quantification of allele-specific expression. We have developed a set of top-performing tools for abundance quantification and downstream inference. We propose to formalize our existing tools into a pipeline, and build additional tools and infrastructure, which optimally estimates and propagates uncertainty from abundance estimation (described in Aim 1), and which stores critical provenance metadata automatically on the user's behalf; this metadata tagging and propagation will be integrated with community resources (described in Aim 2). Furthermore, we propose building out the capabilities of our existing quantification infrastructure to allow for improved mapping accuracy and more robust and accurate allelic expression estimation (described in Aim 3).
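As an illustration of the provenance idea in the abstract, the following is a minimal sketch of how quantification results could be tagged automatically with annotation metadata. It assumes, without confirmation from the abstract, that the reference transcriptome is identified by a SHA-256 digest of its sequences and that the digest is looked up in a registry of known releases. The function names, the registry layout, and the "ExampleDB" entry are all hypothetical and do not describe the project's actual tools.

```python
import hashlib


def transcriptome_digest(sequences):
    """Return a SHA-256 hex digest over transcript sequences, in the order given.

    Sequences are upper-cased first so that soft-masked (lower-case) input
    yields the same digest. This normalization is an assumption of the sketch.
    """
    h = hashlib.sha256()
    for seq in sequences:
        h.update(seq.upper().encode("ascii"))
        h.update(b"\n")  # separator so that ["AC", "GT"] differs from ["ACGT"]
    return h.hexdigest()


def build_registry(known):
    """Build a hypothetical digest -> metadata lookup.

    `known` is an iterable of (metadata_dict, sequences) pairs.
    """
    return {transcriptome_digest(seqs): meta for meta, seqs in known}


def tag_results(abundances, sequences, registry):
    """Attach the reference digest and any matching provenance to the results."""
    digest = transcriptome_digest(sequences)
    provenance = registry.get(digest, {"source": "unknown"})
    return {
        "abundances": abundances,
        "index_digest": digest,
        "provenance": provenance,
    }


if __name__ == "__main__":
    reference = ["ACGT", "GGCC"]
    # Hypothetical registry entry; the source name and release are made up.
    registry = build_registry([({"source": "ExampleDB", "release": "1"}, reference)])
    tagged = tag_results({"tx1": 10.0, "tx2": 5.0}, reference, registry)
    print(tagged["provenance"])
```

Because the digest depends only on the sequences, a downstream tool could in principle recover the transcriptome source and version from the results alone, without the user recording it by hand.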
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Michael Isaiah Love
Systematic in vivo characterization of disease-associated regulatory variants
- Grant number: 10472058
- Fiscal year: 2021
- Funding amount: $295,000
- Project category:
Systematic in vivo characterization of disease-associated regulatory variants
- Grant number: 10296745
- Fiscal year: 2021
- Funding amount: $295,000
- Project category:
Systematic in vivo characterization of disease-associated regulatory variants
- Grant number: 10631225
- Fiscal year: 2021
- Funding amount: $295,000
- Project category:
A Modular Framework for Accurate, Efficient, and Reproducible Analysis of RNA-Seq Data
- Grant number: 10170579
- Fiscal year: 2020
- Funding amount: $295,000
- Project category:
A Modular Framework for Accurate, Efficient, and Reproducible Analysis of RNA-Seq Data
- Grant number: 10440402
- Fiscal year: 2020
- Funding amount: $295,000
- Project category:
pathQTL: Integrative Multi-Omics Causal Inference of Molecular Mechanisms Leading to Neuropsychiatric Illness
- Grant number: 10318952
- Fiscal year: 2018
- Funding amount: $295,000
- Project category:
pathQTL: Integrative Multi-Omics Causal Inference of Molecular Mechanisms Leading to Neuropsychiatric Illness
- Grant number: 10550143
- Fiscal year: 2018
- Funding amount: $295,000
- Project category:
pathQTL: Integrative Multi-Omics Causal Inference of Molecular Mechanisms Leading to Neuropsychiatric Illness
- Grant number: 10066367
- Fiscal year: 2018
- Funding amount: $295,000
- Project category:
Similar international grants
How novices write code: discovering best practices and how they can be adopted
- Grant number: 2315783
- Fiscal year: 2023
- Funding amount: $295,000
- Project category: Standard Grant
One or Several Mothers: The Adopted Child as Critical and Clinical Subject
- Grant number: 2719534
- Fiscal year: 2022
- Funding amount: $295,000
- Project category: Studentship
A material investigation of the ceramic shards excavated from the Omuro Ninsei kiln site: Production techniques adopted by Nonomura Ninsei.
- Grant number: 20K01113
- Fiscal year: 2020
- Funding amount: $295,000
- Project category: Grant-in-Aid for Scientific Research (C)
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
- Grant number: 2633211
- Fiscal year: 2020
- Funding amount: $295,000
- Project category: Studentship
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
- Grant number: 2436895
- Fiscal year: 2020
- Funding amount: $295,000
- Project category: Studentship
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
- Grant number: 2633207
- Fiscal year: 2020
- Funding amount: $295,000
- Project category: Studentship
A Study on Mutual Funds Adopted for Individual Defined Contribution Pension Plans
- Grant number: 19K01745
- Fiscal year: 2019
- Funding amount: $295,000
- Project category: Grant-in-Aid for Scientific Research (C)
The limits of development: State structural policy, comparing systems adopted in two European mountain regions (1945-1989)
- Grant number: 426559561
- Fiscal year: 2019
- Funding amount: $295,000
- Project category: Research Grants
Securing a Sense of Safety for Adopted Children in Middle Childhood
- Grant number: 2236701
- Fiscal year: 2019
- Funding amount: $295,000
- Project category: Studentship
Structural and functional analyses of a bacterial protein translocation domain that has adopted diverse pathogenic effector functions within host cells
- Grant number: 415543446
- Fiscal year: 2019
- Funding amount: $295,000
- Project category: Research Fellowships