Batch effects in molecular profiling data on cancers: detection, quantitation, interpretation, and correction

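Basic information

  • Grant number:
    9789027
  • Principal investigator:
  • Amount:
    $378,400
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Project period:
    2016-09-13 to 2021-08-31
  • Project status:
    Completed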

Project abstract

Abstract: Technical batch effects pose a fundamental challenge to quality control and reproducibility of even single-laboratory research projects, but the possibilities for serious error are greatly magnified in complex, multi-institutional enterprises such as the cancer molecular profiling projects being undertaken by the NCI Center for Cancer Genomics (CCG). To aid in detection, quantitation, interpretation, and (when appropriate) correction of technical batch effects in such data, we have developed the MBatch computational tool and web portal. MBatch has become indispensable for quality-control “surveillance” of data in The Cancer Genome Atlas (TCGA) project, but detecting and quantitating batch effects (or trend effects or statistical outliers) are just the first steps in a process. The next steps involve detective work in collaboration with those who generated the data, drawing upon expertise in integrative analysis across data types, pathways, and systems-level biology. That detective work usually succeeds in diagnosing the cause of a batch effect as technical or biological. If the cause is technical, then computational correction can be done (judiciously). The primary aim of the proposed Genome Data Analysis Center (GDAC) is to translate that successful quality-control model from TCGA to other current and future large-scale molecular profiling projects sponsored by the CCG. We will be ready to do that on Day 1. The second aim is to increase the power of MBatch to perform the basic quality-control functions. We will add a number of innovative new algorithms (Replicates-Based Normalization, Empirical Bayes++, and CorNet) and increase the repertoire of standard methods. We will also add major visualization resources, including our interactive Next-Generation Clustered Heat Maps.
The third aim is to make the system sufficiently robust, user-friendly, interactive, carefully documented, and easy to install that bench biologists and clinical researchers can use it to explore CCG-generated data or their own. Toward those ends, we have established collaborations to implement MBatch in Galaxy and on the cloud. We bring a number of assets to the proposed GDAC, including (i) multidisciplinary expertise in bioinformatics, biostatistics, software engineering, biology, and clinical oncology; (ii) PIs with a combined 21 years of experience in high-throughput molecular profiling studies of clinical cancers (in a highly consortial context); (iii) international leadership in batch effects analysis; (iv) a highly professional software engineering team with a track record of producing high-end, highly visual bioinformatics packages and websites; (v) a team of 20 Analysts whose expertise can be called on; (vi) extensive computing resources, including one of the most powerful academically based machines in the world; (vii) strong institutional support; and (viii) close working relationships with first-class basic, translational, and clinical researchers throughout MD Anderson, one of the foremost cancer centers in the country. The bottom-line mission of the GDAC will be to aid the research community's effort to understand cancer and to prevent, detect, diagnose, and treat it more effectively for the benefit of patients and their families.

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Rehan Akbani

The Cancer Proteome Atlas: an Integrated Bioinformatics Resource for Functional Cancer Proteomic Data
  • Grant number:
    10653202
  • Fiscal year:
    2022
  • Award amount:
    $378,400
  • Project type:
A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration
  • Grant number:
    10300778
  • Fiscal year:
    2021
  • Award amount:
    $378,400
  • Project type:
A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration
  • Grant number:
    10689115
  • Fiscal year:
    2021
  • Award amount:
    $378,400
  • Project type:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
  • Grant number:
    9615762
  • Fiscal year:
    2018
  • Award amount:
    $378,400
  • Project type:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
  • Grant number:
    10251093
  • Fiscal year:
    2018
  • Award amount:
    $378,400
  • Project type:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
  • Grant number:
    10005202
  • Fiscal year:
    2018
  • Award amount:
    $378,400
  • Project type:
Batch effects in molecular profiling data on cancers: detection, quantitation, interpretation, and correction
  • Grant number:
    9352299
  • Fiscal year:
    2016
  • Award amount:
    $378,400
  • Project type:
Integrated analysis of protein expression data from the Reverse Phase Protein Array (RPPA) platform
  • Grant number:
    10005168
  • Fiscal year:
    2016
  • Award amount:
    $378,400
  • Project type:
Integrated analysis of protein expression data from the Reverse Phase Protein Array (RPPA) platform
  • Grant number:
    9789028
  • Fiscal year:
    2016
  • Award amount:
    $378,400
  • Project type:
Integrative Pipeline for Analysis & Translational Application of TCGA Data (GDAC)
  • Grant number:
    8546703
  • Fiscal year:
    2009
  • Award amount:
    $378,400
  • Project type:

Similar overseas grants

Medcircuit, the algorithmic software reducing waiting times in emergency department and general practice waiting rooms.
  • Grant number:
    133416
  • Fiscal year:
    2018
  • Award amount:
    $378,400
  • Project type:
    Feasibility Studies
SHF: Small: Programming Abstractions for Algorithmic Software Synthesis
  • Grant number:
    0916351
  • Fiscal year:
    2009
  • Award amount:
    $378,400
  • Project type:
    Standard Grant