Batch effects in molecular profiling data on cancers: detection, quantitation, interpretation, and correction

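Basic information

  • Grant number:
    9789027
  • Principal investigator:
  • Amount:
    $378,400
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Project period:
    2016-09-13 to 2021-08-31
  • Project status:
    Completed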

Project abstract

Abstract: Technical batch effects pose a fundamental challenge to quality control and reproducibility of even single-laboratory research projects, but the possibilities for serious error are greatly magnified in complex, multi-institutional enterprises such as the cancer molecular profiling projects being undertaken by the NCI Center for Cancer Genomics (CCG). To aid in detection, quantitation, interpretation, and (when appropriate) correction of technical batch effects in such data, we have developed the MBatch computational tool and web portal. MBatch has become indispensable for quality-control “surveillance” of data in The Cancer Genome Atlas (TCGA) project, but detecting and quantitating batch effects (or trend effects or statistical outliers) are just the first steps in a process. The next steps involve detective work in collaboration with those who generated the data, drawing upon expertise in integrative analysis across data types, pathways, and systems-level biology. That detective work usually succeeds in diagnosing the cause of a batch effect as technical or biological. If the cause is technical, then computational correction can be done (judiciously). The primary aim of the proposed Genome Data Analysis Center (GDAC) is to translate that successful quality-control model from TCGA to other current and future large-scale molecular profiling projects sponsored by the CCG. We will be ready to do that on Day 1. The second aim is to increase the power of MBatch to perform the basic quality-control functions. We will add a number of new algorithms (Replicates-Based Normalization, Empirical Bayes++, and CorNet) and increase the repertoire of standard methods. We will also add major visualization resources, including our interactive Next-Generation Clustered Heat Maps.
The third aim is to make the system sufficiently robust, user-friendly, interactive, carefully documented, and easy to install that bench biologists and clinical researchers can use it to explore CCG-generated data or their own. Toward those ends, we have established collaborations to implement MBatch in Galaxy and on the cloud. We bring a number of assets to the proposed GDAC, including (i) multidisciplinary expertise in bioinformatics, biostatistics, software engineering, biology, and clinical oncology; (ii) PIs with a combined 21 years of experience in high-throughput molecular profiling studies of clinical cancers (in a highly consortial context); (iii) international leadership in batch effects analysis; (iv) a highly professional software engineering team with a track record of producing high-end, highly visual bioinformatics packages and websites; (v) a team of 20 Analysts whose expertise can be called on; (vi) extensive computing resources, including one of the most powerful academically based machines in the world; (vii) strong institutional support; and (viii) close working relationships with first-class basic, translational, and clinical researchers throughout MD Anderson, one of the foremost cancer centers in the country. The bottom-line mission of the GDAC will be to aid the research community's effort to understand cancer and to prevent, detect, diagnose, and treat it more effectively for the benefit of patients and their families.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Rehan Akbani

The Cancer Proteome Atlas: an Integrated Bioinformatics Resource for Functional Cancer Proteomic Data
  • Grant number:
    10653202
  • Fiscal year:
    2022
  • Amount:
    $378,400
  • Project type:
A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration
  • Grant number:
    10300778
  • Fiscal year:
    2021
  • Amount:
    $378,400
  • Project type:
A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration
  • Grant number:
    10689115
  • Fiscal year:
    2021
  • Amount:
    $378,400
  • Project type:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
  • Grant number:
    9615762
  • Fiscal year:
    2018
  • Amount:
    $378,400
  • Project type:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
  • Grant number:
    10251093
  • Fiscal year:
    2018
  • Amount:
    $378,400
  • Project type:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
  • Grant number:
    10005202
  • Fiscal year:
    2018
  • Amount:
    $378,400
  • Project type:
Batch effects in molecular profiling data on cancers: detection, quantitation, interpretation, and correction
  • Grant number:
    9352299
  • Fiscal year:
    2016
  • Amount:
    $378,400
  • Project type:
Integrated analysis of protein expression data from the Reverse Phase Protein Array (RPPA) platform
  • Grant number:
    10005168
  • Fiscal year:
    2016
  • Amount:
    $378,400
  • Project type:
Integrated analysis of protein expression data from the Reverse Phase Protein Array (RPPA) platform
  • Grant number:
    9789028
  • Fiscal year:
    2016
  • Amount:
    $378,400
  • Project type:
Integrative Pipeline for Analysis & Translational Application of TCGA Data (GDAC)
  • Grant number:
    8546703
  • Fiscal year:
    2009
  • Amount:
    $378,400
  • Project type:

Similar overseas grants

Medcircuit, the algorithmic software reducing waiting times in emergency department and general practice waiting rooms.
  • Grant number:
    133416
  • Fiscal year:
    2018
  • Amount:
    $378,400
  • Project type:
    Feasibility Studies
SHF: Small: Programming Abstractions for Algorithmic Software Synthesis
  • Grant number:
    0916351
  • Fiscal year:
    2009
  • Amount:
    $378,400
  • Project type:
    Standard Grant