Batch effects in molecular profiling data on cancers: detection, quantitation, interpretation, and correction
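Basic information
- Grant number: 9352299
- Principal investigator:
- Amount: $416,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2016
- Funding country: United States
- Period: 2016-09-13 to 2021-08-31
- Project status: Completed
- Source: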
- Keywords: Address; Algorithmic Software; Algorithms; Bioinformatics; Biological; Biology; Biomedical Research; Biometry; Cancer Center; Cancer Detection; Cancer Patient; Clinical; Clinical Oncology; Collaborations; Communities; Competence; Complex; Country; Data; Detection; Diagnosis; Disease; Family; Funding; Future; Galaxy; Generations; Genome Data Analysis Center; Genome Data Analysis Network; Goals; Imagery; Instruction; International; Laboratory Research; Leadership; Link; Malignant Neoplasms; Maps; Medical; Methods; Mind; Mission; Modeling; Molecular Profiling; National Cancer Institute; Organ; Pathway interactions; Patients; Process; Protocols documentation; Quality Control; Reproducibility; Research; Research Personnel; Research Project Grants; Resources; Sampling; Secondary to; Software Engineering; Specimen; System; The Cancer Genome Atlas; Tissues; Training; Translating; Visual; Work; anticancer research; base; cancer genomics; computerized tools; computing resources; data integration; experience; innovation; member; molecular scale; multidisciplinary; next generation; prevent; software development; surveillance data; tool; trend; tumor; user-friendly; web portal; web site; working group
Project abstract
Abstract: Technical batch effects pose a fundamental challenge to quality control and reproducibility of even
single-laboratory research projects, but the possibilities for serious error are greatly magnified in complex,
multi-institutional enterprises such as the cancer molecular profiling projects being undertaken by the NCI
Center for Cancer Genomics (CCG). To aid in detection, quantitation, interpretation, and (when appropriate)
correction for technical batch effects in such data, we have developed the MBatch computational tool and web
portal. MBatch has become indispensable for quality-control “surveillance” of data in The Cancer Genome Atlas
(TCGA) project, but detecting and quantitating batch effects (or trend effects or statistical outliers) are just the
first steps in a process. The next steps involve detective work in collaboration with those who generated the
data, drawing upon expertise in integrative analysis across data types, pathways, and systems-level biology.
That detective work usually succeeds in diagnosing the cause of a batch effect as technical or biological. If
technical, then computational correction can be done (judiciously).
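As a rough illustration of the "quantitate, then correct" steps described above, the following is a minimal sketch and not MBatch's actual method. It assumes a single feature measured across samples with known batch labels, quantifies the batch effect as the fraction of total variance explained by batch (eta-squared), and applies simple per-batch mean-centering as a stand-in correction. The function names and toy data are hypothetical, and mean-centering is only sensible once a batch effect has been diagnosed as technical and the batches are not confounded with biology.

```python
from collections import defaultdict
from statistics import mean


def group_by_batch(values, batches):
    """Collect the values belonging to each batch label."""
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)
    return groups


def batch_variance_fraction(values, batches):
    """Fraction of the total sum of squares explained by batch (eta-squared)."""
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    if total_ss == 0:
        return 0.0
    groups = group_by_batch(values, batches)
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return between_ss / total_ss


def center_by_batch(values, batches):
    """Subtract each batch's mean, then add back the grand mean.

    Assumption: the batch difference is purely technical; this naive
    correction would also erase real biology that tracks with batch.
    """
    grand = mean(values)
    batch_means = {b: mean(g) for b, g in group_by_batch(values, batches).items()}
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]


# Hypothetical toy data: one feature, two batches with a clear offset.
values = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
batches = ["A", "A", "A", "B", "B", "B"]

before = batch_variance_fraction(values, batches)
after = batch_variance_fraction(center_by_batch(values, batches), batches)
print(f"variance explained by batch: before={before:.3f}, after={after:.3f}")
```

A high fraction before correction only flags a difference between batches; whether that difference is technical or biological still has to be decided by the detective work the abstract describes.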
The primary aim of the proposed Genome Data Analysis Center (GDAC) is to translate that successful
quality-control model from TCGA to other current and future large-scale molecular profiling projects sponsored
by the CCG. We will be ready to do that on Day 1. The second aim is to increase the power of MBatch to
perform the basic quality-control functions. We will add a number of innovative new algorithms
(Replicates-Based Normalization, Empirical Bayes++, and CorNet) and increase the repertoire of standard methods. We
will also add major visualization resources including our interactive Next-Generation Clustered Heat Maps. The
third aim is to make the system sufficiently robust, user-friendly, interactive, carefully documented, and easy to
install that bench biologists and clinical researchers can use it to explore CCG-generated data or their own.
Toward those ends, we have established collaborations to implement MBatch in Galaxy and on the cloud.
We bring a number of assets to the proposed GDAC, including (i) multidisciplinary expertise in bioinformatics,
biostatistics, software engineering, biology, and clinical oncology; (ii) PIs with a combined 21 years of experience
in high-throughput molecular profiling studies of clinical cancers (in a highly consortial context); (iii) international
leadership in batch effects analysis; (iv) a highly professional software engineering team with a track record of
producing high-end, highly visual bioinformatics packages and websites; (v) a team of 20 Analysts whose
expertise can be called on; (vi) extensive computing resources, including one of the most powerful academically
based machines in the world; (vii) strong institutional support; and (viii) close working relationships with first-class
basic, translational, and clinical researchers throughout MD Anderson, one of the foremost cancer centers in the
country. The bottom-line mission of the GDAC will be to aid the research community's effort to understand cancer
and to prevent, detect, diagnose, and treat it more effectively for the benefit of patients and their families.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (1)
Other grants by Rehan Akbani
The Cancer Proteome Atlas: an Integrated Bioinformatics Resource for Functional Cancer Proteomic Data
- Grant number: 10653202
- Fiscal year: 2022
- Funding amount: $416,300
- Project category:
A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration
- Grant number: 10300778
- Fiscal year: 2021
- Funding amount: $416,300
- Project category:
A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration
- Grant number: 10689115
- Fiscal year: 2021
- Funding amount: $416,300
- Project category:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
- Grant number: 9615762
- Fiscal year: 2018
- Funding amount: $416,300
- Project category:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
- Grant number: 10251093
- Fiscal year: 2018
- Funding amount: $416,300
- Project category:
Computational Tools for Analysis and Visualization of Quality Control Issues in Metabolomic Data
- Grant number: 10005202
- Fiscal year: 2018
- Funding amount: $416,300
- Project category:
Integrated analysis of protein expression data from the Reverse Phase Protein Array (RPPA) platform
- Grant number: 10005168
- Fiscal year: 2016
- Funding amount: $416,300
- Project category:
Batch effects in molecular profiling data on cancers: detection, quantitation, interpretation, and correction
- Grant number: 9789027
- Fiscal year: 2016
- Funding amount: $416,300
- Project category:
Integrated analysis of protein expression data from the Reverse Phase Protein Array (RPPA) platform
- Grant number: 9789028
- Fiscal year: 2016
- Funding amount: $416,300
- Project category:
Integrative Pipeline for Analysis & Translational Application of TCGA Data (GDAC)
- Grant number: 8546703
- Fiscal year: 2009
- Funding amount: $416,300
- Project category:
Similar overseas grants
Medcircuit, the algorithmic software reducing waiting times in emergency department and general practice waiting rooms.
- Grant number: 133416
- Fiscal year: 2018
- Funding amount: $416,300
- Project category: Feasibility Studies
SHF: Small: Programming Abstractions for Algorithmic Software Synthesis
- Grant number: 0916351
- Fiscal year: 2009
- Funding amount: $416,300
- Project category: Standard Grant