A cohesive statistical approach for missing values in high-dimensional metabolomics data
Basic Information
- Grant number: 9433329
- Principal investigator: Guy Brock
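- Amount: $168,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-09-20 to 2019-09-19
- Project status: Completed
- Source: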
- Keywords: Address, Affect, Algorithms, Antibiotic Resistance, Autoimmune Diseases, Bayesian Modeling, Biological, Classification, Communities, Complex, Computer software, Coupled, Data, Data Set, Databases, Detection, Diabetes Mellitus, Disease, Funding, Goals, Guidelines, Health, Human, Laboratories, Literature, Malignant Neoplasms, Methodology, Methods, Modeling, Obesity, Philosophy, Plant Roots, Play, Practice Guidelines, Procedures, Recommendation, Research, Resolution, Role, Sample Size, Sampling, Source, Statistical Data Interpretation, Statistical Models, Structure, Uncertainty, Work, analytical method, base, cohesion, high dimensionality, human disease, improved, metabolomics, method development, novel, outreach, programs, response, simulation, user friendly software, user-friendly, web app, web-accessible
PROJECT SUMMARY
With the number of human health studies involving metabolomics rising rapidly, developing methods that
address key analytic barriers in metabolomics data analysis is of critical importance.
Missing values (MVs) are a pervasive, and often ignored, issue in metabolomics, yet the treatment of MVs can
have a substantial impact on differential abundance and other downstream statistical analyses. The MV
problem in metabolomics is challenging because the source of MVs is not always clear; an MV can
arise because the metabolite is i) not biologically present in the sample, ii) present in the sample but at a
concentration below the lower limit of detection (LOD), or iii) present in the sample but undetected due to
technical issues related to sample pre-processing steps (e.g., peak resolution). Commonly used methods
(e.g., substitution by zero, the LOD, or the mean value) tend to be overly simplistic and produce suboptimal and
potentially misleading results. Because the literature lacks imputation methods that
properly account for the different types of missingness in metabolomics data, there is an urgent need to invest
in improved statistical models of MVs that are specific to metabolomics. We have recently developed a modified
K-nearest neighbors (KNN) imputation algorithm that accounts for the truncation point (i.e., the LOD) in the data
(KNN-TN). Based on simulations derived from real metabolomics studies, this algorithm showed considerable
improvement in imputation accuracy (root-mean-square error) compared to single-value (LOD, mean, zero)
imputation approaches and standard KNN imputation. In this proposal, we will develop an alternative Bayesian
modeling approach that accounts for the uncertainty due to imputation and stabilizes estimates for small samples
by sharing information across metabolites. Further, we will evaluate the impact of MV imputation on downstream
statistical analyses based on simulations from a wide variety of publicly available datasets from the
Metabolomics Workbench. Our analyses will allow us to make comprehensive recommendations to analysts
about which imputation algorithm(s) are optimal in terms of biological impact. Lastly, we will develop publicly
available software for implementing all developed imputation methods, including a web-accessible interface to
broaden outreach and impact. The overall long-term goal of this proposal is to develop user-friendly software
and best-practices guidelines for imputation strategies in metabolomics data, thereby improving accuracy of
downstream statistical analysis and the resulting biological impact.
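The comparison described above, single-value substitution versus KNN imputation scored by root-mean-square error on the censored cells, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the KNN-TN algorithm: the simulated data (a shared sample-level effect plus noise), the LOD placed at the 20th percentile of each metabolite, and the helper names `simulate`, `impute_constant`, `impute_knn`, and `rmse` are all hypothetical. `impute_knn` is ordinary, truncation-unaware KNN over samples, so every donor value it averages lies above the LOD; that upward bias on left-censored values is the kind of problem that accounting for the truncation point is meant to reduce.

```python
import math
import random
import statistics


def simulate(n_samples=30, n_metabolites=8, lod_quantile=0.2, seed=1):
    """Simulate correlated log-scale abundances, then censor values below a per-metabolite LOD.

    Rows are samples and columns are metabolites; a censored cell is stored as None.
    """
    rng = random.Random(seed)
    truth = []
    for _ in range(n_samples):
        latent = rng.gauss(0.0, 1.0)  # sample-level effect shared by all metabolites
        truth.append([10.0 + latent + rng.gauss(0.0, 0.5) for _ in range(n_metabolites)])
    # Hypothetical LOD: the lod_quantile-th order statistic of each metabolite's true values.
    lod = []
    for j in range(n_metabolites):
        column = sorted(row[j] for row in truth)
        lod.append(column[int(lod_quantile * n_samples)])
    data = [[v if v >= lod[j] else None for j, v in enumerate(row)] for row in truth]
    mask = [(i, j) for i, row in enumerate(data) for j, v in enumerate(row) if v is None]
    return truth, data, lod, mask


def impute_constant(data, how, lod=None):
    """Single-value substitution per metabolite: 'zero', 'lod', or 'mean' of observed values."""
    out = [row[:] for row in data]
    for j in range(len(data[0])):
        if how == "zero":
            fill = 0.0
        elif how == "lod":
            fill = lod[j]
        elif how == "mean":
            fill = statistics.fmean(row[j] for row in data if row[j] is not None)
        else:
            raise ValueError(f"unknown method: {how}")
        for row in out:
            if row[j] is None:
                row[j] = fill
    return out


def impute_knn(data, k=3):
    """Standard (truncation-unaware) KNN imputation over samples.

    Each missing cell (i, j) gets the mean of metabolite j in the k samples closest to
    sample i, using RMS distance over the metabolites observed in both samples.
    """
    out = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                sq = [(a - b) ** 2 for a, b in zip(row, other) if a is not None and b is not None]
                if sq:
                    candidates.append((math.sqrt(sum(sq) / len(sq)), other[j]))
            candidates.sort(key=lambda c: c[0])
            if candidates:
                out[i][j] = statistics.fmean(v for _, v in candidates[:k])
            else:
                # No sample shares an observed metabolite with sample i: use the metabolite mean.
                out[i][j] = statistics.fmean(r[j] for r in data if r[j] is not None)
    return out


def rmse(truth, imputed, mask):
    """Root-mean-square error over the originally missing cells listed in mask."""
    return math.sqrt(sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in mask) / len(mask))


if __name__ == "__main__":
    truth, data, lod, mask = simulate()
    for name, imputed in [
        ("zero", impute_constant(data, "zero")),
        ("LOD", impute_constant(data, "lod", lod)),
        ("mean", impute_constant(data, "mean")),
        ("KNN", impute_knn(data, k=3)),
    ]:
        print(f"{name:>4}: RMSE = {rmse(truth, imputed, mask):.3f}")
```

In this setup every censored value lies below its metabolite's LOD and the observed mean lies at or above it, so LOD substitution never has a larger per-cell error than mean substitution; zero substitution is far worse on log-scale values centered near 10.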
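One standard way to carry imputation uncertainty into a final estimate is multiple imputation pooled with Rubin's rules. This is a different, frequentist technique, shown only to illustrate what accounting for the uncertainty due to imputation means; it is not the Bayesian model proposed here. With M imputed datasets, the pooled estimate is the mean of the M estimates, and the total variance is T = U_bar + (1 + 1/M) B, where U_bar is the average within-imputation variance and B is the between-imputation variance. The function name `pool_rubin` and the toy numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool M completed-data results with Rubin's rules.

    estimates: the point estimate from each of the M imputed datasets.
    variances: the matching squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # mean within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b


if __name__ == "__main__":
    # Hypothetical log fold-change estimates for one metabolite from M = 5 imputations.
    est = [0.82, 0.91, 0.77, 0.88, 0.85]
    se = [0.20, 0.21, 0.19, 0.20, 0.22]
    q, t = pool_rubin(est, [s ** 2 for s in se])
    print(f"pooled estimate = {q:.3f}, pooled SE = {math.sqrt(t):.3f}")
```

The (1 + 1/M) B term is what single-value substitution leaves out: it treats imputed numbers as if they had been measured, so the between-imputation spread, and with it part of the uncertainty, disappears.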
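The plan to stabilize small-sample estimates by sharing information across metabolites can be illustrated with a generic empirical-Bayes device, which is not the proposed model: each metabolite's sample variance s_j^2 is shrunk toward a prior value s_0^2 pooled over all metabolites, giving (d_0 s_0^2 + d_j s_j^2) / (d_0 + d_j), the same form used by moderated t-statistics. Taking s_0^2 as the plain mean of the s_j^2 and fixing d_0 = 4 are simplifying assumptions; limma-style methods estimate both from the data. The function name `moderated_variances` is hypothetical.

```python
import statistics


def moderated_variances(columns, d0=4.0):
    """Shrink each metabolite's sample variance toward a value pooled across metabolites.

    columns: one list of observed (or imputed) values per metabolite, each with >= 2 values.
    d0: hypothetical prior degrees of freedom; larger values shrink more strongly.
    """
    s2 = [statistics.variance(col) for col in columns]  # per-metabolite sample variances
    df = [len(col) - 1 for col in columns]              # their degrees of freedom
    s0_sq = statistics.fmean(s2)                        # crude pooled prior variance (assumption)
    return [(d0 * s0_sq + d * v) / (d0 + d) for v, d in zip(s2, df)]


if __name__ == "__main__":
    # Three metabolites measured in only three samples each (toy numbers).
    cols = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 4.0, 8.0]]
    print([round(v, 3) for v in moderated_variances(cols)])
```

With d_0 = 0 the raw variances are returned unchanged; as d_0 grows, every estimate moves toward the pooled value, and metabolites with fewer degrees of freedom are shrunk the most.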
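The point that imputation choices propagate into downstream analyses can be seen even for a single metabolite. The sketch below, using toy values and an assumed LOD of 2.0, computes Welch's two-sample t statistic after filling the missing values with zero, the LOD, or half the LOD (a further common convention not named in the abstract). Welch's test is a swapped-in example of a downstream analysis, not the evaluation framework proposed here; the names `welch_t`, `fill`, `CASES`, `CONTROLS`, and `LOD` are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic (no equal-variance assumption)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def fill(values, value):
    """Replace missing entries (None) with a constant."""
    return [value if v is None else v for v in values]


# Toy values for one metabolite; None marks a value below an assumed LOD of 2.0.
CASES = [5.1, 4.8, None, 5.5, 4.9]
CONTROLS = [3.0, None, None, 2.9, 3.3]
LOD = 2.0

if __name__ == "__main__":
    for label, value in [("zero", 0.0), ("LOD", LOD), ("LOD/2", LOD / 2)]:
        t = welch_t(fill(CASES, value), fill(CONTROLS, value))
        print(f"{label:>5}: t = {t:.2f}")
```

Here zero substitution inflates both groups' variances, so its t statistic comes out smaller than with LOD substitution even though the observed values are identical.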
Project Outputs
Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other Grants by Guy Brock
BD4ISU: Big Data for Indiana State University
- Grant number: 9883642
- Fiscal year: 2017
- Funding amount: $168,400
- Project category:
Integrated Analysis: Epigenetic Regulation of Gene Expression During Orofacial De
- Grant number: 8385921
- Fiscal year: 2012
- Funding amount: $168,400
- Project category:
Integrated Analysis: Epigenetic Regulation of Gene Expression During Orofacial De
- Grant number: 8537402
- Fiscal year: 2012
- Funding amount: $168,400
- Project category:
Similar Overseas Grants
How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
- Grant number: BB/Z514391/1
- Fiscal year: 2024
- Funding amount: $168,400
- Project category: Training Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
- Grant number: 2312555
- Fiscal year: 2024
- Funding amount: $168,400
- Project category: Standard Grant
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
- Grant number: 2327346
- Fiscal year: 2024
- Funding amount: $168,400
- Project category: Standard Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
- Grant number: ES/Z502595/1
- Fiscal year: 2024
- Funding amount: $168,400
- Project category: Fellowship
感性個人差指標 Affect-X の構築とビスポークAIサービスの基盤確立
Building Affect-X, an index of individual differences in sensibility, and establishing a foundation for bespoke AI services
- Grant number: 23K24936
- Fiscal year: 2024
- Funding amount: $168,400
- Project category: Grant-in-Aid for Scientific Research (B)
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
- Grant number: ES/Z000149/1
- Fiscal year: 2024
- Funding amount: $168,400
- Project category: Research Grant
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
- Grant number: 2901648
- Fiscal year: 2024
- Funding amount: $168,400
- Project category: Studentship
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
- Grant number: 488039
- Fiscal year: 2023
- Funding amount: $168,400
- Project category: Operating Grants
New Tendencies of French Film Theory: Representation, Body, Affect
- Grant number: 23K00129
- Fiscal year: 2023
- Funding amount: $168,400
- Project category: Grant-in-Aid for Scientific Research (C)
The Protruding Void: Mystical Affect in Samuel Beckett's Prose
- Grant number: 2883985
- Fiscal year: 2023
- Funding amount: $168,400
- Project category: Studentship