Robust Inference in the Presence of Data Heterogeneity and Structured Missing Data
Basic information
- Grant number: 10238926
- Principal investigator:
- Amount: $236,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2019
- Funding country: United States
- Project period: 2019-09-01 to 2023-08-31
- Project status: Completed
- Source:
- Keywords: Aftercare; Biological; Case Study; Data; Data Analytics; Data Set; Development; Generations; Genetic; Health Policy; Health Surveys; Heterogeneity; Individual; Medical; Methods; Modeling; Modernization; Molecular Biology; Pathway interactions; Performance; Procedures; Process; Reproducibility; Research Personnel; Running; Sampling; Sampling Studies; Statistical Methods; Structure; Surveys; System; Technology; Testing; Time; Work; algorithm development; analytical method; base; experimental study; health data; heterogenous data; new technology; population survey; sequencing platform; structured data; tool
Project abstract
Modern sequencing platforms can sequence tens of billions of bases per run and generate petabytes of
data, but individual study sizes may be small. Similarly, a wide variety of health data are now publicly
available to inform health policy decisions, and it may be advantageous to use data from several different
surveys. The ability to aggregate and compare heterogeneous data across different datasets would be
critical to expanding the usable data available for any individual study. We propose systematically studying
two major barriers to this effort: 1) Aggregating different medical and biological datasets; 2) Dealing
with batch effects and structured heterogeneous data. Aim 1 allows us to fully utilize information on
related topics from diverse datasets, as information across different experiments needs to be combined in a
statistically rigorous, reliable way - the process needs to fully exploit the available information, not
introduce biases, and still be systematic and reproducible. Not all experiments study the same set of
variables/features, and combining this information is a non-trivial task. The second aim allows researchers
to handle heterogeneity between individuals or samples, which is ubiquitous in biological and
health data. For instance, sequencing machines evolve over time, and samples obtained with new
technologies cannot be directly compared to samples taken on older systems, even if the data were collected in
the same lab. This also applies to samples obtained under different environmental conditions. Currently,
researchers are forced to either ignore such biases, potentially leading to violations of statistical validity, or
limit their analysis to data generated in one batch of samples. This work will extend the set of useful data
available to researchers in a wide variety of domains and provide methods to compare and synthesize
disparate datasets. The proposed work will result in: (1) Development of algorithms with theoretical
performance guarantees for combining information from datasets with a small number of overlapping
features; (2) Development of rigorous statistical procedures for hypothesis testing in the presence of
within-group heterogeneity. These methods are particularly helpful for pre-/post-treatment studies, studies
containing batch effects, or studies where samples are collected over long time periods using different
technologies; (3) Implementation of these methods in case studies in molecular biology (genetic
pathway hypothesis generation) and in population survey data for health policy modeling.
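The first aim, combining datasets that share only a few features, can be illustrated with a minimal Python sketch. It is not the project's algorithm and carries no robustness guarantees. It only shows that means and pairwise second moments stay estimable when no single record contains every variable, and that a least-squares fit can be computed from those moments alone. The function names, the synthetic data, and the three-dataset layout are assumptions made for illustration.

```python
import numpy as np


def pairwise_moments(X):
    """Estimate means and pairwise second moments from partially observed rows.

    X is an (n, d) array in which np.nan marks a missing entry.
    second[i, j] averages x_i * x_j over the rows where both are observed.
    """
    observed = ~np.isnan(X)
    filled = np.where(observed, X, 0.0)   # zeros contribute nothing to the sums
    obs = observed.astype(float)
    counts = obs.T @ obs                  # number of rows where i and j co-occur
    sums = filled.T @ filled              # sum of x_i * x_j over those rows
    with np.errstate(invalid="ignore", divide="ignore"):
        second = sums / counts            # nan if a pair never co-occurs
    mean = filled.sum(axis=0) / obs.sum(axis=0)
    return mean, second, counts


def regress_from_moments(second, feature_idx, target_idx):
    """Least-squares coefficients from second moments only (no intercept)."""
    A = second[np.ix_(feature_idx, feature_idx)]
    b = second[feature_idx, target_idx]
    return np.linalg.solve(A, b)


rng = np.random.default_rng(0)
n = 50_000


def sample(n):
    # Hypothetical zero-mean data: y depends linearly on x0 and x1.
    x0 = rng.normal(size=n)
    x1 = 0.5 * x0 + rng.normal(size=n)
    y = 2.0 * x0 - 1.0 * x1 + 0.1 * rng.normal(size=n)
    return np.column_stack([x0, x1, y])


# Three hypothetical studies, each missing one of the three variables.
studies = [sample(n) for _ in range(3)]
studies[0][:, 2] = np.nan   # study A never records y
studies[1][:, 1] = np.nan   # study B never records x1
studies[2][:, 0] = np.nan   # study C never records x0
X = np.vstack(studies)

mean, second, counts = pairwise_moments(X)
coef = regress_from_moments(second, [0, 1], 2)
print(coef)   # close to the generating coefficients (2, -1)
```

No row of `X` is complete, so a row-wise fit would have to discard everything. The pairwise moments recover the relationship because each pair of variables is observed together in at least one study. A plug-in estimate like this can be unreliable when a pair co-occurs in only a few rows, which is the regime where the robust procedures described in the abstract would matter.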
Project outputs
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
RIFLE: Imputation and Robust Inference from Low Order Marginals.
- DOI:
- Published: 2021-09
- Journal:
- Impact factor: 0
- Authors: Sina Baharlouei; Kelechi Ogudu; S. Suen; Meisam Razaviyayn
- Corresponding authors: Sina Baharlouei; Kelechi Ogudu; S. Suen; Meisam Razaviyayn
Other grants by Meisam Razaviyayn
Robust Inference in the Presence of Data Heterogeneity and Structured Missing Data
- Grant number: 10000139
- Fiscal year: 2019
- Funding amount: $236,800
- Project category:
Robust Inference in the Presence of Data Heterogeneity and Structured Missing Data
- Grant number: 9916886
- Fiscal year: 2019
- Funding amount: $236,800
- Project category:
Similar overseas grants
Analysis of Anthropocene with biological archives: A case study in Lake Biwa
- Grant number: 20H00208
- Fiscal year: 2020
- Funding amount: $236,800
- Project category: Grant-in-Aid for Scientific Research (A)
Spatiotemporal dynamics of cohesive species in a granule; case study granular enhanced biological phosphorus removal (EBPR)
- Grant number: 487755-2016
- Fiscal year: 2018
- Funding amount: $236,800
- Project category: Postdoctoral Fellowships
Spatiotemporal dynamics of cohesive species in a granule; case study granular enhanced biological phosphorus removal (EBPR)
- Grant number: 487755-2016
- Fiscal year: 2017
- Funding amount: $236,800
- Project category: Postdoctoral Fellowships
Soil Organic Carbon Quantity and Distribution in Polar Deserts, and the Associated Biological Communities - A Case Study from Both Poles
- Grant number: 475751-2015
- Fiscal year: 2015
- Funding amount: $236,800
- Project category: Postgraduate Scholarships - Doctoral
Silicic acid uptake by marine diatoms and the biological pump of carbon during glacial periods: A case study in the eastern tropical Pacific
- Grant number: NE/E017738/1
- Fiscal year: 2008
- Funding amount: $236,800
- Project category: Research Grant
Implementation of a Biological Case Study Curriculum at a Minority - Serving Institution
- Grant number: 0511697
- Fiscal year: 2005
- Funding amount: $236,800
- Project category: Standard Grant
A biological basis for the efficient breeding of native plants for export markets: a case study with the Australian Goodeniaceae
- Grant number: LP0218037
- Fiscal year: 2002
- Funding amount: $236,800
- Project category: Linkage Projects
A biological basis for the efficient breeding of native plants for export markets: a case study with the Australian Goodeniaceae
- Grant number: ARC : LP0218037
- Fiscal year: 2002
- Funding amount: $236,800
- Project category: Linkage Projects
Establishment of fundamental theory for classical biological control - a case study on biological control of arrowhead scale -
- Grant number: 10660050
- Fiscal year: 1998
- Funding amount: $236,800
- Project category: Grant-in-Aid for Scientific Research (C)
Secondary biological pump driven by food web interactions : A case study in Lake Biwa for gloval environmental issues
- Grant number: 08308031
- Fiscal year: 1996
- Funding amount: $236,800
- Project category: Grant-in-Aid for Scientific Research (A)