Robust Inference in the Presence of Data Heterogeneity and Structured Missing Data
Basic information
- Grant number: 9916886
- Principal investigator:
- Amount: $212,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2019
- Funding country: United States
- Project period: 2019-09-01 to 2022-08-31
- Project status: Completed
- Source:
- Keywords: Aftercare, Algorithms, Biological, Case Study, Data, Data Set, Development, Generations, Genetic, Health Policy, Heterogeneity, Individual, Instruction, Medical, Methods, Modeling, Modernization, Molecular Biology, Pathway interactions, Performance, Procedures, Process, Reproducibility, Research Personnel, Running, Sampling, Sampling Studies, Statistical Methods, Structure, Surveys, System, Technology, Testing, Time, Work, base, experimental study, health data, new technology, population survey, sequencing platform, tool
Project abstract
Modern sequencing platforms can sequence tens of billions of bases per run and generate petabytes of
data, but individual study sizes may be small. Similarly, a wide variety of health data are now publicly
available to inform health policy decisions, and it may be advantageous to use data from several different
surveys. The ability to aggregate and compare heterogeneous data across different datasets would be
critical to expanding the usable data available for any individual study. We propose systematically studying
two major barriers to this effort: 1) Aggregating different medical and biological datasets; 2) Dealing
with batch effects and structured heterogeneous data. Aim 1 allows us to fully utilize information on
related topics from diverse datasets, as information across different experiments needs to be combined in a
statistically rigorous, reliable way - the process needs to fully exploit the available information, not
introduce biases, and still be systematic and reproducible. Not all experiments study the same set of
variables/features, and combining this information is a non-trivial task. The second aim allows researchers
to handle heterogeneity between individuals or samples, which is ubiquitous in biological and
health data. For instance, sequencing machines evolve over time, and samples obtained with new
technologies cannot be directly compared to samples taken on older systems, even if the data were collected in
the same lab. This also applies to samples obtained under different environmental conditions. Currently,
researchers are forced to either ignore such biases, potentially leading to violations of statistical validity, or
limit their analysis to data generated in one batch of samples. This work will extend the set of useful data
available to researchers in a wide variety of domains and provide methods to compare and synthesize
disparate datasets. The proposed work will result in: (1) Development of algorithms with theoretical
performance guarantees for combining information from datasets with a small number of overlapping
features; (2) Development of rigorous statistical procedures for hypothesis testing in the presence of
within-group heterogeneity. These methods are particularly helpful for pre-/post-treatment studies, studies
containing batch effects, or studies where samples are collected over long time periods using different
technologies; (3) Application of these methods in case studies in molecular biology (genetic
pathway hypothesis generation) and in population survey data for health policy modeling.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Meisam Razaviyayn
Robust Inference in the Presence of Data Heterogeneity and Structured Missing Data
- Grant number: 10000139
- Fiscal year: 2019
- Funding amount: $212,000
- Project category:
Robust Inference in the Presence of Data Heterogeneity and Structured Missing Data
- Grant number: 10238926
- Fiscal year: 2019
- Funding amount: $212,000
- Project category:
Similar overseas grants
CAREER: Transferring biological networks emergent principles to drone swarm collaborative algorithms
- Grant number: 2339373
- Fiscal year: 2024
- Funding amount: $212,000
- Project category: Continuing Grant
Point-of-care optical spectroscopy platform and novel ratio-metric algorithms for rapid and systematic functional characterization of biological models in vivo
- Grant number: 10655174
- Fiscal year: 2023
- Funding amount: $212,000
- Project category:
Statistical Inference from Multiscale Biological Data: theory, algorithms, applications
- Grant number: EP/Y037375/1
- Fiscal year: 2023
- Funding amount: $212,000
- Project category: Research Grant
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2021
- Funding amount: $212,000
- Project category: Discovery Grants Program - Individual
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2019
- Funding amount: $212,000
- Project category: Discovery Grants Program - Individual
Building flexible biological particle detection algorithms for emerging real-time instrumentation
- Grant number: 2278799
- Fiscal year: 2019
- Funding amount: $212,000
- Project category: Studentship
CAREER: Microscopy Image Analysis to Aid Biological Discovery: Optics, Algorithms, and Community
- Grant number: 2019967
- Fiscal year: 2019
- Funding amount: $212,000
- Project category: Standard Grant
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2018
- Funding amount: $212,000
- Project category: Discovery Grants Program - Individual
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2017
- Funding amount: $212,000
- Project category: Discovery Grants Program - Individual
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2016
- Funding amount: $212,000
- Project category: Discovery Grants Program - Individual