Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
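Basic information
- Grant number: RGPIN-2018-03819
- Principal investigator:
- Amount: $27,100
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Project period: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract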
With the advancement of modern data acquisition technology, data with diverse features are more accessible than ever before. The increasing complexity of data structures and the large dimension of data create an urgent need for novel and flexible modeling and analysis tools. While many complex features may be present in different applications, this research focuses on two issues that are common in modern data: the quality and the dimensionality of data. I plan to explore important problems in the following areas.
(1) High dimensional data with measurement error and missing values
In the era of Big Data, large-scale data are often available in which the dimension of the variables is much larger than the number of subjects in the study. This presents a great challenge to traditional statistical methods, which normally require the sample size to be larger than the dimension of the variables. In addition, we face challenges related to data quality: measurement imprecision and missing observations. This research aims to investigate problems concerning high dimensionality, measurement error, and missing observations. The plan is to examine how measurement error and missing values interact in the analysis of high dimensional data. The objective is to develop valid inference methods for data that exhibit all of these features. Applications of the developed methods to survival data, image data and longitudinal data are planned.
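As a rough illustration of why the case of more variables than subjects is hard for traditional methods, the sketch below simulates a design matrix with more columns than rows and shows that ordinary least squares no longer has a unique solution. The sizes `n` and `p`, the simulated data and all variable names are assumptions made for this example; it does not represent the methods proposed in this project.

```python
import numpy as np

# Hypothetical sizes for illustration only: n subjects, p covariates, p > n.
n, p = 20, 50
rng = np.random.default_rng(0)

X = rng.normal(size=(n, p))                 # simulated design matrix
beta_true = np.zeros(p)
beta_true[:3] = 1.0                         # only three covariates matter
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# rank(X) = rank(X^T X) <= n < p, so X^T X is singular and the
# normal equations do not determine the coefficients uniquely.
rank = np.linalg.matrix_rank(X)

# lstsq returns the minimum-norm solution, which fits y exactly here.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = float(np.max(np.abs(X @ beta_hat - y)))

# Adding any null-space direction of X gives a different coefficient
# vector with the same fitted values.
_, _, vt = np.linalg.svd(X)                 # vt has shape (p, p)
null_dir = vt[-1]                           # satisfies X @ null_dir ~ 0
beta_alt = beta_hat + 5.0 * null_dir
same_fit = np.allclose(X @ beta_hat, X @ beta_alt)

print(f"rank of X: {rank} (p = {p})")
print(f"max abs residual of least squares fit: {residual:.2e}")
print(f"two different coefficient vectors, same fit: {same_fit}")
```

Because infinitely many coefficient vectors reproduce the observed responses, some extra structure such as sparsity or regularization is usually needed before inference is meaningful in this setting.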
(2) Causal inference with complex featured data
In contrast to association studies, causal inference is often the focus of empirical research. While many methods are available for various settings, they are vulnerable to poor quality data. Most existing methods require that the data be “perfect” in the sense that neither missing observations nor measurement error is present, but these assumptions are often violated in practice. Measurement error and missing observations have been a long-standing concern in many studies, including epidemiological, nutrition and environmental studies. However, research on causal inference with these features remains limited and largely unexplored. I plan to explore this exciting area and develop new methods to address the complex effects of measurement error and/or missing observations on causal inference. Furthermore, I intend to investigate these problems for large-scale data in which the dimension of potential confounders is high.
My primary goals are to develop original and innovative methodology that advances foundational work and to facilitate applications. This research is anticipated to provide valuable insights into making the best use of available large-scale data and to broaden the scope of existing strategies and research. It is expected to have a significant impact on the statistical community as well as other fields, including public health, medical studies and data science.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Yi, Grace
Assessing trauma and related distress in refugee youth and their caregivers: should we be concerned about iatrogenic effects?
- DOI: 10.1007/s00787-020-01635-z
- Publication date: 2021-09
- Journal:
- Impact factor: 6.4
- Authors: Greene, M. Claire; Kane, Jeremy C.; Bolton, Paul; Murray, Laura K.; Wainberg, Milton L.; Yi, Grace; Sim, Amanda; Puffer, Eve; Ismael, Abdulkadir; Hall, Brian J.
- Corresponding author: Hall, Brian J.
The Effect of Intimate Partner Violence and Probable Traumatic Brain Injury on Mental Health Outcomes for Black Women
- DOI: 10.1080/10926771.2019.1587657
- Publication date: 2019-01-01
- Journal:
- Impact factor: 1.8
- Authors: Cimino, Andrea N.; Yi, Grace; Stockman, Jamila K.
- Corresponding author: Stockman, Jamila K.
Other grants held by Yi, Grace
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
- Grant number: RGPIN-2018-03819
- Fiscal year: 2022
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
- Grant number: RGPIN-2018-03819
- Fiscal year: 2021
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
- Grant number: RGPIN-2018-03819
- Fiscal year: 2020
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
- Grant number: RGPIN-2018-03819
- Fiscal year: 2019
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
- Grant number: RGPIN-2018-03819
- Fiscal year: 2018
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical Methods on Challenging Issues of Biosciences
- Grant number: 239733-2013
- Fiscal year: 2017
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
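Similar grants from the National Natural Science Foundation of China
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Grant number:
- Approval year: 2024
- Amount:
- Project category: Collaborative Innovation Research Team
Intelligent Patent Analysis for Optimized Technology Stack Selection: Blockchain Business Registry Case Demonstration
- Grant number:
- Approval year: 2024
- Amount:
- Project category: Research Fund for Foreign Scholars
A meta-analysis-based model of yield increase from irrigation for cotton in Xinjiang
- Grant number: 41601604
- Approval year: 2016
- Amount: 220,000 CNY
- Project category: Young Scientists Fund
Meta-analysis methods for large-scale microarray data sets
- Grant number: 31100958
- Approval year: 2011
- Amount: 200,000 CNY
- Project category: Young Scientists Fund
Elucidating the artemisinin biosynthetic pathway using retrobiosynthetic NMR analysis
- Grant number: 30470153
- Approval year: 2004
- Amount: 220,000 CNY
- Project category: General Program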
Similar overseas grants
REU Site: University of North Carolina at Greensboro - Complex Data Analysis using Statistical and Machine Learning Tools
- Grant number: 2244160
- Fiscal year: 2023
- Amount: $27,100
- Project category: Standard Grant
Statistical models for the integrative analysis of complex biomedical images with manifold structure
- Grant number: 10590469
- Fiscal year: 2023
- Amount: $27,100
- Project category:
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
- Grant number: RGPIN-2018-03819
- Fiscal year: 2022
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical Challenges and Methods in the Analysis of High Dimensional and Complex Structured Data
- Grant number: RGPIN-2018-05475
- Fiscal year: 2022
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Sampling Designs and Statistical Methods for the Analysis of Complex Life History and Genetic Data
- Grant number: RGPIN-2020-05528
- Fiscal year: 2022
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical methods of multivariate analysis for large and complex data
- Grant number: RGPIN-2016-05880
- Fiscal year: 2022
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical Analysis of Complex Featured Data: High Dimensionality, Measurement Error and Missing Values
- Grant number: RGPIN-2018-03819
- Fiscal year: 2021
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical methods of multivariate analysis for large and complex data
- Grant number: RGPIN-2016-05880
- Fiscal year: 2021
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Sampling Designs and Statistical Methods for the Analysis of Complex Life History and Genetic Data
- Grant number: RGPIN-2020-05528
- Fiscal year: 2021
- Amount: $27,100
- Project category: Discovery Grants Program - Individual
Statistical Challenges and Methods in the Analysis of High Dimensional and Complex Structured Data
- Grant number: RGPIN-2018-05475
- Fiscal year: 2021
- Amount: $27,100
- Project category: Discovery Grants Program - Individual