Computational and Statistical Approaches to Regression Problems in the Presence of Linkage Errors

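Basic information

  • Award number:
    2120318
  • Principal investigator:
  • Amount:
    $350,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-09-15 to 2024-08-31
  • Project status:
    Completed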

Project abstract

This research project will develop computational tools for minimizing the impact of mismatched records on subsequent data analysis. To adequately address a research question of interest, multiple data sources often need to be combined. Record linkage is the process of identifying matched records in multiple data sources pertaining to the same entity. Advances in record linkage and computation yield substantial opportunities for creating rich data products. At the same time, high data volumes, data quality issues, and the need for data anonymization and privacy increase the potential for mismatch error that can considerably disrupt subsequent analysis and in turn lead to incorrect conclusions. The tools to be developed in this project will help leverage the potential inherent in linked data by improving the integrity of a significant range of downstream statistical analyses. All technical developments resulting from this project will be released as open-source software. Research results will be applied to large-scale survey data analysis and data linkages of interest to the Federal statistical agencies. The investigators will integrate the results of this project into their educational activities and will offer hands-on tutorials to train students, professionals, and scientists in the analysis of linked data. The project also will provide research opportunities and support for graduate students.

This research project will build on techniques in high-dimensional statistics and optimization to develop a suite of methods adjusting for and correcting mismatch error along with uncertainty quantification. The investigators will tackle a variety of problems whose solutions will require an appropriate balance of statistical, algorithmic, and practical aspects pertaining to specific real data applications. The statistical properties of the methods will be rigorously studied theoretically, in simulation studies, and in various contemporary linked data problems. Post-linkage analytic scenarios to be investigated include modern semiparametric regression and common unsupervised multivariate analysis methods that have scarcely been studied in the context of linked data analysis. Advances in optimal transport theory will be leveraged to correct mismatch error and hence improve data quality. This award is supported by the MMS Program and a consortium of Federal statistical agencies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Regularization for Shuffled Data Problems via Exponential Family Priors on the Permutation Group
Estimation in exponential family regression based on linked data contaminated by mismatch error
  • DOI:
    10.4310/22-sii726
  • Year published:
    2023
  • Journal:
  • Impact factor:
    0.8
  • Authors:
    Wang, Zhenbang; Ben-David, Emanuel; Slawski, Martin
  • Corresponding author:
    Slawski, Martin

Other grants by Martin Slawski

CRII: CIF: New Directions in Learning from Data with Faulty Correspondence
  • Award number:
    1849876
  • Fiscal year:
    2019
  • Award amount:
    $350,000
  • Award type:
    Standard Grant

Similar international grants

CAREER: Statistical approaches and computational tools for analyzing spatially-resolved single-cell transcriptomics data
  • Award number:
    2047611
  • Fiscal year:
    2021
  • Award amount:
    $350,000
  • Award type:
    Continuing Grant
Advanced Statistical and Computational approaches for Exposome profiling and integration: applications to cancer and environmental epidemiology
  • Award number:
    2286505
  • Fiscal year:
    2019
  • Award amount:
    $350,000
  • Award type:
    Studentship
Collaborative Research: Novel Computational and Statistical Approaches to Prediction and Estimation
  • Award number:
    1841187
  • Fiscal year:
    2018
  • Award amount:
    $350,000
  • Award type:
    Continuing Grant
CDS&E: Computational Riemannian Approaches for Statistical Analysis and Modeling of Complex Structures
  • Award number:
    1621787
  • Fiscal year:
    2016
  • Award amount:
    $350,000
  • Award type:
    Continuing Grant
Collaborative Research: Novel Computational and Statistical Approaches to Prediction and Estimation
  • Award number:
    1521529
  • Fiscal year:
    2015
  • Award amount:
    $350,000
  • Award type:
    Continuing Grant
Collaborative Research: Novel Computational and Statistical Approaches to Prediction and Estimation
  • Award number:
    1521544
  • Fiscal year:
    2015
  • Award amount:
    $350,000
  • Award type:
    Continuing Grant
Phylogenetic clustering: new statistical inference and computational approaches to identify patterns of transmission of HIV and Hepatitis C.
  • Award number:
    274502
  • Fiscal year:
    2012
  • Award amount:
    $350,000
  • Award type:
    Operating Grants
Computational and mathematical approaches for statistical sequence alignment and phylogenetic inference on emerging parallel architectures
  • Award number:
    200966394
  • Fiscal year:
    2011
  • Award amount:
    $350,000
  • Award type:
    Research Grants
CDI-Type II: Collaborative Research: Integrating Statistical and Computational Approaches to Privacy
  • Award number:
    0941226
  • Fiscal year:
    2010
  • Award amount:
    $350,000
  • Award type:
    Standard Grant
CDI-Type II: Collaborative Research: Integrating Statistical and Computational Approaches to Privacy
  • Award number:
    0941553
  • Fiscal year:
    2010
  • Award amount:
    $350,000
  • Award type:
    Standard Grant