CAREER: Scalable Record Linkage through the Microclustering Property
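Basic information
- Award number: 1652431
- Principal investigator:
- Amount: $450,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2017
- Funding country: United States
- Start and end dates: 2017-05-15 to 2023-04-30
- Project status: Completed
- Source:
- Keywords:
Project abstract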
Duplicated information across multiple databases is a common problem, whether one is trying to accurately estimate the number of patients who have died from sepsis in the United States, the number of people who live in a congressional district, or the number of individuals who have died in armed conflicts. Before such questions can be answered accurately, duplicated information must be removed from the databases in a systematic and accurate way. In the research literature, this process is commonly known as record linkage, de-duplication, or entity resolution. This CAREER award will develop general methods and scalable algorithms for record linkage so that pressing global issues can be addressed in real time or near real time. The modeling and computational tools to be developed will significantly increase the volume of data that can be analyzed. This project will enable researchers to address a broader range of scientific questions and advance research in multiple domains, including precision medicine, official statistics, and human rights. To facilitate these advances and encourage further development, all algorithms will be released as open source software. In terms of education, the investigator will expand the Youth in Machine Learning (YiML) program to enable 50 high school students and 50 undergraduate students per year to participate in the bootcamp and skills-building workshops offered. This will enhance the pipeline of students prepared to study machine learning in future years. At an international level, the investigator will teach workshops at the International Society for Bayesian Analysis Meeting, including a YiML workshop for women.
This research project will develop flexible, general Bayesian nonparametric models for record linkage tasks that propagate the amount of linkage error exactly. The project will also develop scalable record linkage algorithms. By drawing on recent advances in clustering, Bayesian nonparametrics, and probabilistic dimension-reduction algorithms, this project will advance the state of the art in record linkage. The models and algorithms to be developed will attempt to solve the microclustering problem, which is at the core of this research. In collaboration with domain experts, the investigator will test the new methods using data sets from health care, official statistics, and human rights. The resulting estimates may provide useful information for policy makers in these areas.
Project outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A Unified Framework for De-Duplication and Population Size Estimation
- DOI: 10.1214/19-ba1146
- Year published: 2020
- Journal:
- Impact factor: 4.4
- Authors: Tancredi, A.
- Corresponding author: Tancredi, A.
A Practical Approach to Proper Inference with Linked Data
- DOI: 10.1080/00031305.2022.2041482
- Year published: 2022
- Journal:
- Impact factor: 0
- Authors: Kaplan, Andee; Betancourt, Brenda; Steorts, Rebecca C.
- Corresponding author: Steorts, Rebecca C.
Other grants by Rebecca Steorts
Collaborative Research: Record Linkage and Privacy-Preserving Methods for Big Data
- Award number: 1534412
- Fiscal year: 2015
- Funding amount: $450,000
- Project type: Standard Grant
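Similar National Natural Science Foundation of China grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Approval year: 2024
- Funding amount (units of 10,000 CNY):
- Project type: Collaborative Innovation Research Team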
Similar overseas grants
Scalable indoor power harvesters using halide perovskites
- Award number: MR/Y011686/1
- Fiscal year: 2025
- Funding amount: $450,000
- Project type: Fellowship
DREAM Sentinels: Multiplexable and programmable cell-free ADAR-mediated RNA sensing platform (cfRADAR) for quick and scalable response to emergent viral threats
- Award number: 2319913
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Standard Grant
Collaborative Research: Scalable Nanomanufacturing of Perovskite-Analogue Nanocrystals via Continuous Flow Reactors
- Award number: 2315997
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Standard Grant
RestoreDNA: Development of scalable eDNA-based solutions for biodiversity regulators and nature-related disclosure
- Award number: 10086990
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Collaborative R&D
Scalable and Automated Tuning of Spin-based Quantum Computer Architectures
- Award number: 2887634
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Studentship
FAST CAR-T: Faster, Adaptive and Scalable Technologies For CAR-T Manufacture
- Award number: EP/Z532770/1
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Research Grant
CAREER: Scalable Physics-Inspired Ising Computing for Combinatorial Optimizations
- Award number: 2340453
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Continuing Grant
Collaborative Research: SHF: Small: Efficient and Scalable Privacy-Preserving Neural Network Inference based on Ciphertext-Ciphertext Fully Homomorphic Encryption
- Award number: 2412357
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Standard Grant
SHF: Small: QED - A New Approach to Scalable Verification of Hardware Memory Consistency
- Award number: 2332891
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Standard Grant
SBIR Phase I: Scalable Magnetically-Geared Modular Space Manipulator for In-space Manufacturing and Active Debris Remediation Missions
- Award number: 2335583
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Standard Grant