Collaborative Research: SaTC: CORE: Small: Differentially Private Data Synthesis: Practical Algorithms and Statistical Foundations
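Basic information
- Award number: 2247795
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Project category: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-07-15 to 2026-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract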
Data collected by organizations and agencies are a key resource in today's information age and fuel a significant part of today's economy. However, the disclosure of those data poses serious threats to individual privacy. One important approach to using data while protecting privacy is differentially private data synthesis (DPDS). That is, given as input a private dataset, one uses a differentially private algorithm to generate synthetic datasets that are “similar” to the input dataset. While DPDS has received much attention in recent years, our understanding of this topic remains limited. This project takes a multi-disciplinary approach to advance our scientific understanding as well as improve practical techniques for DPDS. More specifically, this project's novelties are as follows. First, it systematically explores the design space of marginal-based DPDS algorithms that have been proven to be effective in NIST competitions on DPDS, while also taking insights from data synthesis techniques developed in similar fields (often not satisfying DP). Second, it develops statistical theories that both are motivated by the empirical performance of DPDS algorithms and guide the empirical research of these algorithms. The project's broader significance and importance are as follows. We are in the information economy. Data of all kinds, such as online interaction, medical sensor data, genomic data, and location data, are being collected. Practical techniques that enable use of these data while protecting individual privacy are crucially needed and will greatly enhance the value of such data. Users will gain from increased control of their private information, and society as a whole will benefit from deriving maximal benefit from aggregated data. The PIs plan to jointly develop and teach a graduate-level course on synthetic data based on the existing research in this area as well as research results from this project, and involve undergraduate students in research.
This project has two thrusts. The first thrust aims to develop new marginal-based DPDS algorithms that improve upon the state of the art in empirical evaluations. The tasks include: perform an in-depth study of the “marginal-to-dataset” problem (how to synthesize a dataset when given a set of marginals); develop and evaluate new approaches for handling numerical attributes; and develop adaptive and automated techniques for selecting marginals so that datasets synthesized with them capture as much useful information from the input dataset as possible. The second thrust complements the empirical research in the first thrust, and aims to develop statistical theory for high-dimensional marginal-based data synthesis algorithms, and also a general learning theory framework to evaluate the utility of synthetic data in downstream tasks. The two thrusts are highly complementary and support each other. The experimental study in Thrust 1 will provide insights and directions for theoretical studies in Thrust 2, which will help explain the experimental findings as well as guide additional experimental studies. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
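As a rough illustration of the marginal-based idea described in the abstract, the following minimal sketch measures noisy one-way marginals with the Laplace mechanism and then samples each attribute independently from them. It is not the project's algorithm. The function names (`laplace_noise`, `noisy_marginal`, `synthesize`), the equal split of the privacy budget, the add/remove-one-record neighboring relation (sensitivity 1 per histogram), and the assumption that attribute domains are public are all choices made here for illustration only.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(records, attr, domain, epsilon, rng):
    # Histogram of one attribute over a publicly known domain.
    counts = Counter(r[attr] for r in records)
    # Assumed neighboring relation: add/remove one record, so each
    # histogram has L1 sensitivity 1 and the Laplace scale is 1/epsilon.
    noisy = [max(counts.get(v, 0) + laplace_noise(1.0 / epsilon, rng), 0.0)
             for v in domain]
    total = sum(noisy)
    if total == 0.0:
        # All noisy counts were clipped to zero: fall back to uniform.
        return [1.0 / len(domain)] * len(domain)
    # Clipping and normalizing are post-processing and cost no extra budget.
    return [c / total for c in noisy]


def synthesize(records, domains, epsilon, n_out, seed=None):
    rng = random.Random(seed)
    # Sequential composition: split the budget equally across marginals.
    eps_each = epsilon / len(domains)
    marginals = {a: noisy_marginal(records, a, dom, eps_each, rng)
                 for a, dom in domains.items()}
    # Simplest possible "marginal-to-dataset" step: sample each attribute
    # independently from its noisy one-way marginal.
    return [{a: rng.choices(domains[a], weights=marginals[a])[0]
             for a in domains}
            for _ in range(n_out)]


if __name__ == "__main__":
    private = ([{"age": "young", "smoker": "no"}] * 80
               + [{"age": "old", "smoker": "yes"}] * 20)
    domains = {"age": ["young", "old"], "smoker": ["yes", "no"]}
    for row in synthesize(private, domains, epsilon=2.0, n_out=5, seed=0):
        print(row)
```

Because each attribute is sampled independently, this sketch discards all correlations between attributes. Capturing them requires measuring higher-order marginals and solving a harder marginal-to-dataset step, and `random` is not a cryptographically secure noise source, so the sketch should not be used for an actual data release.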
Project outcomes
Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
FairRR: Pre-Processing for Group Fairness through Randomized Response
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Zeng, Xianli; Ward, Joshua; Cheng, Guang
- Corresponding author: Cheng, Guang
Improving Adversarial Robustness Through the Contrastive-Guided Diffusion Process
- DOI:
- Published: 2022-10
- Journal:
- Impact factor: 0
- Authors: Yidong Ouyang; Liyan Xie; Guang Cheng
- Corresponding authors: Yidong Ouyang; Liyan Xie; Guang Cheng
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
- DOI: 10.48550/arxiv.2310.15479
- Published: 2023-10
- Journal:
- Impact factor: 0
- Authors: Namjoon Suh; Xiaofeng Lin; Din-Yin Hsieh; Merhdad Honarkhah; Guang Cheng
- Corresponding authors: Namjoon Suh; Xiaofeng Lin; Din-Yin Hsieh; Merhdad Honarkhah; Guang Cheng
Binary Classification under Local Label Differential Privacy Using Randomized Response Mechanisms
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Shi Xu; Chendi Wang; W. Sun; Guang Cheng
- Corresponding authors: Shi Xu; Chendi Wang; W. Sun; Guang Cheng
Optimal Convergence Rates of Deep Convolutional Neural Networks: Additive Ridge Functions
- DOI:
- Published: 2022-02
- Journal:
- Impact factor: 0
- Authors: Zhiying Fang; Guang Cheng
- Corresponding authors: Zhiying Fang; Guang Cheng
Other publications by Guang Cheng
PDA-cross-linked beta-cyclodextrin: a novel adsorbent for the removal of BPA and cationic dyes.
- DOI: 10.2166/wst.2020.286
- Published: 2020-06
- Journal:
- Impact factor: 2.7
- Authors: Jianyu Wang; Guang Cheng; Jian Lu; Huafeng Chen; Yanbo Zhou
- Corresponding author: Yanbo Zhou
RBAS: A Real-Time User Behavior Analysis System for Internet TV in Cloud Computing
- DOI: 10.1145/2935663.2935664
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: C. Zhu; Guang Cheng; Xiaojun Guo; Yuxiang Wang
- Corresponding author: Yuxiang Wang
BadGD: A unified data-centric framework to identify gradient descent vulnerabilities
- DOI: 10.48550/arxiv.2405.15979
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: ChiHua Wang; Guang Cheng
- Corresponding author: Guang Cheng
TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Namjoon Suh; Yuning Yang; Din; Qitong Luan; Shirong Xu; Shixiang Zhu; Guang Cheng
- Corresponding author: Guang Cheng
HIGHER ORDER SEMIPARAMETRIC FREQUENTIST INFERENCE WITH THE PROFILE SAMPLER
- DOI:
- Published: 2006
- Journal:
- Impact factor: 0
- Authors: Guang Cheng; M. Kosorok
- Corresponding author: M. Kosorok
Other grants of Guang Cheng
Conference: UCLA Synthetic Data Workshop
- Award number: 2309349
- Fiscal year: 2023
- Funding amount: $300,000
- Project category: Standard Grant
I-Corps: Trustworthy Synthetic Data Generation
- Award number: 2317549
- Fiscal year: 2023
- Funding amount: $300,000
- Project category: Standard Grant
Collaborative Research: Nonparametric Bayesian Aggregation for Massive Data
- Award number: 1712907
- Fiscal year: 2017
- Funding amount: $300,000
- Project category: Continuing Grant
Collaborative Research: Semiparametric ODE Models for Complex Gene Regulatory Networks
- Award number: 1418202
- Fiscal year: 2014
- Funding amount: $300,000
- Project category: Standard Grant
CAREER: Bootstrap M-estimation in Semi-Nonparametric Models
- Award number: 1151692
- Fiscal year: 2012
- Funding amount: $300,000
- Project category: Continuing Grant
General Semiparametric Inference via Bootstrap Sampling
- Award number: 0906497
- Fiscal year: 2009
- Funding amount: $300,000
- Project category: Standard Grant
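Similar grants from the National Natural Science Foundation of China (NSFC)
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Funding amount: CNY 0
- Project category: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Funding amount: CNY 240,000
- Project category: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Funding amount: CNY 240,000
- Project category: Special fund project
Cell Research
- Award number: 30824808
- Approval year: 2008
- Funding amount: CNY 240,000
- Project category: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Funding amount: CNY 450,000
- Project category: General program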
Similar overseas grants
Collaborative Research: SaTC: CORE: Medium: Using Intelligent Conversational Agents to Empower Adolescents to be Resilient Against Cybergrooming
- Award number: 2330940
- Fiscal year: 2024
- Funding amount: $300,000
- Project category: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
- Award number: 2317232
- Fiscal year: 2024
- Funding amount: $300,000
- Project category: Continuing Grant
Collaborative Research: NSF-BSF: SaTC: CORE: Small: Detecting malware with machine learning models efficiently and reliably
- Award number: 2338301
- Fiscal year: 2024
- Funding amount: $300,000
- Project category: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Differentially Private SQL with flexible privacy modeling, machine-checked system design, and accuracy optimization
- Award number: 2317233
- Fiscal year: 2024
- Funding amount: $300,000
- Project category: Continuing Grant
Collaborative Research: NSF-BSF: SaTC: CORE: Small: Detecting malware with machine learning models efficiently and reliably
- Award number: 2338302
- Fiscal year: 2024
- Funding amount: $300,000
- Project category: Continuing Grant
Collaborative Research: SaTC: CORE: Medium: Using Intelligent Conversational Agents to Empower Adolescents to be Resilient Against Cybergrooming
- Award number: 2330941
- Fiscal year: 2024
- Funding amount: $300,000
- Project category: Continuing Grant
Collaborative Research: SaTC: CORE: Small: Towards Secure and Trustworthy Tree Models
- Award number: 2413046
- Fiscal year: 2024
- Funding amount: $300,000
- Project category: Standard Grant
Collaborative Research: SaTC: EDU: RoCCeM: Bringing Robotics, Cybersecurity and Computer Science to the Middled School Classroom
- Award number: 2312057
- Fiscal year: 2023
- Funding amount: $300,000
- Project category: Standard Grant
Collaborative Research: SaTC: CORE: Small: Investigation of Naming Space Hijacking Threat and Its Defense
- Award number: 2317830
- Fiscal year: 2023
- Funding amount: $300,000
- Project category: Continuing Grant
Collaborative Research: SaTC: CORE: Small: Towards a Privacy-Preserving Framework for Research on Private, Encrypted Social Networks
- Award number: 2318843
- Fiscal year: 2023
- Funding amount: $300,000
- Project category: Continuing Grant