A privacy-preserving socio-demographic enrichment framework for big data and its empirical application
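Basic information
- Grant number: ES/W005352/1
- Principal investigator:
- Amount: $151,300
- Host institution:
- Host institution country: United Kingdom
- Project type: Fellowship
- Fiscal year: 2021
- Funding country: United Kingdom
- Duration: 2021 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract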
Recent decades have seen substantial growth in awareness of, and demand for, privacy preservation, both in legislation and among the public. This can be partly attributed to the proliferation of information and communications technologies, which generate large amounts of data (i.e. big data) when used. In parallel, the growing availability of big data has created opportunities for their application, drawing upon the insights they can provide into people's daily behaviour patterns. However, the applicability of anonymous big data has been limited in behaviour-based analysis because of the essential role of socio-demographic information as exogenous determinants of human behaviour. Therefore, a plethora of studies have emerged to predict the absent socio-demographic attributes of respondents in various big data sources, termed the socio-demographic enrichment of big data.
Existing socio-demographic enrichment approaches use either performance-based data mining and machine learning methods or statistical-fit-oriented models, which typically lack theoretical underpinnings that can explain or justify the postulated relationship between respondents' behaviour patterns (input features) and their socio-demographic attributes (output of the enrichment). A theoretical underpinning is, however, crucial because microeconomic consumer theory suggests that people's behaviour is driven by their socio-demographic attributes. One immediate consequence of neglecting the underpinning microeconomic and/or sociological behavioural theories is that existing methods can neither predict the quality of enrichment nor interpret changes in their performance caused by variation in data distributions. This motivates my PhD research, in which I propose and formalise a new enrichment framework, called the Inverse Discrete Choice Modelling (IDCM) framework.
The IDCM socio-demographic enrichment framework makes it possible to quantitatively understand the trade-offs between enrichment accuracy and privacy preservation. Specifically, the IDCM approach performs statistical inversion of a discrete choice model (DCM), a well-established modelling technique relying on explicit behavioural assumptions grounded in social science, including microeconomics, sociology and psychology. The IDCM performance theory is established to estimate the IDCM enrichment performance based on known information about the data distribution in the enriched sample. This is enabled by drawing an analogy between human behaviour and information theory, i.e. treating an observed individual as a 'message' transmitted over an information communication channel, which allows several powerful information-theoretic concepts to be used to mathematically link how well we can predict who a person is to his/her privacy.
So far, the IDCM performance theory has been developed for socio-demographic enrichment from observation of a single, binary behaviour feature. To improve the empirical enrichment performance, the aim of the proposed research project is to extend the current IDCM approach by including multiple behaviour patterns as the input features. This can be achieved by using several DCMs, each capturing the relationship between one behaviour feature and the enriched attribute, and then finding the value of the socio-demographic attribute that is most likely to result in the joint behaviour patterns. The proposed extension of the IDCM approach involves the incorporation of machine learning or deep learning algorithms, applied to extract meaningful behaviour patterns from raw big data, which can then be employed as the input features for the subsequent IDCM enrichment.
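The idea of inverting several DCMs to find the most likely attribute value can be illustrated with a minimal sketch. This is not the project's implementation: it assumes each behaviour feature is binary and modelled by a simple binary logit, that behaviours are conditionally independent given the attribute, and that a prior over attribute values is known. The function names (`logit_choice_prob`, `invert_dcms`, `enrich`) and all coefficients are hypothetical.

```python
import math

def logit_choice_prob(attr_value, intercept, coef):
    """Binary logit DCM: P(behaviour = 1 | attribute value)."""
    utility = intercept + coef * attr_value
    return 1.0 / (1.0 + math.exp(-utility))

def invert_dcms(observed, models, prior):
    """Posterior over attribute values given observed binary behaviours.

    observed: list of 0/1 choices, one per behaviour feature
    models:   list of (intercept, coef) pairs, one DCM per feature (assumed values)
    prior:    dict mapping attribute value -> prior probability
    Assumes behaviours are conditionally independent given the attribute.
    """
    joint = {}
    for attr_value, p_attr in prior.items():
        likelihood = 1.0
        for choice, (intercept, coef) in zip(observed, models):
            p_one = logit_choice_prob(attr_value, intercept, coef)
            likelihood *= p_one if choice == 1 else 1.0 - p_one
        joint[attr_value] = p_attr * likelihood
    total = sum(joint.values())
    return {a: v / total for a, v in joint.items()}

def enrich(observed, models, prior):
    """Return the most probable attribute value and the full posterior."""
    posterior = invert_dcms(observed, models, prior)
    return max(posterior, key=posterior.get), posterior

# Hypothetical setup: a binary attribute and two behaviour features.
prior = {0: 0.6, 1: 0.4}
models = [(-1.0, 2.0), (0.5, -1.5)]
label, posterior = enrich([1, 0], models, prior)
print(label, posterior)
```

With a single entry in `models`, the same function reduces to the single binary behaviour case described in the abstract; adding entries corresponds to the proposed multi-feature extension under the independence assumption stated above.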
Correspondingly, the accompanying IDCM performance theory will be extended to accommodate the estimation of enrichment performance based on multiple behaviour features, so as to retain the transferability of the proposed extension of the IDCM methodology.
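The channel analogy behind the performance theory can be made concrete with standard information-theoretic quantities. The sketch below is a generic illustration and not the IDCM performance theory itself: it assumes a known prior over attribute values and a known conditional distribution of behaviour given attribute (the "channel"), and computes the mutual information, the residual uncertainty about the attribute, and the accuracy of the best possible guess. The function `channel_summary` and the numbers are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def channel_summary(prior, channel):
    """Summarise how much a behaviour reveals about an attribute.

    prior:   dict attr -> P(attr)
    channel: dict attr -> dict behaviour -> P(behaviour | attr)
    """
    behaviours = sorted({b for row in channel.values() for b in row})
    joint = {(a, b): prior[a] * channel[a].get(b, 0.0)
             for a in prior for b in behaviours}
    p_b = {b: sum(joint[(a, b)] for a in prior) for b in behaviours}
    h_attr = entropy(prior.values())
    # H(attr | behaviour): uncertainty left after observing the behaviour
    h_attr_given_b = sum(
        p_b[b] * entropy([joint[(a, b)] / p_b[b] for a in prior])
        for b in behaviours if p_b[b] > 0
    )
    # Accuracy of guessing the most probable attribute for each behaviour
    map_accuracy = sum(max(joint[(a, b)] for a in prior) for b in behaviours)
    return {
        "mutual_information_bits": h_attr - h_attr_given_b,
        "residual_uncertainty_bits": h_attr_given_b,
        "map_accuracy": map_accuracy,
    }

# Hypothetical binary attribute and binary behaviour.
prior = {0: 0.5, 1: 0.5}
channel = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
summary = channel_summary(prior, channel)
print(summary)
```

Read this way, higher mutual information means better achievable enrichment accuracy and, at the same time, less residual uncertainty protecting the individual, which is one simple way to picture the accuracy and privacy trade-off the abstract refers to.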
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Yuanying Zhao
A Semiparametric Bayesian approach to Simplex Regression Models with Heterogeneous Dispersion
- DOI: 10.1080/03610918.2018.1458129
- Published:
- Journal:
- Impact factor: 0
- Authors: Xingde Duan; Yuanying Zhao; Anmin Tang
- Corresponding author: Anmin Tang
Non-selfsimilar global solutions to a two-dimensional system of conservation laws
- DOI: 10.1002/mma.4398
- Published: 2017
- Journal:
- Impact factor: 2.9
- Authors: Yicheng Pang; Jinhuan Wang; Yuanying Zhao
- Corresponding author: Yuanying Zhao
Bayesian Inference for Double Generalized Linear Regression Models of the Inverse Gaussian Distribution
- DOI:
- Published: 2014-05
- Journal:
- Impact factor: 0.2
- Authors: Dengke Xu; Yuanying Zhao; Weijie Chen
- Corresponding author: Weijie Chen
Bayesian Subset Selection for Reproductive Dispersion Linear Models
- DOI: 10.1515/jssi-2014-0077
- Published: 2014-01
- Journal:
- Impact factor: 0
- Authors: Yuanying Zhao; Dengke Xu; Xingde Duan; Yicheng Pang
- Corresponding author: Yicheng Pang
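Similar National Natural Science Foundation of China (NSFC) grants
Research on key technologies of key management for MANETs
- Grant number: 61173188
- Approval year: 2011
- Funding amount: CNY 520,000
- Project type: General Program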
Similar overseas grants
Design and Analysis of Structure Preserving Discretizations to Simulate Pattern Formation in Liquid Crystals and Ferrofluids
- Grant number: 2409989
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Standard Grant
CAREER: Architectural Foundations for Practical Privacy-Preserving Computation
- Grant number: 2340137
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Continuing Grant
Collaborative Research: SHF: Small: Efficient and Scalable Privacy-Preserving Neural Network Inference based on Ciphertext-Ciphertext Fully Homomorphic Encryption
- Grant number: 2412357
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Grant number: 2402815
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Standard Grant
Structure-Preserving Integrators for Lévy-Driven Stochastic Systems
- Grant number: EP/Y033248/1
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Research Grant
HarmonicAI: Human-guided collaborative multi-objective design of explainable, fair and privacy-preserving AI for digital health
- Grant number: EP/Z000262/1
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Research Grant
Preserving dark skies with neuromorphic camera technology
- Grant number: ST/Y50998X/1
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Research Grant
Structure theory for measure-preserving systems, additive combinatorics, and correlations of multiplicative functions
- Grant number: 2347850
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Continuing Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Grant number: 2402817
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Grant number: 2402816
- Fiscal year: 2024
- Funding amount: $151,300
- Project type: Standard Grant