Confidentiality and Estimation for Large Sparse Multi-Dimensional Contingency Tables

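Basic Information

  • Award number:
    0631589
  • Principal investigator:
  • Amount:
    $300,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2006
  • Funding country:
    United States
  • Project period:
    2006-09-15 to 2011-08-31
  • Project status:
    Completed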

Project Abstract

This research project deals with two crucial aspects of working with large sparse contingency tables: protecting the confidentiality of responses when data are shared with other researchers, and the implications of sparsity for maximum likelihood estimation in log-linear models. The first problem entails the evaluation of the disclosure risk associated with the partial release of information from a classified database, e.g., in the form of marginal tables involving subsets of variables. The second problem is concerned with developing general-purpose inferential methodologies for model selection and estimation/testing in log-linear model analysis that are appropriate for sparse categorical data. The links between these seemingly separate problems emanate from the common statistical and mathematical formalism of algebraic statistics. This research will produce new computational algorithms and sharable computer code for use by behavioral and social science researchers, as well as foundational methods and theory linking the problems of cell estimation using maximum likelihood and log-linear models and confidentiality protection. 
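As a rough illustration of what evaluating disclosure risk from released marginal tables can involve, the sketch below computes the classical Fréchet bounds on each cell of a two-way table when only its row and column totals are published. This is not the project's own algorithm; the function name `frechet_bounds` and the example margins are hypothetical. A narrow gap between the lower and upper bound for a cell suggests that the released margins reveal a lot about that cell.

```python
def frechet_bounds(row_totals, col_totals):
    """Return (lower, upper) bound matrices for the cells of a two-way table.

    For cell (i, j) with row total r_i, column total c_j and grand total n:
        max(0, r_i + c_j - n) <= n_ij <= min(r_i, c_j)
    """
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must share the same grand total")
    lower = [[max(0, r + c - n) for c in col_totals] for r in row_totals]
    upper = [[min(r, c) for c in col_totals] for r in row_totals]
    return lower, upper


# Hypothetical released margins for a 2 x 3 table (illustrative values only).
rows = [3, 17]
cols = [2, 8, 10]
lower, upper = frechet_bounds(rows, cols)

for i, r in enumerate(rows):
    for j, c in enumerate(cols):
        print(f"cell ({i}, {j}): between {lower[i][j]} and {upper[i][j]}")
```

For tables with more than two dimensions and overlapping margins, bounds of this simple form are generally not sharp, which is part of why the problem calls for more efficient numerical procedures.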
The expected outcomes of this activity will include: (1) more effective inferential procedures for the quantitative analysis and interpretation of behavioral and social science data and for the determination of the risk of disclosure; (2) statistical software for the analysis of categorical data targeted at a large audience of practitioners and researchers, which will be developed and freely distributed in the form of both computer source code and modular, executable files; (3) more efficient numerical procedures for assessing the disclosure risk associated with the release of marginal totals.
Log-linear model analysis forms a well-established and powerful set of statistical tools for the study of categorical data, especially in the form of multi-dimensional cross-classifications or multi-way contingency tables. These models have proved to be essential for the analysis of data emanating from many areas of the social and behavioral sciences, as well as in other scientific areas. For example, in a typical sample survey, data are generated for several thousand individuals on a large number of categorical variables, measuring such information as employment, income, health status, etc. The resulting cross-classification of these variables is large, i.e., involving many thousands of cells, and sparse, i.e., most of the cell entries are either very small or contain zero counts. Similar problems arise in the study of social networks, in public health and medicine, and in the analysis of genetics databases. Recent developments in the mathematical area of algebraic geometry have provided a novel and powerful formalism for the representation of log-linear models relevant for such contingency table data.
This project will use this mathematical formalism to focus on two different aspects of large sparse contingency tables: (1) protecting the privacy of the data providers when data are shared with other users, while at the same time (2) ensuring that such tables are useful for statistical analysis by developing new methods for log-linear model computation. The results of the project will improve access to data for secondary analysis and enhance the capacity of researchers and analysts to exploit the information in large sparse databases.
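To make the log-linear computation side more concrete, here is a minimal sketch of iterative proportional fitting (IPF), a standard way to compute maximum likelihood fitted values for a hierarchical log-linear model. It fits the no-three-factor-interaction model [AB][AC][BC] to a small three-way table containing one zero count. The function names `margin` and `ipf_no_three_way` and the example counts are assumptions made for illustration, not code or data from the project.

```python
from itertools import product


def margin(table, axes):
    """Collapse a three-way nested-list table onto the given pair of axes."""
    dims = (len(table), len(table[0]), len(table[0][0]))
    out = {}
    for cell in product(*(range(d) for d in dims)):
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + table[cell[0]][cell[1]][cell[2]]
    return out


def ipf_no_three_way(observed, n_cycles=500, tol=1e-10):
    """Fit the model [AB][AC][BC] by iterative proportional fitting."""
    dims = (len(observed), len(observed[0]), len(observed[0][0]))
    cells = list(product(*(range(d) for d in dims)))
    # Start from a table of ones, which satisfies the model.
    fitted = [[[1.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    pairs = [(0, 1), (0, 2), (1, 2)]
    targets = {axes: margin(observed, axes) for axes in pairs}
    for _ in range(n_cycles):
        max_change = 0.0
        for axes in pairs:
            current = margin(fitted, axes)
            for i, j, k in cells:
                key = tuple((i, j, k)[a] for a in axes)
                # Rescale each cell so the fitted margin matches the observed one.
                if current[key] > 0:
                    new = fitted[i][j][k] * targets[axes][key] / current[key]
                else:
                    new = 0.0
                max_change = max(max_change, abs(new - fitted[i][j][k]))
                fitted[i][j][k] = new
        if max_change < tol:
            break
    return fitted


# Hypothetical 2 x 2 x 2 table with a single sampling zero at cell (0, 0, 0).
observed = [[[0, 4], [3, 6]], [[5, 2], [7, 9]]]
fitted = ipf_no_three_way(observed)
print("fitted value for the empty cell:", fitted[0][0][0])
```

In this example every two-way margin is positive and the fit assigns a positive expected count to the empty cell. Other zero patterns can leave all margins positive and still admit no finite maximum likelihood estimate, and a plain IPF loop like this one does not detect that case on its own.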

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Stephen Fienberg

CDI-Type II: Collaborative Research: Integrating Statistical and Computational Approaches to Privacy
  • Award number:
    0941518
  • Fiscal year:
    2010
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Participant Support for Workshop on Statistical Methods for the Analysis of Network Data in Dublin, Ireland.
  • Award number:
    0924358
  • Fiscal year:
    2009
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Travel Grant Proposal for Workshop on Data Confidentiality
  • Award number:
    0741571
  • Fiscal year:
    2007
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Workshop on Privacy and Confidentiality; July 19-26, 2005, Italy.
  • Award number:
    0517956
  • Fiscal year:
    2005
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Fifth International Conference on Forensic Statistics
  • Award number:
    0201814
  • Fiscal year:
    2002
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
International Conference on the Foundation of Statistical Inference: Applications in the Medical and Social Sciences and in Industry and the Interface with Computer Science
  • Award number:
    0086688
  • Fiscal year:
    2000
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
International Conference on Forensic Statistics, June 30 to July 3, 1996 at the University of Edinburgh, Scotland
  • Award number:
    9529348
  • Fiscal year:
    1996
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Mathematical Sciences: Workshops for Statistical Methodology in Quality and Productivity Improvement
  • Award number:
    8912592
  • Fiscal year:
    1989
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Collaborative Research on Survey Designs and Randomized Experiments
  • Award number:
    8701606
  • Fiscal year:
    1987
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Collaborative Research on the Design and Analysis Parallels Between Sample Surveys and Randomized Experiments
  • Award number:
    8406952
  • Fiscal year:
    1984
  • Award amount:
    $300,000
  • Award type:
    Continuing Grant

Similar Overseas Grants

LEAPS-MPS: Importance, Significance, and Fairness in Large-Scale Estimation and Testing of Heteroscedastic Data
  • Award number:
    2316746
  • Fiscal year:
    2023
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Advanced large-scale damage estimation method using deep learning and 3D building models
  • Award number:
    23K04108
  • Fiscal year:
    2023
  • Award amount:
    $300,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
Estimation of economic effect on firms and economy triggered by disaster and anti-infection measure through large-scale supply chains
  • Award number:
    21H00743
  • Fiscal year:
    2021
  • Award amount:
    $300,000
  • Award type:
    Grant-in-Aid for Scientific Research (B)
III: Small: Collaborative Research: Cost-Efficient Sampling and Estimation from Large-Scale Networks
  • Award number:
    2209921
  • Fiscal year:
    2021
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Quantitative estimation of sea water spray generated by a large-scale wave-overtopping at a vertical seawall
  • Award number:
    21H01438
  • Fiscal year:
    2021
  • Award amount:
    $300,000
  • Award type:
    Grant-in-Aid for Scientific Research (B)
Feasibility Study on Estimation of Structural Damage due to Earthquakes for Large-Spanned Structures
  • Award number:
    20K05056
  • Fiscal year:
    2020
  • Award amount:
    $300,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
Minimax Optimal Functional Estimation on Large-Scale Discrete Distributions
  • Award number:
    20K19750
  • Fiscal year:
    2020
  • Award amount:
    $300,000
  • Award type:
    Grant-in-Aid for Early-Career Scientists
RAPID: Computational Modeling of Contact Density and Outbreak Estimation for COVID-19 Using Large-scale Geolocation Data from Mobile Devices
  • Award number:
    2028687
  • Fiscal year:
    2020
  • Award amount:
    $300,000
  • Award type:
    Standard Grant
Estimation of potential large wood export in Japan
  • Award number:
    19H02395
  • Fiscal year:
    2019
  • Award amount:
    $300,000
  • Award type:
    Grant-in-Aid for Scientific Research (B)
Estimation of Trends in Health Issues and Social Burden in Japan: A Simulation Based on Large-Scale Data
  • Award number:
    19K19458
  • Fiscal year:
    2019
  • Award amount:
    $300,000
  • Award type:
    Grant-in-Aid for Early-Career Scientists