Preserving Privacy in Medical Data Sets


Basic Information

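Grant number:
6733529
Principal investigator:
STAAL A VINTERBO
Amount:
Approximately $407,000
Host institution:
BRIGHAM AND WOMEN'S HOSPITAL
Host institution country:
United States
Project category:
Fiscal year:
2002
Funding country:
United States
Project period:
2002-02-01 to 2005-07-31
Project status:
Completed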

Project Abstract

Privacy is a fundamental right and needs to be protected. For health care related information, there are regulations governing disclosure. These regulations were motivated by the public's concern about breaches of confidentiality that might result in discrimination. Recent progress in electronic medical record technology, the Internet, and the genetic revolution, together with media reports on violations of privacy, has generated increasing interest in this topic. A common belief is that sensitive information is more easily available with the use of networked computers. Since a total lack of disclosure is not realistic, current regulations require that the "minimal amount" of information be given to a certain party. A thorough study of what constitutes "minimal" for particular types of applications, and of a "usefulness index", is lacking. An exact quantification of the potential for privacy breach in de-identified or anonymized databases is also lacking. Defining and quantifying these indices is important for decision-making. As we demonstrate, de-identified data sets can still be used for inference and therefore may disclose sensitive information. Using machine learning methods to verify the functional dependencies that remain in a de-identified data set leads to a better understanding of the possible inferences. Anonymization techniques based on logic, statistics, database theory, and machine learning methods can help protect privacy. We will formally define and study anonymity in databases, from both a theoretical and a practical standpoint. We will develop and implement algorithms to anonymize data sets in a way that balances anonymity against the "usefulness" of the disclosed data sets. We will also develop and implement algorithms to verify the anonymity of a given data set and indicate the types of records that are at highest risk for a privacy attack. We will make our methods and documented tools freely available to researchers via the WWW.
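
As a rough illustration of what "verifying the anonymity of a data set and indicating the records at highest risk" could look like, the sketch below counts how many records share each combination of quasi-identifying attributes and flags those in groups smaller than a threshold k. This is a k-anonymity style check chosen here for illustration; the abstract does not say which anonymity definition the project uses. The function names, the field names (age_band, zip3, diagnosis), and the sample records are all hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """For each record, return how many records share its quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def highest_risk_records(records, quasi_identifiers, k):
    """Return indices of records whose group has fewer than k members.

    Assumption: a record is treated as 'at risk' when fewer than k records
    share its quasi-identifier combination (a k-anonymity style criterion,
    not necessarily the definition used in the funded project).
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [i for i, size in enumerate(sizes) if size < k]


# Hypothetical de-identified records: names removed, coarse attributes kept.
records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "70-79", "zip3": "945", "diagnosis": "hiv"},
]
quasi_identifiers = ["age_band", "zip3"]

sizes = equivalence_class_sizes(records, quasi_identifiers)
at_risk = highest_risk_records(records, quasi_identifiers, k=2)
print("group sizes:", sizes)
print("records at risk for k=2:", at_risk)
```

In this toy data the last two records are each unique on (age_band, zip3), so anyone who knows those two attributes about a person could link that person to a diagnosis even though no name is present. A check of this kind captures only one notion of risk; it says nothing about the "usefulness" side of the trade-off the abstract describes.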
