
Preserving Privacy in Medical Data Sets


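Basic Information

Grant number:
6620783
Principal investigator:
STAAL A VINTERBO
Amount:
About $380,800
Host institution:
BRIGHAM AND WOMEN'S HOSPITAL
Host institution country:
United States
Project type:
Fiscal year:
2002
Funding country:
United States
Project period:
2002-02-01 to 2005-01-31
Project status:
Completed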

Source:
https://reporter.nih.gov/project-details/6620783
Keywords:
Internet; behavioral/social science research tag; computer program/software; computer simulation; computer system design/evaluation; data management; decision making; health care facility information system; health care policy; human data; human rights; information dissemination; information retrieval; mathematical model; medical records; model design/development; patient oriented research; statistics/biometry

Project Abstract

Privacy is a fundamental right and needs to be protected. For health care related information, there are regulations governing disclosure. These regulations were motivated by the public's concern about breaches of confidentiality that might result in discrimination. Recent progress in electronic medical record technology, the Internet, and the genetic revolution, together with media reports on violations of privacy, has generated increasing interest in this topic. A common belief is that sensitive information is more easily available with the use of networked computers. Since a total lack of disclosure is not realistic, current regulations require that the "minimal amount" of information be given to a certain party. A thorough study of what constitutes "minimal" for particular types of applications, and of a "usefulness index," is lacking. An exact quantification of the potential for privacy breach in de-identified or anonymized databases is also lacking. Defining and quantifying these indices is important for decision-making. As we demonstrate, de-identified data sets can still be used for inference and therefore may disclose sensitive information. Using machine learning methods to verify the functional dependencies that remain in a de-identified data set leads to a better understanding of the possible inferences. Anonymization techniques based on logic, statistics, database theory, and machine learning methods can help protect privacy. We will formally define and study anonymity in databases, from both a theoretical and a practical standpoint. We will develop and implement algorithms to anonymize data sets in a way that balances anonymity against the "usefulness" of the disclosed data. We will also develop and implement algorithms to verify the anonymity of a given data set and indicate the types of records that are at highest risk for a privacy attack. We will make our methods and documented tools freely available to researchers via the WWW.
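As a rough illustration of what verifying the anonymity of a data set and flagging high-risk records could look like, the sketch below counts how many records share each combination of quasi-identifier values, in the style of k-anonymity. The abstract does not name this measure, so treating k-anonymity as the anonymity definition is an assumption made here for illustration. The field names (`age_band`, `zip3`, `diagnosis`), the toy records, and the function names are all hypothetical and are not the project's tools.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def anonymity_level(records, quasi_identifiers):
    """Size of the smallest group; this is the k in a k-anonymity style check."""
    sizes = class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def records_at_risk(records, quasi_identifiers, k):
    """Records whose quasi-identifier combination is shared by fewer than k records."""
    sizes = class_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] < k
    ]


# Hypothetical, hand-made records (assumption: not real patient data).
records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "022", "diagnosis": "hypertension"},
    {"age_band": "40-49", "zip3": "022", "diagnosis": "asthma"},
]
QUASI = ("age_band", "zip3")

if __name__ == "__main__":
    print("anonymity level:", anonymity_level(records, QUASI))
    for r in records_at_risk(records, QUASI, k=2):
        print("at risk:", r)
```

In this toy data the third record is the only one with its combination of age band and ZIP prefix, so it is the one a linkage attack would single out most easily. A count like this says nothing about inference through remaining functional dependencies, which the abstract treats as a separate concern.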
