Preserving Privacy in Medical Data Sets
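Basic Information
- Grant number: 6733529
- Principal investigator:
- Amount: $407,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2002
- Funding country: United States
- Project period: 2002-02-01 to 2005-07-31
- Project status: Completed
- Source: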
- Keywords: Internet; behavioral/social science research tag; computer program/software; computer simulation; computer system design/evaluation; confidentiality; data management; decision making; health care facility information system; health care policy; human data; human rights; information dissemination; information retrieval; mathematical model; medical records; model design/development; patient oriented research; statistics/biometry
Project Abstract
Privacy is a fundamental right and needs to be protected. For health care-related information, there are regulations governing disclosure. These regulations were motivated by the public's concern about breaches of confidentiality that might result in discrimination. Recent progress in electronic medical record technology, the Internet, and the genetic revolution, together with media reports on violations of privacy, has generated increasing interest in this topic. A common belief is that sensitive information is more easily available with the use of networked computers. Since a total lack of disclosure is not realistic, current regulations require that the "minimal amount" of information be given to a certain party. A thorough study of what constitutes "minimal" for particular types of applications, and of a "usefulness index", is lacking. An exact quantification of the potential for privacy breach in de-identified or anonymized databases is also lacking. Defining and quantifying these indices is important for decision-making. As we demonstrate, de-identified data sets can still be used for inference and therefore may disclose sensitive information. Using machine learning methods to verify the functional dependencies that remain in a de-identified data set leads to a better understanding of the possible inferences. Anonymization techniques based on logic, statistics, database theory, and machine learning methods can help protect privacy. We will formally define and study anonymity in databases, from both a theoretical and a practical standpoint. We will develop and implement algorithms to anonymize data sets in a way that balances anonymity against the "usefulness" of the disclosed data sets. We will also develop and implement algorithms to verify the anonymity of a given data set and indicate the types of records that are at highest risk of a privacy attack.
We will make our methods and documented tools freely available to researchers via the WWW.
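The abstract does not say how anonymity would be measured, so the following is only an illustrative sketch, not the project's method. It assumes one common proxy: a record is more exposed the fewer other records share its combination of indirectly identifying attributes. The field names (age_band, zip3, diagnosis), the sample records, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical sketch: measure anonymity by the size of the group of records
# that share the same values on a chosen set of indirectly identifying fields.
# This is an assumed proxy, not the measure defined by the project.


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def anonymity_level(records, quasi_identifiers):
    """Return the smallest group size (0 for an empty data set)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def highest_risk_records(records, quasi_identifiers):
    """Return the records that fall in the smallest groups."""
    sizes = group_sizes(records, quasi_identifiers)
    if not sizes:
        return []
    smallest = min(sizes.values())
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == smallest
    ]


# Made-up, already de-identified sample data (no names or record numbers).
records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "946", "diagnosis": "hypertension"},
    {"age_band": "40-49", "zip3": "946", "diagnosis": "asthma"},
]
QUASI_IDENTIFIERS = ["age_band", "zip3"]

if __name__ == "__main__":
    print(anonymity_level(records, QUASI_IDENTIFIERS))
    print(highest_risk_records(records, QUASI_IDENTIFIERS))
```

In this sample the third record is the only one with its age band and ZIP prefix, so it is the one flagged as most exposed even though no direct identifier is present. Coarsening a field (for example, widening the age bands) would raise the smallest group size at the cost of detail, which is the anonymity versus "usefulness" trade-off the abstract describes.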
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by STAAL A VINTERBO