Methods for the privacy preserving analysis of sensitive health data: text analysis and data visualisation
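Basic information
- Grant number: MR/S003959/2
- Principal investigator:
- Amount: $483,400
- Host institution:
- Host institution country: United Kingdom
- Project category: Fellowship
- Fiscal year: 2020
- Funding country: United Kingdom
- Period: 2020 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract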
The "data revolution" can enhance health and social care, accelerate research and help us to assess new ways to improve health and health care. But new ways to analyse health data must be used in ways that the public understand and are happy with, and that appropriately address data privacy and security. This fellowship will develop tools to help scientists and doctors make good use of sensitive health data, while minimising the risk of an individual or their health status becoming known. I will focus on two increasingly important areas of health data use: 1) information from medical text; 2) visual display of data, particularly in augmented reality (AR) or virtual reality (VR).
1) Sensitive text analysis
Medical text (e.g. health records, medical letters) contains patient data over time, including identifying information (e.g. address, next of kin, full date of birth). Although helpful for care and research, use of sensitive medical text is strictly controlled for privacy reasons. Existing methods extract information from text and may control disclosure risk by deleting identifiable data or grouping patients into blocks. But these procedures are not foolproof: some patients may still be identifiable, and results may be wrong once key information has been discarded. My fellowship adopts a new approach we have developed for the free software package DataSHIELD. This allows sensitive data to be analysed without being seen or copied, and automatically detects and blocks many analyses that may be identifying. My earlier work has shown that DataSHIELD can be used on text data, and I will extend it to protect the privacy of data extracted from medical text by computer-based text mining tools. This will markedly increase the range of analyses that may be applied to medical text while maintaining confidentiality. I will first work on synthetic (made-up but realistic) text to safely develop and test the new approach. Once I am satisfied the software works, I will apply it to a research project run by Dr Sarah Slight (School of Pharmacy, Newcastle University), asking whether patients treated with many medications ("polypharmacy") have poorer outcomes (e.g. more falls, hospital admissions). If they do, new policies can be created to control polypharmacy and improve health outcomes.
2) Sensitive data visualisation
AR/VR technologies provide a quick way to interpret and understand health data without special technical or scientific expertise. These immersive environments work because they can simultaneously present more pieces of information about someone than can be seen on paper or screen. But this also makes individuals more identifiable. If AR/VR becomes widely used, we must properly understand the disclosure risks and develop ways to protect against them. In 2015, our collaboration with industry partners Masters of Pie and Lumacode won a competition to display Wellcome Trust data in VR. Ongoing work that I led has extended this to explore VR visual methods using synthetic data based on the ALSPAC cohort. Together, we built the BigDataVR pilot analysis tool. This fellowship will explore the factors that determine the risk of identifying someone when using immersive environments like BigDataVR. The findings will be used to develop new ways to create VR-compatible graphics via DataSHIELD that convey the "essence" of a data set without a full data display, which might identify someone. I will create a preliminary proof of concept, using DataSHIELD to send the data underpinning a visualisation to the free WebVR environment. Once safe visualisation has been shown using the synthetic data, the work will be extended to a real use case based on the polypharmacy project (see above) or on research data released by METADAC (a committee overseeing access to biomedical data from 5 major UK studies).
Software created under both work programs will be freely available to researchers, helping doctors and scientists to better analyse sensitive health data while protecting confidentiality.
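As a rough illustration of what "automatically detects and blocks analyses that may be identifying" can mean in practice, the sketch below refuses to return a frequency table when any category contains too few people. It is a minimal Python sketch under stated assumptions, not DataSHIELD's implementation or API: the function name `safe_counts`, the exception `DisclosureError` and the threshold of 5 are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical threshold: the smallest group size allowed in any output.
MIN_CELL_COUNT = 5


class DisclosureError(Exception):
    """Raised when a requested summary could single out an individual."""


def safe_counts(values, min_cell=MIN_CELL_COUNT):
    """Return counts per category, or refuse if any category is too small."""
    counts = Counter(values)
    too_small = [key for key, n in counts.items() if n < min_cell]
    if too_small:
        # Report only how many cells failed, never which ones or their sizes.
        raise DisclosureError(
            f"{len(too_small)} cell(s) fall below the minimum count of {min_cell}"
        )
    return dict(counts)


if __name__ == "__main__":
    # Synthetic (made-up) data: every category has at least 5 members.
    synthetic = ["polypharmacy"] * 12 + ["single medication"] * 7
    print(safe_counts(synthetic))

    # Adding one rare category causes the whole request to be blocked.
    try:
        safe_counts(synthetic + ["rare condition"])
    except DisclosureError as err:
        print("blocked:", err)
```

Blocking the whole request, rather than silently dropping the small category, avoids leaking the size of the hidden group through subtraction from the total.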
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Adjusting expected deaths for mortality displacement during the COVID-19 pandemic: a model based counterfactual approach at the level of individuals.
- DOI: 10.1186/s12874-023-01984-8
- Publication date: 2023-10-18
- Journal:
- Impact factor: 4
- Authors: Holleyman, Richard James; Barnard, Sharmani; Bauer-Staeb, Clarissa; Hughes, Andrew; Dunn, Samantha; Fox, Sebastian; Newton, John N.; Fitzpatrick, Justine; Waller, Zachary; Deehan, David John; Charlett, Andre; Gregson, Celia L.; Wilson, Rebecca; Fryers, Paul; Goldblatt, Peter; Burton, Paul
- Corresponding author: Burton, Paul
Privacy preserving data visualizations.
- DOI: 10.1140/epjds/s13688-020-00257-4
- Publication date: 2021
- Journal:
- Impact factor: 3.6
- Authors: Avraam D; Wilson R; Butters O; Burton T; Nicolaides C; Jones E; Boyd A; Burton P
- Corresponding author: Burton P
Reconciling the biomedical data commons and the GDPR: three lessons from the EUCAN ELSI collaboratory.
- DOI: 10.1038/s41431-023-01403-y
- Publication date: 2024-01
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
PUblications Metadata Augmentation (PUMA) pipeline.
- DOI: 10.12688/f1000research.25484.2
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Butters OW; Wilson RC; Garner H; Burton TWY
- Corresponding author: Burton TWY
Recognizing, reporting and reducing the data curation debt of cohort studies.
- DOI: 10.1093/ije/dyaa087
- Publication date: 2020-08-01
- Journal:
- Impact factor: 7.7
- Authors: Butters OW; Wilson RC; Burton PR
- Corresponding author: Burton PR
Other publications by Rebecca Wilson
Pea chloroplast genes encoding a 4kDa polypeptide of photosystem I and a putative enzyme of C1 metabolism
- DOI:
- Publication date: 1991
- Journal:
- Impact factor: 2.5
- Authors: Alison G. Smith; Rebecca Wilson; T. Kaethner; D. Willey; J. Gray
- Corresponding author: J. Gray
Educational interventions to prevent paediatric abusive head trauma in babies younger than one year old: A systematic review and meta-analyses
- DOI: 10.1016/j.chiabu.2022.105935
- Publication date: 2022-12-01
- Journal:
- Impact factor: 3.400
- Authors: Lauren J. Scott; Rebecca Wilson; Philippa Davies; Mark D. Lyttle; Julie Mytton; Sarah Dawson; Sharea Ijaz; Maria Theresa Redaniel; Joanna G. Williams; Jelena Savović
- Corresponding author: Jelena Savović
Improving Multimodal Physical Function in Adults with Heterogeneous Chronic Pain: Protocol for A Multisite Feasibility Randomized Control Trial
- DOI: 10.1016/j.jpain.2024.01.169
- Publication date: 2024-04-01
- Journal:
- Impact factor: 4.000
- Authors: Julia E. Hooker; Julie R. Brewer; Katherine McDermott; Millan Kanaya; Tamara J. Somers; Francis Keefe; Sarah Kelleher; Hannah M. Fisher; John Burns; Rebecca Wilson; Ronald Kulich; Gary Polykoff; Robert A. Parker; Jonathan Greenberg; Ana-Maria Vranceanu
- Corresponding author: Ana-Maria Vranceanu
Overall survival following breast conserving surgery compared with mastectomy: A systematic review
- DOI: 10.1016/j.ejso.2023.03.037
- Publication date: 2023-05-01
- Journal:
- Impact factor:
- Authors: Kiran Kasper Rajan; Katherine Fairhurst; Beth Birkbeck; Rebecca Wilson; Jelena Savovic; Chris Holcombe; Shelley Potter
- Corresponding author: Shelley Potter
Troubleshooting with magseed and Magtrace during breast cancer surgery
- DOI: 10.1016/j.ejso.2022.03.225
- Publication date: 2022-05-01
- Journal:
- Impact factor:
- Authors: Abeera Abbas; Rebecca Wilson; Alison Darlington; Chloe Wright; Ioannis Ntanos; Nabila Nasir; Mohammed Absar; Kate Williams
- Corresponding author: Kate Williams
Other grants held by Rebecca Wilson
Methods for the privacy preserving analysis of sensitive health data: text analysis and data visualisation
- Grant number: MR/S003959/1
- Fiscal year: 2018
- Funding amount: $483,400
- Project category: Fellowship
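Similar National Natural Science Foundation of China grants
Research on key technologies for MANET-oriented key management
- Grant number: 61173188
- Approval year: 2011
- Funding amount: CNY 520,000
- Project category: General Program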
Similar overseas grants
CAREER: Architectural Foundations for Practical Privacy-Preserving Computation
- Grant number: 2340137
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Continuing Grant
Collaborative Research: SHF: Small: Efficient and Scalable Privacy-Preserving Neural Network Inference based on Ciphertext-Ciphertext Fully Homomorphic Encryption
- Grant number: 2412357
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Grant number: 2402815
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Grant number: 2402817
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Standard Grant
HarmonicAI: Human-guided collaborative multi-objective design of explainable, fair and privacy-preserving AI for digital health
- Grant number: EP/Z000262/1
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Research Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Grant number: 2402816
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Standard Grant
HarmonicAI: Human-guided collaborative multi-objective design of explainable, fair and privacy-preserving AI for digital health
- Grant number: EP/Y03743X/1
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Research Grant
HarmonicAI: Human-guided collAboRative Multi-Objective design of explaiNable, faIr and privaCy-preserving AI for digital health
- Grant number: EP/Z000041/1
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Research Grant
Privacy-preserving machine learning through secure management of data's lifecycle in distributed systems: REMINDER
- Grant number: EP/Y036301/1
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Research Grant
HarmonicAI: Human-guided collaborative multi-objective design of explainable, fair and privacy-preserving AI for digital health
- Grant number: EP/Y037367/1
- Fiscal year: 2024
- Funding amount: $483,400
- Project category: Research Grant