Methods for the privacy preserving analysis of sensitive health data: text analysis and data visualisation

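Basic information

Grant number:
MR/S003959/2
Principal investigator:
Rebecca Wilson
Amount:
$483,400
Host institution:
University of Liverpool
Host institution country:
United Kingdom
Project type:
Fellowship
Fiscal year:
2020
Funding country:
United Kingdom
Start and end dates:
2020 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=MR%2FS003959%2F2
Keywords:
Methods privacy preserving analysis sensitive

Project abstract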

The "data revolution" can enhance health and social care, accelerate research and help us to assess new ways to improve health and health care. But new ways of analysing health data must be used in ways that the public understand and are happy with, and that appropriately address data privacy and security. This fellowship will develop tools to help scientists and doctors make good use of sensitive health data, while minimising the risk of an individual or their health status becoming known. I will focus on two increasingly important areas of health data use: 1) information from medical text; 2) visual display of data, particularly in augmented reality (AR) or virtual reality (VR).

1) Sensitive text analysis

Medical text (e.g. health records, medical letters) contains patient data over time, including identifying information (e.g. address, next of kin, full date of birth). Although helpful for care and research, use of sensitive medical text is strictly controlled for privacy reasons. Existing methods extract information from text and may control disclosure risk by deleting identifiable data or grouping patients into blocks. But these procedures are not foolproof: some patients may still be identifiable, and once key information has been discarded the results may be wrong. My fellowship adopts a new approach we have developed for the free software package DataSHIELD. DataSHIELD allows sensitive data to be analysed without being seen or copied, and automatically detects and blocks many analyses that may be identifying. My earlier work has shown that DataSHIELD can be used on text data, and I will extend it to protect the privacy of data extracted from medical text by computer-based text mining tools. This will markedly increase the range of analyses that may be applied to medical text while maintaining confidentiality. I will first work on synthetic (made-up but realistic) text to safely develop and test the new approach. Once I am satisfied the software works, I will apply it to a research project run by Dr Sarah Slight (School of Pharmacy, Newcastle University), asking whether patients treated with many medications ("polypharmacy") have poorer outcomes (e.g. more falls, hospital admissions). If they do, new policies can be created to control polypharmacy and improve health outcomes.

2) Sensitive data visualisation

AR/VR technologies provide a quick way to interpret and understand health data without special technical or scientific expertise. These immersive environments work because they can simultaneously present more pieces of information about someone than can be seen on paper or screen. But this also makes individuals more identifiable. If AR/VR becomes widely used, we must properly understand the disclosure risks and develop ways to protect against them. In 2015, our collaboration with industry partners Masters of Pie and Lumacode won a competition to display Wellcome Trust data in VR. Ongoing work that I led has extended this to explore VR visual methods using synthetic data based on the ALSPAC cohort. Together, we built the BigDataVR pilot analysis tool. This fellowship will explore the factors that determine the risk of identifying someone when using immersive environments like BigDataVR. The findings will be used to develop new ways to create VR-compatible graphics via DataSHIELD that convey the "essence" of a data set without a full data display, which may identify someone. I will create a preliminary proof of concept, using DataSHIELD to send the data underpinning a visualisation to the free WebVR environment. Once safe visualisation has been shown using the synthetic data, the work will be extended to a real use case based on the polypharmacy project (see above) or on research data released by METADAC (a committee overseeing access to biomedical data from 5 major UK studies).

Software created under both work programmes will be freely available to researchers, helping doctors and scientists to better analyse sensitive health data while protecting confidentiality.

Project outputs

Journal articles (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Adjusting expected deaths for mortality displacement during the COVID-19 pandemic: a model based counterfactual approach at the level of individuals.

DOI:
10.1186/s12874-023-01984-8
Publication date:
2023-10-18
Journal:
BMC Medical Research Methodology
Impact factor:
4
Authors:
Holleyman, Richard James; Barnard, Sharmani; Bauer-Staeb, Clarissa; Hughes, Andrew; Dunn, Samantha; Fox, Sebastian; Newton, John N.; Fitzpatrick, Justine; Waller, Zachary; Deehan, David John; Charlett, Andre; Gregson, Celia L.; Wilson, Rebecca; Fryers, Paul; Goldblatt, Peter; Burton, Paul
Corresponding author:
Burton, Paul

Privacy preserving data visualizations.

DOI:
10.1140/epjds/s13688-020-00257-4
Publication date:
2021
Journal:
EPJ Data Science
Impact factor:
3.6
Authors:
Avraam D; Wilson R; Butters O; Burton T; Nicolaides C; Jones E; Boyd A; Burton P
Corresponding author:
Burton P

Reconciling the biomedical data commons and the GDPR: three lessons from the EUCAN ELSI collaboratory.

DOI:
10.1038/s41431-023-01403-y
Publication date:
2024-01
Journal:
European Journal of Human Genetics (EJHG)
Impact factor:
0
Authors:
Corresponding author:

PUblications Metadata Augmentation (PUMA) pipeline.

DOI:
10.12688/f1000research.25484.2
Publication date:
2020
Journal:
F1000Research
Impact factor:
0
Authors:
Butters OW; Wilson RC; Garner H; Burton TWY
Corresponding author:
Burton TWY

Recognizing, reporting and reducing the data curation debt of cohort studies.

DOI:
10.1093/ije/dyaa087
Publication date:
2020-08-01
Journal:
International Journal of Epidemiology
Impact factor:
7.7
Authors:
Butters OW; Wilson RC; Burton PR
Corresponding author:
Burton PR

Other publications by Rebecca Wilson

Pea chloroplast genes encoding a 4kDa polypeptide of photosystem I and a putative enzyme of C1 metabolism

DOI:
Publication date:
1991
Journal:
Current Genetics
Impact factor:
2.5
Authors:
Alison G. Smith; Rebecca Wilson; T. Kaethner; D. Willey; J. Gray
Corresponding author:
J. Gray

Educational interventions to prevent paediatric abusive head trauma in babies younger than one year old: A systematic review and meta-analyses

DOI:
10.1016/j.chiabu.2022.105935
Publication date:
2022-12-01
Journal:
Child Abuse & Neglect
Impact factor:
3.400
Authors:
Lauren J. Scott; Rebecca Wilson; Philippa Davies; Mark D. Lyttle; Julie Mytton; Sarah Dawson; Sharea Ijaz; Maria Theresa Redaniel; Joanna G. Williams; Jelena Savović
Corresponding author:
Jelena Savović

Improving Multimodal Physical Function in Adults with Heterogeneous Chronic Pain: Protocol for A Multisite Feasibility Randomized Control Trial

DOI:
10.1016/j.jpain.2024.01.169
Publication date:
2024-04-01
Journal:
Journal of Pain
Impact factor:
4.000
Authors:
Julia E. Hooker; Julie R. Brewer; Katherine McDermott; Millan Kanaya; Tamara J. Somers; Francis Keefe; Sarah Kelleher; Hannah M. Fisher; John Burns; Rebecca Wilson; Ronald Kulich; Gary Polykoff; Robert A. Parker; Jonathan Greenberg; Ana-Maria Vranceanu
Corresponding author:
Ana-Maria Vranceanu

Overall survival following breast conserving surgery compared with mastectomy: A systematic review

DOI:
10.1016/j.ejso.2023.03.037
Publication date:
2023-05-01
Journal:
Conference abstract
Impact factor:
Authors:
Kiran Kasper Rajan; Katherine Fairhurst; Beth Birkbeck; Rebecca Wilson; Jelena Savovic; Chris Holcombe; Shelley Potter
Corresponding author:
Shelley Potter

Troubleshooting with magseed and Magtrace during breast cancer surgery

DOI:
10.1016/j.ejso.2022.03.225
Publication date:
2022-05-01
Journal:
Conference abstract
Impact factor:
Authors:
Abeera Abbas; Rebecca Wilson; Alison Darlington; Chloe Wright; Ioannis Ntanos; Nabila Nasir; Mohammed Absar; Kate Williams
Corresponding author:
Kate Williams