
SIPHS: Semantic interpretation of personal health messages for generating public health summaries


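Basic information

Grant number: EP/M005089/1
Principal investigator: Nigel Collier
Amount: $1.2385 million
Host institution: University of Cambridge
Host institution country: United Kingdom
Project category: Fellowship
Fiscal year: 2015
Funding country: United Kingdom
Duration: 2015 to (end date not available)
Project status: Completed

Source: https://gtr.ukri.org/projects?ref=EP%2FM005089%2F1
Keywords: SIPHS Semantic interpretation personal health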

Project abstract

Open online data such as microblogs and discussion board messages have the potential to be an incredibly valuable source of information about health in populations. Such data is growing rapidly, is low-cost and real-time, and seems likely to cover a significant proportion of the population. To take two examples, PatientsLikeMe has enjoyed 10% growth and now has over 200,000 users covering over 1500 health conditions; the generic Twitter service is expanding at a rate of 30% annually with over 200 million active users. Going beyond simple keyword search and harnessing this data for public health represents both an opportunity and a challenge to natural language processing (NLP). This fellowship proposal is about helping health experts leverage social media for their own clinical and scientific studies through automatic techniques that encode messages according to a machine-understandable semantic representation. There are three major challenges this project seeks to address: (1) knowledge brokering: to develop algorithms to identify and code the informal descriptions of conditions, treatments, medications, behaviours and attitudes to standard ontologies such as the UMLS; (2) knowledge management: to create a structured resource of patient vocabulary used in blog texts and link it to existing coding systems; and (3) adding insight to evidence: to work with domain experts to utilize the coded information to automatically generate meaningful summaries for follow-up investigation. At the technological level the fellowship seeks to pioneer new methods for NLP and machine learning (ML). Social media remains a challenging area for NLP for a variety of reasons: short de-contextualised messages, high levels of ambiguity and out-of-vocabulary words, use of slang and an evolving vocabulary, as well as inherent bias towards sensational topics.
The fellowship seeks to harness the progress made so far in NLP for social media analysis in the commercial domain and develop it further to provide meaningful public health evidence. One key aspect not previously addressed is the clinical coding of patient messages. Although knowledge brokering systems exist for clinical and scientific texts (e.g. the NLM's MetaMap), their performance on social media messages has been poor. The fellowship will utilise the rich availability of ontological resources in biomedicine together with ML on annotated message data to disambiguate informal language. Research will also aim to understand the communicative function of messages, for example whether the message reports direct experience or is related to news, humour or marketing. If these problems are successfully overcome, an important barrier to data integration with other types of clinical data will be removed. The advantage of providing health coding for social media reports is its potential for studying very large-scale cohorts and also for real-time early alerting of aberrations. In the fellowship I will research the potential for multivariate time series alerting from semantically coded features, working with domain experts to evaluate across a range of metrics (e.g. sensitivity, timeliness, false alerting rates). A variety of approaches will be explored to generate real-time risk summaries across social media sources. Two real-world applications have been chosen to take this forward: early alerting for adverse drug reactions (ADRs) and infectious disease surveillance (IDS). Project outcomes will include fundamental technologies as well as open-source algorithms, data sets and ontology. An exciting aspect of this fellowship is inter-disciplinary collaboration across stakeholders at all levels: scientists, public health experts and industry. Finally, participation will be opened up to the international community through the release of open-source data.
Colleagues working on social media technologies will be invited to participate in discussions with users at a new challenge evaluation workshop.


Project outputs

Journal articles (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

BioReddit: Word Embeddings for User-Generated Biomedical NLP

DOI:
Published: 2019
Journal:
Impact factor: 0
Authors: Basaldella M
Corresponding author: Basaldella M

WSDM 2017 Workshop on Mining Online Health Reports

DOI: 10.1145/3018661.3022761
Published: 2017
Journal:
Impact factor: 0
Authors: Collier N
Corresponding author: Collier N

COMETA: A Corpus for Medical Entity Linking in the Social Media

DOI: 10.18653/v1/2020.emnlp-main.253
Published: 2020-10
Journal: ArXiv
Impact factor: 0
Authors: Marco Basaldella; Fangyu Liu; Ehsan Shareghi; Nigel Collier
Corresponding authors: Marco Basaldella; Fangyu Liu; Ehsan Shareghi; Nigel Collier

A Richer-but-Smarter Shortest Dependency Path with Attentive Augmentation for Relation Extraction

DOI: 10.18653/v1/n19-1298
Published: 2019-06
Journal:
Impact factor: 0
Authors: Duy-Cat Can; Hoang-Quynh Le; Quang-Thuy Ha; Nigel Collier
Corresponding authors: Duy-Cat Can; Hoang-Quynh Le; Quang-Thuy Ha; Nigel Collier

BioReddit: Word Embeddings for User-Generated Biomedical NLP

DOI: 10.18653/v1/d19-6205
Published: 2019-11
Journal:
Impact factor: 0
Authors: Marco Basaldella; Nigel Collier
Corresponding authors: Marco Basaldella; Nigel Collier

Other publications by Nigel Collier

Text Readability and Coreference Annotation across Heterogeneous Media for the Digital Archive of Rare Books

DOI:
Published: 2004
Journal: The Journal of the Institute of Image Electronics Engineers of Japan (in Japanese) Vol.33, No.5
Impact factor: 0
Authors: Asanobu Kitamoto; Takeo Yamamoto; Sonoko Sato; Nigel Collier; Ai Kawazoe; Kinji Ono
Corresponding author: Kinji Ono

Incorporating topic information into semantic analysis models

DOI: 10.3115/1219044.1219069
Published: 2004
Journal: Chemistry Letters
Impact factor: 1.6
Authors: Tony Mullen; Nigel Collier
Corresponding author: Nigel Collier

Synthetic Examples Improve Cross-Target Generalization: A Study on Stance Detection on a Twitter corpus.

DOI:
Published: 2021
Journal: Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Impact factor: 0
Authors: Costanza Conforti; Jakob Berndt; Mohammad Taher Pilehvar; Chryssi Giannitsarou; Flavio Toxvaerd; Nigel Collier
Corresponding author: Nigel Collier

Annotation of Biomedical Texts for Zone Analysis

DOI:
Published: 2004
Journal:
Impact factor: 0
Authors: Y. Mizuta; Tony Mullen; Nigel Collier
Corresponding author: Nigel Collier

在来産業の展開と資本主義, 有志舎, 佐々木寛司, 勝部真人編『講座, 明治維新』第8巻

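The Development of Traditional Industries and Capitalism, 有志舎, in 佐々木寛司 and 勝部真人 (eds.), 『講座 明治維新』 (Lecture Series on the Meiji Restoration), Vol. 8

DOI:
Published: 2013
Journal:
Impact factor: 0
Authors: 杉山将; Nigel Collier; 高田輝子; 山崎志郎; 中西聡; Yoshiaki Ogura and Hirofumi Uchida; Shingo IOKIBE; 林采成; 北澤満; 山本達司; T.Takada,A.Inoue; 冨善一敏; 松村敏弘; 湯澤規子; Keiichi Hori and Hiroshi Osano; 谷本雅之
Corresponding author: 谷本雅之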