SIPHS: Semantic interpretation of personal health messages for generating public health summaries


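Basic information

  • Grant number:
    EP/M005089/1
  • Principal investigator:
    Nigel Collier
  • Amount:
    $1,238,500
  • Host institution country:
    United Kingdom
  • Project type:
    Fellowship
  • Fiscal year:
    2015
  • Funding country:
    United Kingdom
  • Duration:
    2015 to (end date not available)
  • Project status:
    Completed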

Project abstract

Open online data such as microblogs and discussion board messages have the potential to be an incredibly valuable source of information about health in populations. Such data have been growing rapidly, are low-cost and real-time, and seem likely to cover a significant proportion of the population. To take two examples, PatientsLikeMe has grown by 10% and now has over 200,000 users covering over 1,500 health conditions; the general-purpose Twitter service is expanding at a rate of 30% annually, with over 200 million active users. Going beyond simple keyword search and harnessing these data for public health represents both an opportunity and a challenge for natural language processing (NLP). This fellowship proposal is about helping health experts leverage social media for their own clinical and scientific studies through automatic techniques that encode messages in a machine-understandable semantic representation. The project seeks to address three major challenges: (1) knowledge brokering: developing algorithms that identify informal descriptions of conditions, treatments, medications, behaviours and attitudes and code them to standard ontologies such as the UMLS; (2) knowledge management: creating a structured resource of the patient vocabulary used in blog texts and linking it to existing coding systems; and (3) adding insight to evidence: working with domain experts to use the coded information to automatically generate meaningful summaries for follow-up investigation. At the technological level, the fellowship seeks to pioneer new methods for NLP and machine learning (ML). Social media remains a challenging area for NLP for a variety of reasons: short, decontextualised messages; high levels of ambiguity and out-of-vocabulary words; slang and an evolving vocabulary; and an inherent bias towards sensational topics.
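The knowledge-brokering step in challenge (1) can be illustrated with a minimal sketch, assuming a small hand-built patient-vocabulary lexicon of the kind described in challenge (2). The phrases, the preferred concept names and the functions `normalise` and `code_phrase` are hypothetical illustrations, not the project's method or data; a real system would link to UMLS concept identifiers (none are invented here) and would rely on ontologies and ML rather than plain string similarity. Exact lookups are tried first, then the standard-library `difflib.get_close_matches` absorbs simple misspellings.

```python
import difflib
import re
from typing import Optional

# Hypothetical toy lexicon: informal patient phrase -> preferred concept name.
# A real resource would map to UMLS concept identifiers; none are invented here.
PATIENT_LEXICON = {
    "can't sleep": "Insomnia",
    "cant sleep": "Insomnia",
    "splitting headache": "Headache",
    "head is killing me": "Headache",
    "feel sick": "Nausea",
    "throwing up": "Vomiting",
}


def normalise(text: str) -> str:
    """Lower-case and collapse runs of whitespace so lookups ignore case and spacing."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def code_phrase(phrase: str, cutoff: float = 0.8) -> Optional[str]:
    """Map an informal phrase to a concept name, or return None if nothing matches.

    Exact lexicon hits are tried first; otherwise difflib's similarity ratio
    (0..1) is used, so near-misses such as misspellings can still be coded.
    """
    key = normalise(phrase)
    if key in PATIENT_LEXICON:
        return PATIENT_LEXICON[key]
    close = difflib.get_close_matches(key, list(PATIENT_LEXICON), n=1, cutoff=cutoff)
    return PATIENT_LEXICON[close[0]] if close else None


if __name__ == "__main__":
    for msg in ["Can't  Sleep", "splitting headake", "the weather is nice"]:
        print(repr(msg), "->", code_phrase(msg))
```

String similarity cannot resolve ambiguous or figurative uses of health terms, which is the gap the abstract proposes to close by combining ontological resources with ML on annotated messages; this lookup is only a baseline.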
The fellowship seeks to harness the progress made so far in NLP for social media analysis in the commercial domain and develop it further to provide meaningful public health evidence. One key aspect not previously addressed is the clinical coding of patient messages. Although knowledge brokering systems exist for clinical and scientific texts (e.g. the NLM's MetaMap), their performance on social media messages has been poor. The fellowship will combine the rich ontological resources available in biomedicine with ML on annotated message data to disambiguate informal language. Research will also aim to understand the communicative function of messages, for example whether a message reports direct experience or relates to news, humour or marketing. If these problems are overcome, an important barrier to integrating social media data with other types of clinical data will be removed. The advantage of providing health coding for social media reports lies in its potential for studying very large cohorts and for real-time early alerting of aberrations. In the fellowship I will research the potential for multivariate time series alerting from semantically coded features, working with domain experts to evaluate performance across a range of metrics (e.g. sensitivity, timeliness and false alerting rates). A variety of approaches will be explored to generate real-time risk summaries across social media sources. Two real-world applications have been chosen to take this forward: early alerting for adverse drug reactions (ADRs) and infectious disease surveillance (IDS). Project outcomes will include fundamental technologies as well as open-source algorithms, data sets and an ontology. An exciting aspect of this fellowship is interdisciplinary collaboration across stakeholders at all levels: scientists, public health experts and industry. Finally, participation will be opened up to the international community through the release of open-source data. Colleagues working on social media technologies will be invited to participate in discussions with users at a new challenge evaluation workshop.
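The alerting and evaluation ideas above can be sketched for a single daily count series, for example the number of messages coded to one concept per day. This is a simplified univariate stand-in for the multivariate alerting the fellowship proposes: the moving-baseline rule (flag a day whose count exceeds the mean plus `k` standard deviations of the previous `window` days) is a common surveillance heuristic swapped in for illustration, and the definitions of sensitivity, timeliness and false-alert fraction used here are assumptions, since the abstract names these metrics without defining them. All function names and data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Optional, Tuple


def moving_baseline_alerts(counts: List[float], window: int = 7, k: float = 3.0) -> List[int]:
    """Return indices of days whose count exceeds mean + k * sd of the previous `window` days.

    `window` must be at least 2 (stdev needs two points). With a perfectly flat
    baseline (sd == 0), any count above the mean triggers an alert.
    """
    alerts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        if counts[t] > mean(baseline) + k * stdev(baseline):
            alerts.append(t)
    return alerts


def evaluate(alerts: List[int], outbreaks: List[Tuple[int, int]]) -> Tuple[float, Optional[float], float]:
    """Score alerts against known events given as inclusive (start_day, end_day) ranges.

    sensitivity: fraction of events with at least one alert inside their range
    timeliness:  mean delay in days from event start to its first alert (detected events only)
    false_frac:  fraction of alerts that fall outside every event range
    """
    delays = []
    for start, end in outbreaks:
        hits = [a for a in alerts if start <= a <= end]
        if hits:
            delays.append(min(hits) - start)
    sensitivity = len(delays) / len(outbreaks) if outbreaks else 0.0
    timeliness = mean(delays) if delays else None
    false_alerts = [a for a in alerts if not any(s <= a <= e for s, e in outbreaks)]
    false_frac = len(false_alerts) / len(alerts) if alerts else 0.0
    return sensitivity, timeliness, false_frac


if __name__ == "__main__":
    # Hypothetical daily counts; the event is recorded as days 8-11, with the rise visible from day 9.
    daily = [10, 12, 11, 9, 10, 11, 12, 10, 11, 30, 35, 12, 10, 11]
    found = moving_baseline_alerts(daily)
    print(found)                        # [9]
    print(evaluate(found, [(8, 11)]))   # (1.0, 1, 0.0)
```

Day 10 (count 35) is not flagged because the day-9 spike has already inflated the baseline. This masking is one reason some surveillance methods leave a gap (a "guard band") between the baseline and the day being tested, and it is the kind of design choice an evaluation on timeliness and false alerting would need to examine.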

Project outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
WSDM 2017 Workshop on Mining Online Health Reports
  • DOI:
    10.1145/3018661.3022761
  • Published:
    2017
  • Authors:
    Collier N
  • Corresponding author:
    Collier N
COMETA: A Corpus for Medical Entity Linking in the Social Media
  • DOI:
    10.18653/v1/2020.emnlp-main.253
  • Published:
    2020-10
  • Authors:
    Marco Basaldella;Fangyu Liu;Ehsan Shareghi;Nigel Collier
  • Corresponding author:
    Marco Basaldella;Fangyu Liu;Ehsan Shareghi;Nigel Collier
A Richer-but-Smarter Shortest Dependency Path with Attentive Augmentation for Relation Extraction
  • DOI:
    10.18653/v1/n19-1298
  • Published:
    2019-06
  • Authors:
    Duy-Cat Can;Hoang-Quynh Le;Quang-Thuy Ha;Nigel Collier
  • Corresponding author:
    Duy-Cat Can;Hoang-Quynh Le;Quang-Thuy Ha;Nigel Collier
BioReddit: Word Embeddings for User-Generated Biomedical NLP
  • DOI:
    10.18653/v1/d19-6205
  • Published:
    2019-11
  • Authors:
    Marco Basaldella;Nigel Collier
  • Corresponding author:
    Marco Basaldella;Nigel Collier

Other publications by Nigel Collier

Text Readability and Coreference Annotation across Heterogeneous Media for the Digital Archive of Rare Books
Incorporating topic information into semantic analysis models
  • DOI:
    10.3115/1219044.1219069
  • Published:
    2004
  • Impact factor:
    1.6
  • Authors:
    Tony Mullen;Nigel Collier
  • Corresponding author:
    Nigel Collier
Synthetic Examples Improve Cross-Target Generalization: A Study on Stance Detection on a Twitter Corpus
Annotation of Biomedical Texts for Zone Analysis
  • Published:
    2004
  • Authors:
    Y. Mizuta;Tony Mullen;Nigel Collier
  • Corresponding author:
    Nigel Collier
在来産業の展開と資本主義, 有志舎, 佐々木寛司, 勝部真人編『講座, 明治維新』第8巻
"The development of indigenous industries and capitalism", in 佐々木寛司 and 勝部真人 (eds.), Lecture Series: The Meiji Restoration (講座 明治維新), vol. 8, 有志舎
  • Published:
    2013
  • Authors:
    杉山将;Nigel Collier;高田輝子;山崎志郎;中西聡;Yoshiaki Ogura and Hirofumi Uchida;Shingo IOKIBE;林 采成;北澤満;山本達司;T.Takada,A.Inoue;冨善一敏;松村敏弘;湯澤規子;Keiichi Hori and Hiroshi Osano;谷本 雅之
  • Corresponding author:
    谷本 雅之


Other grants held by Nigel Collier

EPI-AI: Automated Understanding and Alerting of Disease Outbreaks from Global News Media
  • Grant number:
    ES/T012277/1
  • Fiscal year:
    2020
  • Funding amount:
    $1,238,500
  • Project type:
    Research Grant
PheneBank: automatic extraction and validation of a database of human phenotype-disease associations in the scientific literature
  • Grant number:
    MR/M025160/1
  • Fiscal year:
    2015
  • Funding amount:
    $1,238,500
  • Project type:
    Research Grant

Similar overseas grants

Collective Machine Learning for Semantic Data Interpretation
  • Grant number:
    RGPIN-2017-06320
  • Fiscal year:
    2022
  • Funding amount:
    $1,238,500
  • Project type:
    Discovery Grants Program - Individual
Preclinical markers of Alzheimer's disease using psycholinguistic semantic measures
  • Grant number:
    10617408
  • Fiscal year:
    2022
  • Funding amount:
    $1,238,500
Preclinical markers of Alzheimer's disease using psycholinguistic semantic measures
  • Grant number:
    10629395
  • Fiscal year:
    2022
  • Funding amount:
    $1,238,500
Diversity: Preclinical markers of Alzheimer's disease using psycholinguistic semantic measures
  • Grant number:
    10818171
  • Fiscal year:
    2022
  • Funding amount:
    $1,238,500
Preclinical markers of Alzheimer's disease using psycholinguistic semantic measures
  • Grant number:
    10810563
  • Fiscal year:
    2022
  • Funding amount:
    $1,238,500
Collective Machine Learning for Semantic Data Interpretation
  • Grant number:
    RGPIN-2017-06320
  • Fiscal year:
    2021
  • Funding amount:
    $1,238,500
  • Project type:
    Discovery Grants Program - Individual
Principles and Mechanisms of Semantic Coercion for the Interpretation of Derived Adjectives in English
  • Grant number:
    21K20031
  • Fiscal year:
    2021
  • Funding amount:
    $1,238,500
  • Project type:
    Grant-in-Aid for Research Activity Start-up
A study on the prosodic influence on the semantic interpretation
  • Grant number:
    20K00601
  • Fiscal year:
    2020
  • Funding amount:
    $1,238,500
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Collective Machine Learning for Semantic Data Interpretation
  • Grant number:
    RGPIN-2017-06320
  • Fiscal year:
    2020
  • Funding amount:
    $1,238,500
  • Project type:
    Discovery Grants Program - Individual
Collective Machine Learning for Semantic Data Interpretation
  • Grant number:
    507903-2017
  • Fiscal year:
    2019
  • Funding amount:
    $1,238,500
  • Project type:
    Discovery Grants Program - Accelerator Supplements