Natural language processing in healthcare data

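Basic information

  • Grant number:
    RGPIN-2019-04701
  • Principal investigator:
    Rudzicz, Frank
  • Amount:
    $16,800
  • Host institution:
  • Host institution country:
    Canada
  • Program type:
    Discovery Grants Program - Individual
  • Fiscal year:
    2019
  • Funding country:
    Canada
  • Project period:
    2019-01-01 to 2020-12-31
  • Status:
    Completed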

Project abstract

Word embeddings (i.e., 'word vectors' or 'distributed representations') are dense numeric representations of words that serve as input to various statistical machine learning methods. Typically, by optimizing contextual statistics, these embeddings induce latent dimensions that encode aspects of morphology, syntax, and even semantics. As a result, they can capture meaningful relationships among concepts in the data that traditional methods do not.

The Vector Institute is partnering with the Institute for Clinical Evaluative Sciences (ICES) on the collaborative use of the EMRALD corpus, which consists of text from a variety of primary care sources (e.g., consult notes, referrals, risk factors, past medical history) contributed by hundreds of doctors in Ontario. EMRALD is an order of magnitude larger, in both vocabulary and overall size, than Google's news corpus, one of the de facto standard corpora for training embeddings. Currently, the extremely large vocabulary appears to reflect two main factors: (i) a preponderance of technical terms and their many variants, and (ii) spelling mistakes. Together, these lead to very sparse contextual matrices.

We have three primary goals in this program of research:

1) To enrich word embeddings with ontological information. Our team has developed a method of 'enriching' embeddings that combines a multi-task learning approach with normative lexical data derived from crowd-sourced statistics. For example, enriching the embedding process with sentiment norms improves accuracy not only on sentiment analysis but also on out-of-domain tasks such as machine translation. Here, we intend to apply a similar approach using structured ontological information from medical texts and resources.

2) To produce explainable and private models. It is increasingly important both to audit the decisions made by classifiers and to protect the privacy of the personal information underlying their models. For instance, it was recently shown that patients in an anonymized data set can be re-identified using another, minimally linked data set. To increase the explainability of our models, we will apply methods such as LIME and text-based 'anchoring'; to protect privacy, we will apply differential privacy. We will also explore whether generative adversarial networks can synthesize distributions with similar properties.

3) To perform longitudinal classification. An initial goal will be to use the structured data in EMRALD for supervised classification of diagnostic codes from clinical notes. Given the longitudinal nature of the data, the models will include recurrent neural networks and convolutional neural networks with attention. We will also explore semi-supervised learning, either by removing some of the structured data or by adding noise to the labels. The long-term aim is to combine these approaches to predict various long-term trends and human trajectories.
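The abstract attributes the sparsity of EMRALD's contextual matrices to technical-term variants and spelling mistakes. The minimal sketch below illustrates that effect on invented toy notes (not EMRALD data), using a hypothetical helper `cooccurrence_density` that counts word-context pairs within a fixed window. When one concept is written several ways, its counts are scattered over more rows and columns, so a smaller fraction of the word-by-context matrix is nonzero.

```python
from collections import Counter


def cooccurrence_density(notes, window=2):
    """Return (vocabulary size, nonzero cells, density) of the word-by-context
    co-occurrence matrix built from whitespace-tokenized notes."""
    counts = Counter()
    vocab = set()
    for note in notes:
        tokens = note.lower().split()
        vocab.update(tokens)
        for i, word in enumerate(tokens):
            # Count every neighbour within `window` positions on either side.
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    v = len(vocab)
    nonzero = len(counts)  # distinct (word, context) pairs with a count > 0
    return v, nonzero, nonzero / (v * v) if v else 0.0


# Invented toy notes: the same findings, written cleanly and with variants/typos.
clean = ["patient reports hypertension",
         "patient denies hypertension",
         "patient reports hypertension"]
noisy = ["patient reports hypertension",
         "pt denies hypertnsion",
         "patinet reports htn"]

print(cooccurrence_density(clean))  # (4, 10, 0.625)
print(cooccurrence_density(noisy))  # (8, 18, 0.28125)
```

With the clean notes, the vocabulary has 4 words and 10 of 16 cells are filled; the variants `pt`, `patinet`, `hypertnsion`, and `htn` double the vocabulary and cut the density by more than half, even though the notes say essentially the same thing.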
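Goal 2 names differential privacy without specifying a mechanism. As an illustration only, the sketch below applies the standard Laplace mechanism to a table of diagnosis counts. The function names, the example codes, and the assumption that each patient contributes to at most one count (so the table's L1 sensitivity is 1) are assumptions made for this example, not details of the project.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # is Laplace(0, scale) distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts(counts, epsilon, l1_sensitivity=1.0, seed=None):
    """Release every count with Laplace noise of scale l1_sensitivity / epsilon.

    This gives epsilon-differential privacy for the whole table when adding or
    removing one patient changes the counts by at most `l1_sensitivity` in
    total (e.g. each patient contributes to a single count)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = l1_sensitivity / epsilon
    return {code: n + laplace_noise(scale, rng) for code, n in counts.items()}


# Invented counts keyed by example ICD-10 codes (hypothetical, not EMRALD data).
counts = {"E11": 42, "I10": 17}
print(private_counts(counts, epsilon=1.0, seed=0))
```

Smaller values of `epsilon` give stronger privacy at the cost of noisier counts. The released values are floats and can be negative; rounding or clipping them afterwards does not weaken the guarantee, since post-processing preserves differential privacy.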

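Goal 3 mentions semi-supervised experiments built by removing some structured data or adding noise to the labels. One way to prepare such data is sketched here under stated assumptions: the helper name `degrade_labels` and its parameters are hypothetical. Each label is hidden with probability `mask_rate`, or otherwise replaced with a different class with probability `flip_rate`, using a fixed seed so experiments can be repeated.

```python
import random


def degrade_labels(labels, mask_rate=0.0, flip_rate=0.0, classes=None, seed=None):
    """Return a copy of `labels` in which each label is independently hidden
    (set to None) with probability `mask_rate`, or replaced by a different
    class with probability `flip_rate`. Requires mask_rate + flip_rate <= 1."""
    if mask_rate < 0 or flip_rate < 0 or mask_rate + flip_rate > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")
    rng = random.Random(seed)
    classes = sorted(set(labels)) if classes is None else list(classes)
    out = []
    for y in labels:
        r = rng.random()
        if r < mask_rate:
            out.append(None)  # withheld: treated as unlabeled
        elif r < mask_rate + flip_rate and len(classes) > 1:
            # Noisy label: pick uniformly among the other classes.
            out.append(rng.choice([c for c in classes if c != y]))
        else:
            out.append(y)  # kept as-is
    return out


# Hypothetical diagnostic-code labels for twelve notes.
labels = ["E11", "I10", "J45"] * 4
print(degrade_labels(labels, mask_rate=0.25, flip_rate=0.1, seed=7))
```

Hidden labels correspond to the "removing some structured data" setting, and flipped labels to the "adding noise" setting; a model trained on the degraded labels can then be scored against the untouched originals.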
Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Rudzicz, Frank

Validating pertussis data measures using electronic medical record data in Ontario, Canada 1986-2016.
  • DOI:
    10.1016/j.jvacx.2023.100408
  • Published:
    2023-12
  • Journal:
  • Impact factor:
    3.8
  • Authors:
    Mcburney, Shilo H.;Kwong, Jeffrey C.;Brown, Kevin A.;Rudzicz, Frank;Chen, Branson;Candido, Elisa;Crowcroft, Natasha S.
  • Corresponding author:
    Crowcroft, Natasha S.
A Conversational Robot for Older Adults with Alzheimer's Disease
Speech Interaction with Personal Assistive Robots Supporting Aging at Home for Individuals with Alzheimer's Disease
Thinker invariance: enabling deep neural networks for BCI across more people
  • DOI:
    10.1088/1741-2552/abb7a7
  • Published:
    2020-10-01
  • Journal:
  • Impact factor:
    4
  • Authors:
    Kostas, Demetres;Rudzicz, Frank
  • Corresponding author:
    Rudzicz, Frank
Linguistic Features Identify Alzheimer's Disease in Narrative Speech
  • DOI:
    10.3233/jad-150520
  • Published:
    2016-01-01
  • Journal:
  • Impact factor:
    4
  • Authors:
    Fraser, Kathleen C.;Meltzer, Jed A.;Rudzicz, Frank
  • Corresponding author:
    Rudzicz, Frank

Other grants by Rudzicz, Frank

Machine learning in surgical safety
  • Grant number:
    RGPIN-2020-05910
  • Fiscal year:
    2022
  • Amount:
    $16,800
  • Program type:
    Discovery Grants Program - Individual
Machine learning in surgical safety
  • Grant number:
    RGPIN-2020-05910
  • Fiscal year:
    2021
  • Amount:
    $16,800
  • Program type:
    Discovery Grants Program - Individual
Machine learning in surgical safety
  • Grant number:
    RGPIN-2020-05910
  • Fiscal year:
    2020
  • Amount:
    $16,800
  • Program type:
    Discovery Grants Program - Individual
Analyzing Child Language Experiences Around the World (ACLEW)
  • Grant number:
    501769-2016
  • Fiscal year:
    2018
  • Amount:
    $16,800
  • Program type:
    Discovery Frontiers - Digging into Data
A control-theoretic model of speech production and recognition for use within prosthetic communication devices
  • Grant number:
    435874-2013
  • Fiscal year:
    2018
  • Amount:
    $16,800
  • Program type:
    Discovery Grants Program - Individual
Automatic remote screening of speech features associated with Alzheimer's disease
  • Grant number:
    508463-2017
  • Fiscal year:
    2018
  • Amount:
    $16,800
  • Program type:
    Collaborative Health Research Projects
Automatic remote screening of speech features associated with Alzheimer's disease
  • Grant number:
    508463-2017
  • Fiscal year:
    2017
  • Amount:
    $16,800
  • Program type:
    Collaborative Health Research Projects
A control-theoretic model of speech production and recognition for use within prosthetic communication devices
  • Grant number:
    435874-2013
  • Fiscal year:
    2017
  • Amount:
    $16,800
  • Program type:
    Discovery Grants Program - Individual
Exploiting natural neural control to minimize speech errors among language learners
  • Grant number:
    522798-2017
  • Fiscal year:
    2017
  • Amount:
    $16,800
  • Program type:
    Engage Grants Program

Similar National Natural Science Foundation of China (NSFC) grants

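Effects of children's musical ability development on language, social cognition, and brain development
  • Grant number:
    31971003
  • Year approved:
    2019
  • Amount:
    CNY 580,000
  • Program type:
    General Program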
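Key text-analysis technologies for bidirectional English-Chinese cross-language image retrieval
  • Grant number:
    61170095
  • Year approved:
    2011
  • Amount:
    CNY 570,000
  • Program type:
    General Program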
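Association between auditory behavior and speech development in children after cochlear implantation
  • Grant number:
    81170916
  • Year approved:
    2011
  • Amount:
    CNY 650,000
  • Program type:
    General Program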
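Diagrammatic automatic parsing of spoken Chinese based on child psychological analysis
  • Grant number:
    60175012
  • Year approved:
    2001
  • Amount:
    CNY 180,000
  • Program type:
    General Program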

Similar overseas grants

Navigating Chemical Space with Natural Language Processing and Deep Learning
  • Grant number:
    EP/Y004167/1
  • Fiscal year:
    2024
  • Amount:
    $16,800
  • Program type:
    Research Grant
REU Site: Recent Advances in Natural Language Processing
  • Grant number:
    2349452
  • Fiscal year:
    2024
  • Amount:
    $16,800
  • Program type:
    Standard Grant
Studies of speech, image and natural language processing for multimodal spoken document retrieval
  • Grant number:
    23K11216
  • Fiscal year:
    2023
  • Amount:
    $16,800
  • Program type:
    Grant-in-Aid for Scientific Research (C)
Efficient and Fair Language Modelling for Natural Language Processing, investigating lightweight language modelling approaches and aiming at fairness
  • Grant number:
    2894795
  • Fiscal year:
    2023
  • Amount:
    $16,800
  • Program type:
    Studentship
SBIR Phase I: Sown To Grow - Measuring Growth in Trusting Relationships between Students and Educators with Natural Language Processing and Machine Learning Technologies
  • Grant number:
    2322340
  • Fiscal year:
    2023
  • Amount:
    $16,800
  • Program type:
    Standard Grant
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
  • Grant number:
    2329273
  • Fiscal year:
    2023
  • Amount:
    $16,800
  • Program type:
    Standard Grant
Using Natural Mouse Movement to Establish a Developmental "Biomarker" for Corticospinal Damage
  • Grant number:
    10667807
  • Fiscal year:
    2023
  • Amount:
    $16,800
  • Program type:
Harmony AI: Natural Language Processing Enabling Advanced Biomanufacturing
  • Grant number:
    10761082
  • Fiscal year:
    2023
  • Amount:
    $16,800
  • Program type:
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
  • Grant number:
    2329274
  • Fiscal year:
    2023
  • Amount:
    $16,800
  • Program type:
    Standard Grant
CAREER: Data-driven design of graphene oxide for environmental applications enabled by natural language processing and machine learning techniques
  • Grant number:
    2238415
  • Fiscal year:
    2023
  • Amount:
    $16,800
  • Program type:
    Continuing Grant