Interactive machine learning methods for clinical natural language processing
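Basic information
- Grant number: 8818096
- Principal investigator:
- Amount: $558,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-05-31 to 2018-09-28
- Project status: Completed
- Source: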
- Keywords: Abbreviations; Active Learning; Address; Adoption; Algorithms; Attention; Biomedical Research; Classification; Clinical; Clinical Data; Clinical Informatics; Clinical Research; Cognitive; Communities; Data; Data Set; Development; Disease; Educational workshop; Electronic Health Record; Face; Goals; Grant; Human; Hybrids; Knowledge; Label; Learning; Linguistics; Machine Learning; Manuals; Medical; Methodology; Methods; Modeling; Names; Natural Language Processing; Patients; Pattern; Performance; Pharmaceutical Preparations; Physicians; Process; Research; Research Personnel; Research Priority; Resources; Sampling; Solutions; Source; Specific qualifier value; Statistical Methods; Statistical Models; System; Technology; Testing; Text; Time; United States National Library of Medicine; base; clinical application; clinical phenotype; cohort; computer human interaction; computerized; cost; experience; improved; model development; novel; open source; statistics; success; tool; usability
Project abstract
DESCRIPTION (provided by applicant): Growing deployment of electronic health record (EHR) systems has made massive amounts of clinical data available electronically. However, much of the detailed clinical information about patients is embedded in narrative text and is not directly accessible to computerized clinical applications. Therefore, natural language processing (NLP) technologies, which can unlock information in narrative documents, have received great attention in the medical domain. Current state-of-the-art NLP approaches often involve building probabilistic models. However, the wide adoption of statistical methods in clinical NLP faces two grand challenges: 1) the lack of large annotated clinical corpora; and 2) the lack of methodologies that can efficiently integrate linguistic and domain knowledge with statistical learning. High-performance statistical NLP methods rely on large-scale, high-quality annotations of clinical text, but creating large annotated clinical corpora is time-consuming and costly because it often requires manual review by physicians. Moreover, the medical domain is knowledge intensive. To achieve optimal performance, probabilistic models need to leverage medical domain knowledge. Therefore, methods that can efficiently integrate domain and expert knowledge with machine learning processes to quickly build high-quality probabilistic models at minimal annotation cost would be highly desirable for clinical text processing.
In this study, we propose to investigate interactive machine learning (IML) methods to address the above challenges in clinical NLP. An IML system builds a classification model iteratively: it actively selects informative samples for annotation based on models built from previously annotated samples, thus reducing the annotation cost of model development. More importantly, an IML system also incorporates human input into the learning process (e.g., an expert can specify important features for a classification task based on domain knowledge). Thus, IML is an ideal framework for efficiently integrating rule-based (via domain experts specifying features) and statistics-based (via different learning algorithms) approaches to clinical NLP. To achieve our goal, we propose three specific aims. In Aim 1, we plan to investigate different aspects of IML for word sense disambiguation, including developing new active learning algorithms and conducting cognitive usability analysis for efficient feature annotation by users. In Aim 2, to demonstrate the broad uses of IML, we will extend IML approaches to two other important clinical NLP classification tasks: named entity recognition and clinical phenotyping. Finally, in Aim 3, we propose to disseminate the IML methods and tools to the biomedical research community.
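The iterative sample-selection idea in the abstract can be illustrated with a minimal sketch. This is not the project's system: the nearest-centroid classifier, the margin-based uncertainty score, the `oracle` callback standing in for a human annotator, and the `feature_weights` vector standing in for expert-specified feature importance are all illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Dict[int, List[float]]  # class label -> centroid


def fit_centroids(samples: List[Vector], labels: List[int]) -> Model:
    """Compute the mean feature vector of each class from labeled samples."""
    sums: Model = {}
    counts: Dict[int, int] = {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def weighted_distance(a: Vector, b: Vector, feature_weights: Vector) -> float:
    """Euclidean distance with per-feature weights (hypothetical expert input)."""
    return math.sqrt(sum(w * (p - q) ** 2 for p, q, w in zip(a, b, feature_weights)))


def predict(model: Model, x: Vector, feature_weights: Vector) -> int:
    """Assign x to the class with the nearest centroid."""
    return min(model, key=lambda y: weighted_distance(x, model[y], feature_weights))


def margin(model: Model, x: Vector, feature_weights: Vector) -> float:
    """Gap between the two nearest centroids; a small gap means high uncertainty."""
    dists = sorted(weighted_distance(x, c, feature_weights) for c in model.values())
    return dists[1] - dists[0] if len(dists) > 1 else 0.0


def interactive_loop(
    pool: List[Vector],
    oracle: Callable[[int], int],
    seed: List[int],
    budget: int,
    feature_weights: Vector,
) -> Tuple[Model, Dict[int, int]]:
    """Pool-based active learning: query the most uncertain sample each round."""
    labeled = {i: oracle(i) for i in seed}  # seed should cover every class
    for _ in range(budget):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        model = fit_centroids([pool[i] for i in labeled], list(labeled.values()))
        query = min(unlabeled, key=lambda i: margin(model, pool[i], feature_weights))
        labeled[query] = oracle(query)  # ask the annotator for one more label
    model = fit_centroids([pool[i] for i in labeled], list(labeled.values()))
    return model, labeled


# Toy data: feature 0 separates the classes, feature 1 is noise.
pool = [(0.0, 0.9), (0.1, 0.1), (0.2, 0.8), (0.4, 0.2),
        (0.6, 0.7), (0.8, 0.3), (0.9, 0.6), (1.0, 0.0)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
model, labeled = interactive_loop(
    pool, lambda i: truth[i], seed=[0, 7], budget=2, feature_weights=(1.0, 0.0)
)
```

In this toy run the weights `(1.0, 0.0)` play the role of an expert declaring that only the first feature matters, and each round spends the annotation budget on the sample closest to the current decision boundary rather than on a random one.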
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by HUA XU
Leveraging Longitudinal Data and Informatics Technology to Understand the Role of Bilingualism in Cognitive Resilience, Aging and Dementia
- Grant number: 10583170
- Fiscal year: 2023
- Funding amount: $558,400
- Project category:
Detecting synergistic effects of pharmacological and non-pharmacological interventions for AD/ADRD
- Grant number: 10501245
- Fiscal year: 2022
- Funding amount: $558,400
- Project category:
Engagement and outreach to achieve a FAIR data ecosystem for the BICAN
- Grant number: 10523908
- Fiscal year: 2022
- Funding amount: $558,400
- Project category:
Interactive machine learning methods for clinical natural language processing
- Grant number: 9132834
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8077875
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 7866149
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8589822
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8305149
- Fiscal year: 2010
- Funding amount: $558,400
- Project category:
An in-silico method for epidemiological studies using Electronic Medical Records
- Grant number: 8110041
- Fiscal year: 2009
- Funding amount: $558,400
- Project category:
An in-silico method for epidemiological studies using Electronic Medical Records
- Grant number: 7726747
- Fiscal year: 2009
- Funding amount: $558,400
- Project category:
Similar overseas grants
Collaborative Research: New to IUSE: EDU DCL: Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Grant number: 2315700
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant
Building a Calculus Active Learning Environment Equally Beneficial Across a Diverse Student Population
- Grant number: 2315747
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant
Collaborative Research: New to IUSE: EDU DCL: Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Grant number: 2315699
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant
CyberCorps Scholarship for Service: Defending Cyberspace through Active Learning
- Grant number: 2336586
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Continuing Grant
Project Visibility: Understanding the Experiences of Black Students in Active Learning Mathematics Courses in a Hispanic-Serving Institution Context
- Grant number: 2337029
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant
Collaborative Research: New to IUSE: EDU DCL: Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Grant number: 2315697
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant
Collaborative Research: New to IUSE: EDU DCL: Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Grant number: 2315696
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant
Conference: Active Learning Communities in Biochemistry
- Grant number: 2411535
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant
Collaborative Research: New to IUSE: EDU DCL: Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Grant number: 2315698
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant
Collaborative Research: New to IUSE: EDU DCL: Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Grant number: 2315701
- Fiscal year: 2024
- Funding amount: $558,400
- Project category: Standard Grant