Natural Language Processing for Cancer Research Network Surveillance Studies

Basic information

Project abstract

DESCRIPTION (provided by applicant): This application addresses Broad Challenge Area: (10) Information Technology for Processing Health Care Data and specific Challenge Topic: 10-CA-107 Expand Spectrum of Cancer Surveillance through Informatics Approaches. The proposed project launches a collaborative effort to advance adoption within the HMO Cancer Research Network (CRN) of "industrial-strength" natural language processing (NLP) systems useful for mining valuable, research-grade information from unstructured clinical text. Such text is now available for processing in the electronic medical record (EMR) systems of affiliated CRN health plans. The proposed NLP methods will create ongoing capacity to tap what has recently been described as "a treasure trove of historical unstructured data that provides essential information for the study of disease progression, treatment effectiveness and long-term outcomes" (5). The vision of advancing widespread NLP capacity across the CRN, as well as the approach we present here for implementing it, grew out of an in-depth strategic planning effort we completed in December 2008. That effort involved participants from six CRN sites guided by a blue-ribbon panel of NLP experts from three of the nation's leading centers of clinical NLP research: University of Pittsburgh Medical Center, Vanderbilt University, and Mayo Clinic. The vision is to deploy a powerful NLP system locally, manage it with newly hired and trained local NLP technical staff, and conduct NLP-based research projects initiated by local investigators, in consultation with higher-level external NLP experts. Our planning efforts suggest this collaborative model is feasible; we will test the model in the context of the proposed project.
An important development in April 2009 yielded what we believe is a potentially transformative opportunity to accelerate adoption of NLP capacity in applied research settings: release of the open-source Clinical Text Analysis and Knowledge Extraction System (cTAKES) software. This software was the result of a collaborative effort between IBM and Mayo Clinic. Built on the same framework Mayo Clinic currently uses to process its repository of over 40 million clinical documents, cTAKES dramatically lowers the cost of adopting a comprehensive and flexible NLP system. Deployment and use of such systems were previously feasible only in institutions with large, academically oriented biomedical informatics research programs. Still, other deployment challenges and the need to acquire NLP training for local staff present residual barriers to adopting comprehensive NLP systems such as cTAKES. In collaboration with five other CRN sites, the proposed project mitigates these challenges in two ways: 1) it develops configurable open-source software modules needed to streamline and therefore reduce the cost of deploying cTAKES, and 2) it presents and tests a model for training local staff through hands-on NLP projects overseen by outside NLP expert consultants. The potential impact of this project is evident most clearly in the vast untapped opportunities for text mining represented in CRN-affiliated health plans, where EMR systems have been in place since at least 2005, and whose patients represent 4% of the U.S. population. Clinical text mining offers the potential to provide new or improved data elements for cancer surveillance and other types of research requiring information about patient functional status, medication side effects, details of therapeutic approaches, and differential information about clinical findings.
Another significant impact of this project is its plan to integrate into the cTAKES system an open-source de-identification tool based on state-of-the-art, best-of-breed NLP approaches developed by the MITRE Corporation. De-identification of clinical text will make it easier for researchers to get access to clinical text, and will also facilitate multi-site collaborations while protecting patient privacy. Finally, if successful, the NLP algorithm we propose as a proof-of-principle project at Group Health, which will classify sets of patient charts as either containing or not containing a diagnosis of recurrent breast cancer, could dramatically reduce the cost of research in this area; currently all recurrent breast cancer endpoints must be established through costly manual chart abstraction. Novel aspects of the proposed project include its talented and transdisciplinary research team, including national experts in NLP, and its resourceful strategy for building the technical resources and "human capital" needed to support an ongoing program of applied NLP research. Natural language processing is itself a highly innovative technology; when successfully established at multiple CRN sites in the future, it will represent a watershed moment in the CRN's already impressive history of exploiting data systems to support innovative research. Newly hired staff positions total approximately 2.0 FTE in each project year, most of which we anticipate will be supported by ongoing new research programs after the proposed project concludes.

Project narrative

The proposed project develops new measurement technologies, based on natural language processing approaches, for extracting information about disease processes and treatment that is currently documented only in clinical text. Because these methods are generic, they will potentially contribute to public health by advancing research in a wide variety of areas. The "proof of principle" algorithm developed in the project to identify recurrent breast cancer diagnoses will advance epidemiologic and clinical research pertaining to the 2.5 million women currently living with breast cancer.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by DAVID S. CARRELL

DAT- Implementing routine screening for cannabis and other drug use disorders in primary care: impact on diagnosis and treatment in a randomized pragmatic trial in 22 clinics
  • Grant number:
    10237870
  • Fiscal year:
    2020
  • Award amount:
    $494,500
DAT- Implementing routine screening for cannabis and other drug use disorders in primary care: impact on diagnosis and treatment in a randomized pragmatic trial in 22 clinics
  • Grant number:
    9884229
  • Fiscal year:
    2020
  • Award amount:
    $494,500
Scalable and Robust Clinical Text De-Identification Tools
  • Grant number:
    8345041
  • Fiscal year:
    2012
  • Award amount:
    $494,500
Scalable and Robust Clinical Text De-Identification Tools
  • Grant number:
    8722030
  • Fiscal year:
    2012
  • Award amount:
    $494,500
Natural Language Processing for Cancer Research Network Surveillance Studies
  • Grant number:
    7839706
  • Fiscal year:
    2009
  • Award amount:
    $494,500

Similar overseas grants

How novices write code: discovering best practices and how they can be adopted
  • Grant number:
    2315783
  • Fiscal year:
    2023
  • Award amount:
    $494,500
  • Project category:
    Standard Grant
One or Several Mothers: The Adopted Child as Critical and Clinical Subject
  • Grant number:
    2719534
  • Fiscal year:
    2022
  • Award amount:
    $494,500
  • Project category:
    Studentship
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number:
    2633211
  • Fiscal year:
    2020
  • Award amount:
    $494,500
  • Project category:
    Studentship
A material investigation of the ceramic shards excavated from the Omuro Ninsei kiln site: Production techniques adopted by Nonomura Ninsei.
  • Grant number:
    20K01113
  • Fiscal year:
    2020
  • Award amount:
    $494,500
  • Project category:
    Grant-in-Aid for Scientific Research (C)
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number:
    2436895
  • Fiscal year:
    2020
  • Award amount:
    $494,500
  • Project category:
    Studentship
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number:
    2633207
  • Fiscal year:
    2020
  • Award amount:
    $494,500
  • Project category:
    Studentship
The limits of development: State structural policy, comparing systems adopted in two European mountain regions (1945-1989)
  • Grant number:
    426559561
  • Fiscal year:
    2019
  • Award amount:
    $494,500
  • Project category:
    Research Grants
Securing a Sense of Safety for Adopted Children in Middle Childhood
  • Grant number:
    2236701
  • Fiscal year:
    2019
  • Award amount:
    $494,500
  • Project category:
    Studentship
A Study on Mutual Funds Adopted for Individual Defined Contribution Pension Plans
  • Grant number:
    19K01745
  • Fiscal year:
    2019
  • Award amount:
    $494,500
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Structural and functional analyses of a bacterial protein translocation domain that has adopted diverse pathogenic effector functions within host cells
  • Grant number:
    415543446
  • Fiscal year:
    2019
  • Award amount:
    $494,500
  • Project category:
    Research Fellowships