POET: Consolidated, Comprehensive Clinical Text Preprocessing

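Basic Information

  • Grant number:
    7570254
  • Principal investigator:
  • Amount:
    $169,300
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2008
  • Funding country:
    United States
  • Project period:
    2008-09-30 to 2010-08-31
  • Project status:
    Completed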

Project Abstract

DESCRIPTION (provided by applicant): As electronic health records (EHRs) continue their expansion into clinical settings, there has been a corresponding increase in interest in mining the data they contain, both for research and for clinical decision support. Informaticists are increasingly studying ways to mine EHR textual content. This is an important trend, because there is a wealth of information contained in clinical text that is not represented anywhere else in the EHR.

A low-level text-as-data issue presents a significant obstacle to the widespread use of available medical NLP systems: hand-typed clinical narratives in EHRs are usually ungrammatical; short or telegraphic in style; full of abbreviations, acronyms, and misspellings; formatted in a templated or pseudo-tabular form; and contain embedded non-text such as a list of laboratory values cut-and-pasted from elsewhere in the EHR. As we show in the Preliminary Studies Section, this makes high-level processing by popular tools like MedLEE and MetaMap effectively useless for all but a few "clean" document types like discharge summaries or consult reports (e.g., pathology or radiology reports). This in turn explains why so little has been published about what is certainly the preponderance of clinical texts: those that are not as well-behaved lexically and syntactically as a discharge summary.

In this application we distinguish clinical narratives (e.g., a progress note) from biomedical narratives (e.g., a PubMed abstract). We are interested in texts that arise in the clinical or research setting, that is, texts that are composed by clinicians and researchers directly into a computer system. We propose to build and publish a tool called POET (Parsable Output Extracted from Text). POET will be designed to accept unstructured textual documents and return structured, linguistic equivalents that are, to the extent possible, parsable by higher-level NLP engines. POET will have an architecture that is modular, extensible, and based on open-source platforms and sources (e.g., Java, Perl, UMLS, NegEx, the Stanford Parser, HL7 Clinical Document Architecture, caGRID).

To implement POET, we will collect, program, and evaluate published as well as novel algorithms for: acronym/abbreviation resolution; spelling correction; template and pseudo-table re-writing; and removal of embedded non-text. To test POET we will use a large corpus of cross-discipline (e.g., medical, nursing, pharmacy) clinical note types, as well as clinical research texts, namely MedWatch reports and IRB adverse event reports. The development of POET will combine the best practices found in the literature with new research efforts undertaken as part of the project. To validate the fidelity of POET processing we plan a formal analysis of information loss and information gain before and after processing. To ensure broad access to the tools, POET will be released under an open-source license. Finally, we plan to assess the feasibility of offering POET as a Web service for remote processing.
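
The four preprocessing tasks named in the abstract (acronym/abbreviation resolution, spelling correction, template and pseudo-table re-writing, and removal of embedded non-text) could be arranged as a chain of independent stages. The following is a minimal sketch of that modular idea, not POET's actual implementation: the function names, the tiny lookup tables, the regular expressions, and the stage order are all assumptions made for illustration, and Python is used here only for brevity (the abstract mentions Java and Perl).

```python
import re
from typing import Callable, Dict, Sequence

# Hypothetical lookup tables for illustration only; a real system would
# rely on curated resources such as the UMLS.
ABBREVIATIONS: Dict[str, str] = {
    "pt": "patient",
    "c/o": "complains of",
    "sob": "shortness of breath",
}
MISSPELLINGS: Dict[str, str] = {"abdomnal": "abdominal"}

# A stage is any function that maps note text to note text.
Stage = Callable[[str], str]

# Matches a line made up only of "label number" pairs, e.g. "Na 140 K 4.1".
LAB_LINE = re.compile(r"^\s*(?:[A-Za-z]+\s+\d+(?:\.\d+)?\s*)+$")
# Matches a templated "Label: value" line, e.g. "HR: 88".
FIELD_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(\S.*?)\s*$")


def remove_non_text(note: str) -> str:
    """Drop lines that look like pasted laboratory values."""
    kept = [line for line in note.splitlines() if not LAB_LINE.match(line)]
    return "\n".join(kept)


def rewrite_templates(note: str) -> str:
    """Turn 'Label: value' lines into simple sentences."""
    out = []
    for line in note.splitlines():
        m = FIELD_LINE.match(line)
        out.append(f"{m.group(1)} is {m.group(2)}." if m else line)
    return "\n".join(out)


def expand_abbreviations(note: str) -> str:
    """Replace known abbreviations, matching case-insensitively."""
    return re.sub(
        r"[A-Za-z/]+",
        lambda m: ABBREVIATIONS.get(m.group(0).lower(), m.group(0)),
        note,
    )


def correct_spelling(note: str) -> str:
    """Replace known misspellings, matching case-insensitively."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: MISSPELLINGS.get(m.group(0).lower(), m.group(0)),
        note,
    )


# Assumed default order; stages can be added, removed, or reordered.
DEFAULT_STAGES: Sequence[Stage] = (
    remove_non_text,
    rewrite_templates,
    expand_abbreviations,
    correct_spelling,
)


def preprocess(note: str, stages: Sequence[Stage] = DEFAULT_STAGES) -> str:
    """Run the note through each stage in turn."""
    for stage in stages:
        note = stage(note)
    return note
```

Under these assumptions, calling `preprocess("Pt c/o SOB and abdomnal pain\nNa 140 K 4.1 Cl 101\nHR: 88")` returns the two lines "patient complains of shortness of breath and abdominal pain" and "HR is 88.". Because each stage shares the same text-in, text-out signature, a published or novel algorithm for any one task can replace the toy version without touching the others.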

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by JOHN F. HURDLE

University of Utah Biomedical Informatics Training Grant Supplement
  • Grant number:
    9380137
  • Fiscal year:
    2016
  • Funding amount:
    $169,300
  • Project category:
POET-2: High-performance computing for advanced clinical narrative preprocessing
  • Grant number:
    8326648
  • Fiscal year:
    2011
  • Funding amount:
    $169,300
  • Project category:
POET-2: High-performance computing for advanced clinical narrative preprocessing
  • Grant number:
    8182025
  • Fiscal year:
    2011
  • Funding amount:
    $169,300
  • Project category:
POET: Consolidated, Comprehensive Clinical Text Preprocessing
  • Grant number:
    7689273
  • Fiscal year:
    2008
  • Funding amount:
    $169,300
  • Project category:
POET: Consolidated, Comprehensive Clinical Text Preprocessing
  • Grant number:
    7847940
  • Fiscal year:
    2008
  • Funding amount:
    $169,300
  • Project category:
Statistical NLP Analysis of Cross-discipline Clinical Text
  • Grant number:
    6836781
  • Fiscal year:
    2004
  • Funding amount:
    $169,300
  • Project category:
Statistical NLP Analysis of Cross-discipline Clinical Text
  • Grant number:
    6944955
  • Fiscal year:
    2004
  • Funding amount:
    $169,300
  • Project category:
University of Utah Biomedical Informatics Training Grant
  • Grant number:
    8681515
  • Fiscal year:
    1997
  • Funding amount:
    $169,300
  • Project category:
University of Utah Biomedical Informatics Training Grant
  • Grant number:
    8261299
  • Fiscal year:
    1997
  • Funding amount:
    $169,300
  • Project category:
University of Utah Biomedical Informatics Training Grant
  • Grant number:
    9086432
  • Fiscal year:
    1997
  • Funding amount:
    $169,300
  • Project category:

Similar Overseas Grants

Planar culture of gastrointestinal stem cells for screening pharmaceuticals for adverse event risk
  • Grant number:
    10707830
  • Fiscal year:
    2023
  • Funding amount:
    $169,300
  • Project category:
Hospital characteristics and Adverse event Rate Measurements (HARM) Evaluated over 21 years.
  • Grant number:
    479728
  • Fiscal year:
    2023
  • Funding amount:
    $169,300
  • Project category:
    Operating Grants
Analysis of ECOG-ACRIN adverse event data to optimize strategies for the longitudinal assessment of tolerability in the context of evolving cancer treatment paradigms (EVOLV)
  • Grant number:
    10884567
  • Fiscal year:
    2023
  • Funding amount:
    $169,300
  • Project category:
AE2Vec: Medical concept embedding and time-series analysis for automated adverse event detection
  • Grant number:
    10751964
  • Fiscal year:
    2023
  • Funding amount:
    $169,300
  • Project category:
Understanding the real-world adverse event risks of novel biosimilar drugs
  • Grant number:
    486321
  • Fiscal year:
    2022
  • Funding amount:
    $169,300
  • Project category:
    Studentship Programs
Pediatric Adverse Event Risk Reduction for High Risk Medications in Children and Adolescents: Improving Pediatric Patient Safety in Dental Practices
  • Grant number:
    10676786
  • Fiscal year:
    2022
  • Funding amount:
    $169,300
  • Project category:
Pediatric Adverse Event Risk Reduction for High Risk Medications in Children and Adolescents: Improving Pediatric Patient Safety in Dental Practices
  • Grant number:
    10440970
  • Fiscal year:
    2022
  • Funding amount:
    $169,300
  • Project category:
Improving Adverse Event Reporting on Cooperative Oncology Group Trials
  • Grant number:
    10642998
  • Fiscal year:
    2022
  • Funding amount:
    $169,300
  • Project category:
Planar culture of gastrointestinal stem cells for screening pharmaceuticals for adverse event risk
  • Grant number:
    10482465
  • Fiscal year:
    2022
  • Funding amount:
    $169,300
  • Project category:
Expanding and Scaling Two-way Texting to Reduce Unnecessary Follow-Up and Improve Adverse Event Identification Among Voluntary Medical Male Circumcision Clients in the Republic of South Africa
  • Grant number:
    10191053
  • Fiscal year:
    2020
  • Funding amount:
    $169,300
  • Project category: