Semi-structured Information Retrieval in Clinical Text for Cohort Identification
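Basic Information
- Grant number: 8811565
- Principal investigator:
- Amount: $460,700
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-09-20 to 2019-07-31
- Project status: Completed
- Source: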
- Keywords: Accounting; Address; Adopted; Adoption; Asthma; Clinic; Clinical; Collection; Communities; Computerized Medical Record; Computers; Data; Dictionary; Disease; Electronic Health Record; Epidemiologist; Epidemiology; Evaluation; Event; Evidence Based Medicine; Evolution; Goals; Health; Information Retrieval; Information Retrieval Systems; Institution; Interest Group; Investigation; Judgment; Language; Learning; Machine Learning; Measures; Medical; Medical Records; Methodology; Methods; Metric; Modeling; Modification; Morphologic artifacts; Names; Natural Language Processing; Outcome; Patient Recruitments; Patients; Performance; Pharmaceutical Preparations; Phase; Physicians; Process; Publishing; Qualifying; Records; Research; Research Personnel; Resources; Rest; Retrieval; Sampling; Semantics; Site; Smoke; Source; Specific qualifier value; Structure; System; Techniques; Testing; Text; Validation; Weight; Work; Writing; asthmatic patient; base; cohort; improved; indexing; novel; open source; public health relevance; syntax; text searching; tool
Project Abstract
DESCRIPTION (provided by applicant): Natural Language Processing (NLP) techniques have shown promise for extracting data from the free text of electronic health records (EHRs), but studies have consistently found that these techniques do not readily generalize across application settings. Unfortunately, most efforts to apply NLP to real use cases have remained within a paradigm of single, well-defined application settings, so that generalizability to unseen use cases remains implicitly unaddressed. We propose to explicitly account for unseen application settings by adopting an information retrieval (IR) perspective with the objective of patient-level cohort identification. To do so, we introduce layered language models, an IR framework that enables the reuse of NLP-produced artifacts. Our long-term goal is to accelerate investigations of patient health and disease by providing robust, user-centric tools that are necessary to process, retrieve, and utilize the free text of EHRs. The main goal of this proposal is to accurately retrieve ad hoc, realistic cohorts from clinical text at Mayo Clinic and OHSU, establishing methods, resources, and evaluation for patient-level IR. We hypothesize that cohort identification can be addressed in a generalizable fashion by a new IR framework: layered language models. We will test this hypothesis through four specific aims. In Aim 1, we will make medical NLP artifacts searchable in our layered language IR framework. This involves storing and indexing the NLP artifacts, as well as using statistical language models to retrieve documents based on text and its associated NLP artifacts. In Aim 2, we deal with the practical setting of ad hoc cohort identification, moving to patient-level (rather than document-level) IR. To accurately handle patient cohorts in which qualifying evidence may be spread over multiple documents, we will develop and implement patient-level retrieval models that account for cross-document relational and temporal combinations of events. In Aim 3, we will construct parallel IR test collections using EHR data from two sites; a diverse set of cohort queries written by multiple people toward various clinical or epidemiological ends; and assessments of which patients are relevant to which queries at both sites. Finally, in Aim 4, we refine and evaluate patient-level layered language IR on the ad hoc cohort identification task, making comparisons across the users, queries, optimization metrics, and institutions. We will draw additional extrinsic comparisons with pre-existing techniques, e.g., for cohorts from the Electronic Medical Records and Genomics network. The expected outcomes of the proposed work are: (i) an open-source cohort identification tool, usable by clinicians and epidemiologists, that makes principled use of NLP artifacts for unseen queries; (ii) a parallel test collection for cohort identification, including two intra-institutional document collections, diverse test topics and user-produced text queries, and patient-level judgments of relevance to each query; and (iii) validation of the reusability of medical NLP via the task of retrieving patient cohorts.
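The abstract does not give the formulation of layered language models, so the following is only a minimal sketch of the general idea behind Aims 1 and 2, under stated assumptions. It scores each patient with a Dirichlet-smoothed query-likelihood model computed separately over a text layer and an NLP-artifact layer, then combines the layers with fixed weights. Pooling all of a patient's documents into one pseudo-document, the layer names `text` and `concept`, the weights, and the function names `log_query_likelihood` and `rank_patients` are all hypothetical choices for illustration, not the project's method.

```python
import math
from collections import Counter


def log_query_likelihood(query, doc, background, mu=10.0):
    """Dirichlet-smoothed unigram log-likelihood of `query` given `doc`.

    query, doc: lists of tokens. background: Counter over the whole collection.
    """
    doc_counts = Counter(doc)
    bg_total = sum(background.values())
    vocab = len(background) + 1  # +1 reserves probability mass for unseen tokens
    score = 0.0
    for tok in query:
        # Add-one smoothed background probability, so it is never zero.
        p_bg = (background[tok] + 1) / (bg_total + vocab)
        score += math.log((doc_counts[tok] + mu * p_bg) / (len(doc) + mu))
    return score


def rank_patients(query_layers, patients, weights, mu=10.0):
    """Rank patients by a weighted sum of per-layer query log-likelihoods.

    query_layers: {layer: [tokens]}
    patients: {patient_id: [{layer: [tokens]}, ...]}, one dict per document
    weights: {layer: float}
    """
    # Assumption: pool each patient's documents into one pseudo-document per
    # layer, so evidence spread over several notes can still match one query.
    pooled = {
        pid: {layer: [t for d in docs for t in d.get(layer, [])] for layer in weights}
        for pid, docs in patients.items()
    }
    # Collection-level background statistics, kept separate for each layer.
    background = {
        layer: Counter(t for p in pooled.values() for t in p[layer])
        for layer in weights
    }
    scores = {
        pid: sum(
            weights[layer]
            * log_query_likelihood(
                query_layers.get(layer, []), p[layer], background[layer], mu
            )
            for layer in weights
        )
        for pid, p in pooled.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented for illustration): patient A's evidence is split across two notes.
patients = {
    "A": [
        {"text": ["patient", "has", "asthma"], "concept": ["ASTHMA"]},
        {"text": ["current", "smoker"], "concept": ["SMOKER"]},
    ],
    "B": [{"text": ["no", "acute", "complaints"], "concept": []}],
}
query = {"text": ["asthma", "smoker"], "concept": ["ASTHMA", "SMOKER"]}
ranking = rank_patients(query, patients, weights={"text": 0.5, "concept": 0.5})
print(ranking)
```

In this toy setup the query asks for asthmatic smokers, and neither of patient A's notes satisfies it alone. Pooling is the simplest way to let cross-document evidence count; the relational and temporal combinations mentioned in Aim 2 would need a richer model than this sketch.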
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by HONGFANG LIU
Learning Precision Medicine for Rare Diseases Empowered by Knowledge-driven Data Mining
- Grant number: 10732934
- Fiscal year: 2023
- Funding amount: $460,700
- Project type:
The Data, Evaluation, and Coordination Center (DECC) for Connecting Underrepresented Populations to Clinical Trials (CUSP2CT)
- Grant number: 10597291
- Fiscal year: 2022
- Funding amount: $460,700
- Project type:
Secondary use of EMRs for surgical complication surveillance
- Grant number: 10202598
- Fiscal year: 2015
- Funding amount: $460,700
- Project type:
Secondary use of EMRs for surgical complication surveillance
- Grant number: 10001498
- Fiscal year: 2015
- Funding amount: $460,700
- Project type:
Secondary use of EMRs for surgical complication surveillance
- Grant number: 9251814
- Fiscal year: 2015
- Funding amount: $460,700
- Project type:
Secondary use of EMRs for surgical complication surveillance
- Grant number: 10471838
- Fiscal year: 2015
- Funding amount: $460,700
- Project type:
Semi-structured Information Retrieval in Clinical Text for Cohort Identification
- Grant number: 8928647
- Fiscal year: 2014
- Funding amount: $460,700
- Project type:
Natural language processing for clinical and translational research
- Grant number: 9033918
- Fiscal year: 2013
- Funding amount: $460,700
- Project type:
Natural language processing for clinical and translational research
- Grant number: 8640959
- Fiscal year: 2013
- Funding amount: $460,700
- Project type:
Natural language processing for clinical and translational research
- Grant number: 8920720
- Fiscal year: 2013
- Funding amount: $460,700
- Project type:
Similar Overseas Grants
Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
- Grant number: MR/S03398X/2
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
- Grant number: EP/Y001486/1
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
- Grant number: 2338423
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
- Grant number: MR/X03657X/1
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
- Grant number: 2348066
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
- Grant number: AH/Z505481/1
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Grant number: 10107647
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: EU-Funded
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
- Grant number: 2341402
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: Standard Grant
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Grant number: 10106221
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
- Grant number: AH/Z505341/1
- Fiscal year: 2024
- Funding amount: $460,700
- Project type: Research Grant


















