An in-silico method for epidemiological studies using Electronic Medical Records
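Basic information
- Grant number: 7726747
- Principal investigator:
- Amount: $273,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2009
- Funding country: United States
- Project period: 2009-09-03 to 2013-07-31
- Project status: Completed
- Source: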
- Keywords: Affect; American Cancer Society; Breast; Cereals; Clinical; Clinical Data; Cohort Studies; Colon Carcinoma; Computer Simulation; Computerized Medical Record; Data; Data Quality; Data Sources; Disease; Epidemiologic Studies; Epidemiology; Health; Hospitals; Human Resources; Informatics; Knowledge; Malignant Neoplasms; Manuals; Methods; NIH Program Announcements; Natural Language Processing; Patients; Population; Prevention; Randomized Controlled Trials; Records; Reporting; Research; Risk Factors; Selection Bias; Statistical Methods; System; Technology; Text; Time; Validation; anticancer research; base; cancer therapy; cancer type; cost; interest; prevent
Project abstract
DESCRIPTION: Observational epidemiological studies are effective methods for identifying factors affecting the health and illness of populations, as well as for determining optimal treatments for diseases, such as cancers. However, conventional epidemiological research usually involves personnel-intensive effort (such as manual chart and public records review) and can be very time consuming before conclusive results are obtained. Recently, a large amount of detailed longitudinal clinical data has been accumulated at hospitals' Electronic Medical Records (EMR) systems and it has become a valuable data source for epidemiological studies. However, there are two obstacles that prevent the wide usage of EMR data in epidemiological studies. First, most of the detailed clinical information in EMRs is embedded in narrative text and it is very costly to extract that information manually. Second, EMRs usually have data quality problems such as selection bias and missing data, which require adaptation of conventional statistical methods developed for randomized controlled trials.
In this study, we propose an in silico informatics-based approach for observational epidemiological studies using EMR data. We hypothesize that existing EMR data can be used for certain types of epidemiological studies in a very efficient manner with the help of informatics methods. The informatics-based approach will contain two major components. One is an NLP (Natural Language Processing) based information extraction system that can automatically extract detailed clinical information from EMRs; the other is a set of statistical and informatics methods that can be used to analyze EMR-derived data. If the feasibility of this approach is proven, it will change the standard paradigm of observational epidemiological research, because it has the capability to answer an epidemiological question in a very short time at a very low cost. The specific aim of this study is to develop an automated informatics approach to extract both fine-grained cancer findings and general clinical information from EMRs and use them to conduct cancer-related epidemiological studies. We will perform both case-control and cohort studies related to prevention and treatment of breast and colon cancers using EMR data. The informatics approach will be validated on EMRs from two major hospitals to demonstrate its generalizability. Epidemiological findings from our study will be compared to reported findings for validation.
Narrative: According to the American Cancer Society, about 7.6 million people worldwide died from various types of cancer during 2007. It is very important to identify risk factors for cancers and to determine optimal cancer treatments, and epidemiological study is one of the methods for doing so. This proposed study will use natural language processing technologies to automatically extract fine-grained cancer information from existing patient electronic medical records and use it to conduct cancer-related epidemiological studies, thus accelerating knowledge accumulation in cancer research.
SPECIAL REVIEW NOTE: In order to conform to the scientific objectives outlined in the program announcement RFA-GM-09-008, EUREKA applications submitted to the NCI were initially evaluated by a group of reviewers representing diverse scientific interests. The priority score reflects the average of all the scores given by the full committee after a thorough discussion.
Project outcomes
Journal papers (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by HUA XU
Leveraging Longitudinal Data and Informatics Technology to Understand the Role of Bilingualism in Cognitive Resilience, Aging and Dementia
- Grant number: 10583170
- Fiscal year: 2023
- Funding amount: $273,300
- Project category:
Detecting synergistic effects of pharmacological and non-pharmacological interventions for AD/ADRD
- Grant number: 10501245
- Fiscal year: 2022
- Funding amount: $273,300
- Project category:
Engagement and outreach to achieve a FAIR data ecosystem for the BICAN
- Grant number: 10523908
- Fiscal year: 2022
- Funding amount: $273,300
- Project category:
Interactive machine learning methods for clinical natural language processing
- Grant number: 8818096
- Fiscal year: 2010
- Funding amount: $273,300
- Project category:
Interactive machine learning methods for clinical natural language processing
- Grant number: 9132834
- Fiscal year: 2010
- Funding amount: $273,300
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8077875
- Fiscal year: 2010
- Funding amount: $273,300
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 7866149
- Fiscal year: 2010
- Funding amount: $273,300
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8589822
- Fiscal year: 2010
- Funding amount: $273,300
- Project category:
Real-time Disambiguation of Abbreviations in Clinical Notes
- Grant number: 8305149
- Fiscal year: 2010
- Funding amount: $273,300
- Project category:
An in-silico method for epidemiological studies using Electronic Medical Records
- Grant number: 8110041
- Fiscal year: 2009
- Funding amount: $273,300
- Project category:
Similar overseas grants
Development of Mouse and Humanized Models to Study Sex Disparities in Tumor Progression and Treatment of NSCLC
- Grant number: 10727735
- Fiscal year: 2023
- Funding amount: $273,300
- Project category:
Understanding and addressing rejection of personalized cancer risk information
- Grant number: 10639183
- Fiscal year: 2023
- Funding amount: $273,300
- Project category:
PATTERNS OF CARE (POC) STUDY: DIAGNOSIS YEAR 2020 (BREAST CANCER AND COLORECTAL CANCER)
- Grant number: 10928958
- Fiscal year: 2022
- Funding amount: $273,300
- Project category: