Integrative data science approaches for rare disease discovery in health records
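Basic information
- Grant number: 10626148
- Principal investigator:
- Amount: $236,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-06-01 to 2025-05-31
- Project status: Ongoing
- Source: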
- Keywords: Acceleration, Adult, Affect, American, Award, Basic Science, Behavioral, Bioinformatics, Clinical, Clinical Data, Clinical Research, Computing Methodologies, Consensus, Data, Data Science, Data Set, Detection, Diagnosis, Diagnostic, Diagnostics Research, Disease, Economic Burden, Electronic Health Record, Environment, Faculty, Family, Frequencies, Genes, Genetic, Genomics, Genotype, Goals, Healthcare, Healthcare Systems, Individual, Informatics, Knowledge, Machine Learning, Manuscripts, Markov Chains, Medical, Medical Genetics, Medicine, Mental disorders, Mentors, Mentorship, Methods, Mining, Modeling, Molecular, Names, Natural Language Processing, Natural Language Processing pipeline, Ontology, Outcome, Pacific Northwest, Patient Recruitments, Patients, Pattern, Persons, Phase, Phenotype, Population, Positioning Attribute, Prevalence, Principal Investigator, Rare Diseases, Recording of previous events, Research, Research Personnel, Standardization, Symptoms, System, Testing, Training, Universities, Validation, Visualization, Vocabulary, Washington, Work, accurate diagnosis, biomedical data science, biomedical informatics, career, causal variant, clinical data warehouse, clinical decision-making, cohort, diagnostic accuracy, disease phenotype, early onset disorder, exome sequencing, gene discovery, genomic data, health care delivery, health data, health record, improved, member, multimodal data, novel, open source, patient health information, phenotypic data, prototype, psychologic, rare condition, rare genetic disorder, recruit, skills, software development, support tools, tool, trait
Project abstract
ABSTRACT: There are nearly 7,000 diseases that have a prevalence of only one in 2,000 individuals or less.
Yet, such rare diseases are estimated to collectively affect over 300 million people worldwide, representing a
significant healthcare concern. Although rare diseases have predominantly genetic origins, nearly half of them
do not manifest symptoms until adulthood and frequently confound discovery and diagnosis. Even in the case
of early onset disorders, the sheer number of possible diagnoses can often overwhelm clinicians. As a result,
rare diseases are often diagnosed with delay, misdiagnosed or even remain undiagnosed, not only disrupting
patient lives but also hindering progress on our understanding of such diseases. Data science methods that
mine large-scale retrospective health record data for phenotypic information will aid in timely and accurate
diagnoses of rare diseases, especially when combined with additional data types, and thus have significant real-world
impact. This proposal will integrate electronic health record (EHR) data sets with publicly available
vocabularies and ontologies, and genomic data for the improved identification and characterization of patients
with rare diseases, using approaches from machine learning, natural language processing (NLP) and basic
bioinformatics. The work has three specific aims and will be carried out in two phases. During the mentored
phase, the principal investigator (PI) will develop data-driven methods to extract standardized concepts related
to rare diseases from clinical notes and infer the occurrence of each disease (Aim 1). He will also develop data
science approaches to compare and contrast longitudinal patterns associated with patients' journeys through
the healthcare system when seeking a diagnosis for a rare disease, and aid in clinical decision-making by
leveraging these patterns (Aim 2). During the independent phase (Aim 3), computational methods will be
developed for the integrated modeling and analysis of genotypic (from Aim 3) and phenotypic information (from
Aims 1 and 2). Cohorts to be sequenced will cover diseases for which causal genes or disease definitions are
unclear (discovery), as well as those for which these are well known (validation). This work will be carried out
under the mentorship of four faculty members with complementary expertise in biomedical informatics, data
science, NLP, and rare disease genomics at the University of Washington, the largest medical system in the
Pacific Northwest (four million EHRs), world-renowned researchers in medical genetics, and a robust data
science environment. In addition, under the direction of the mentoring team, the PI will complete advanced
coursework, receive training in translational bioinformatics and clinical research informatics, submit
manuscripts, and seek an independent research position. This proposal will yield preliminary results for
subsequent studies on data-driven phenotyping and enable the realization of the PI's career goals by providing
him with the necessary training to build on his machine learning and basic bioinformatics expertise to transition
into an independent investigator in biomedical data science.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Vikas Rao Pejaver
Other grants by Vikas Rao Pejaver
Integrative data science approaches for rare disease discovery in health records
- Grant number: 10541283
- Fiscal year: 2022
- Funding amount: $236,500
- Project category:
Integrative data science approaches for rare disease discovery in health records
- Grant number: 9884791
- Fiscal year: 2019
- Funding amount: $236,500
- Project category:
Similar overseas grants
Co-designing a lifestyle, stop-vaping intervention for ex-smoking, adult vapers (CLOVER study)
- Grant number: MR/Z503605/1
- Fiscal year: 2024
- Funding amount: $236,500
- Project category: Research Grant
RAPID: Affective Mechanisms of Adjustment in Diverse Emerging Adult Student Communities Before, During, and Beyond the COVID-19 Pandemic
- Grant number: 2402691
- Fiscal year: 2024
- Funding amount: $236,500
- Project category: Standard Grant
Early Life Antecedents Predicting Adult Daily Affective Reactivity to Stress
- Grant number: 2336167
- Fiscal year: 2024
- Funding amount: $236,500
- Project category: Standard Grant
Elucidation of Adult Newt Cells Regulating the ZRS enhancer during Limb Regeneration
- Grant number: 24K12150
- Fiscal year: 2024
- Funding amount: $236,500
- Project category: Grant-in-Aid for Scientific Research (C)
Migrant Youth and the Sociolegal Construction of Child and Adult Categories
- Grant number: 2341428
- Fiscal year: 2024
- Funding amount: $236,500
- Project category: Standard Grant
Understanding how platelets mediate new neuron formation in the adult brain
- Grant number: DE240100561
- Fiscal year: 2024
- Funding amount: $236,500
- Project category: Discovery Early Career Researcher Award
Laboratory testing and development of a new adult ankle splint
- Grant number: 10065645
- Fiscal year: 2023
- Funding amount: $236,500
- Project category: Collaborative R&D
Usefulness of a question prompt sheet for onco-fertility in adolescent and young adult patients under 25 years old.
- Grant number: 23K09542
- Fiscal year: 2023
- Funding amount: $236,500
- Project category: Grant-in-Aid for Scientific Research (C)
Identification of new specific molecules associated with right ventricular dysfunction in adult patients with congenital heart disease
- Grant number: 23K07552
- Fiscal year: 2023
- Funding amount: $236,500
- Project category: Grant-in-Aid for Scientific Research (C)
Issue identifications and model developments in transitional care for patients with adult congenital heart disease.
- Grant number: 23K07559
- Fiscal year: 2023
- Funding amount: $236,500
- Project category: Grant-in-Aid for Scientific Research (C)