Leveraging large language models and knowledge graphs on clinical, pathological, and sequencing data to inform precision cancer therapy
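Basic information
- Grant number: 10888730
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2007
- Funding country: United States
- Project period: 2007-01-20 to 2023-12-31
- Project status: Completed
- Source: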
- Keywords: Adopted; American Association of Cancer Research; Behavioral; Benchmarking; Biological; Biological Assay; Cancer Biology; Cancer Model; Cancer Patient; Characteristics; Clinical; Clinical Cancer Center; Clinical Practice Guideline; Coin; Communities; Computer Models; Computing Methodologies; DNA Sequence Alteration; Dangerousness; Data; Democracy; Drug Screening; Effectiveness; Encapsulated; Ensure; Equity; Exhibits; Feedback; GENIE; Genes; Genomics; Goals; Graph; Hallucinations; Human; Individual; Instruction; Knowledge; Language; Learning; Llama; Malignant Neoplasms; Manuals; Memorial Sloan-Kettering Cancer Center; Modality; Modeling; Mutation; Oncogenic; Output; Pathologic; Patients; Pharmaceutical Preparations; Prediction of Response to Therapy; Process; Psychological reinforcement; Publishing; Recommendation; Recording of previous events; Reporting; Research; Research Personnel; Resources; Risk; Risk Factors; Sampling; Somatic Mutation; Structure; Text; Therapeutic; Therapeutic Intervention; Training; Triplet Multiple Birth; Weight; anticancer research; cancer genome; cancer genomics; cancer therapy; cancer type; chatbot; clinical decision-making; clinical sequencing; clinically significant; cohort; design; genomic data; genomic profiles; individualized medicine; insight; interest; knowledge base; knowledge graph; knowledge integration; language training; multimodality; novel strategies; open source; operation; personalized cancer therapy; personalized medicine; precision drugs; precision medicine; precision oncology; predictive modeling; response; targeted treatment; trait; treatment strategy; tumor
Project Summary:
Precision medicine and targeted therapy are emerging domains in cancer biology that aim to incorporate
individual-level clinical, pathological and genomic profiles to tailor treatment strategies for cancer patients.
Several precision oncology knowledge bases, such as OncoKB and My Cancer Genome, have been established to
democratize clinical decision-making by leveraging expert curation of biological and clinical significance of
alterations using publicly available resources. These knowledge bases, while extremely powerful, have
limitations, including the limited scope of annotated genes and alterations and the difficulty of identifying precise
therapies for specific combinations of a patient's genomic and clinical profiles. In this proposal, we plan to develop new
computational methodologies that will integrate (i) the broad range of implicit cancer knowledge accrued
by Large Language Models (LLMs) with (ii) the explicit structured clinical, pathological, and genomic
knowledge derived from cancer patients in the Memorial Sloan Kettering Cancer Center’s (MSKCC)
Clinical Sequencing cohort and AACR Project GENIE cohort. This will further be reinforced by expert
curation, with the aim of predicting combinations of genomic alterations and clinical or pathological profiles
that can be matched to a specific cancer therapy. The goal of this research is to develop computational
models fundamentally anchored around knowledge graphs and LLMs to bridge the gap between clinical and
functional risk factors of cancer and cancer therapeutics, and to inform and enhance personalized therapies.
The first aim of this proposal is to develop a knowledge graph, MSK-CancerKG, based on patient-specific clinical,
pathological, and genomic alteration information from more than 100,000 patients from the MSKCC Clinical
Sequencing Cohort and the AACR GENIE Project cohort. This multi-relational knowledge graph will integrate a
wide spectrum of clinical features associated with each patient, abstracted features from pathological reports
corresponding to the patient-derived tumor samples, along with comprehensive characterization of genomic
alterations and the implicated genes. The second aim is to fine-tune pre-trained Large
Language Models (LLMs) using the structured, detailed, and more reliable cancer-specific knowledge from
MSK-CancerKG. We will benchmark these fine-tuned models against four state-of-the-art pre-trained
language models, ultimately deriving an optimized combined predictive model, named MSK-CancerLLM. The
benchmarking step will measure clinical, alteration, and treatment prediction accuracy on held-out
patient data. The third aim of the proposal will be to further fine-tune MSK-CancerLLM using clinical practice
guidelines and feedback on model output from cancer domain experts. The resulting model will be integrated into
an AI chatbot, called MSK-Assistant, which pairs the backend
model with a frontend chatbot interface. Like the ChatGPT application, this will allow the research community to
ask about cancer biology, personalized drug recommendations, and therapeutic interventions.
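
The multi-relational knowledge graph described in the first aim can be pictured as a set of (head, relation, tail) triples. The following is only a minimal sketch under stated assumptions, not the project's actual schema or pipeline: the patient records, the relation names (has_cancer_type, has_pathology_feature, has_alteration, in_gene), and the helper functions are all hypothetical and chosen for illustration. It also shows one possible way such triples could be flattened into plain-text statements of the kind a language model might be fine-tuned on.

```python
from collections import defaultdict

# Hypothetical patient records; every field name and value is illustrative only.
patients = [
    {
        "id": "P001",
        "cancer_type": "Lung Adenocarcinoma",
        "pathology": ["high tumor purity"],
        "alterations": [("EGFR", "L858R")],
    },
    {
        "id": "P002",
        "cancer_type": "Melanoma",
        "pathology": [],
        "alterations": [("BRAF", "V600E"), ("TP53", "R175H")],
    },
]


def build_triples(records):
    """Turn patient records into (head, relation, tail) triples."""
    triples = []
    for rec in records:
        pid = rec["id"]
        triples.append((pid, "has_cancer_type", rec["cancer_type"]))
        for feature in rec["pathology"]:
            triples.append((pid, "has_pathology_feature", feature))
        for gene, variant in rec["alterations"]:
            alteration = f"{gene} {variant}"
            # Link the patient to the alteration, and the alteration to its gene.
            triples.append((pid, "has_alteration", alteration))
            triples.append((alteration, "in_gene", gene))
    return triples


def index_by_head(triples):
    """Group outgoing (relation, tail) edges by head node."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def to_sentences(triples):
    """Flatten triples into simple text statements (assumed format)."""
    return [f"{h} {r.replace('_', ' ')} {t}." for h, r, t in triples]


triples = build_triples(patients)
graph = index_by_head(triples)
sentences = to_sentences(triples)
```

Storing the graph as plain triples keeps each relation type explicit, so adding a new modality (for example, a treatment node) would only require a new relation name rather than a change to the data structure.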
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by SELWYN M VICKERS
Clinical Management and Trials Core and Advocacy Sub-Core
- Grant number: 7962152
- Fiscal year: 2010
- Funding amount: $300,000
- Project category: