
MACE2K - Molecular And Clinical Extraction: A Natural Language Processing Tool for Personalized Medicine


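Basic Information

Grant number:
9146381
Principal investigator:
Subha Madhavan
Amount:
approx. $457,100
Host institution:
GEORGETOWN UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2015
Funding country:
United States
Project period:
2015-09-22 to 2018-05-31
Project status:
Completed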

Source:
https://reporter.nih.gov/project-details/9146381
Keywords:
Address Algorithms Big Data Big Data to Knowledge Biological Biomedical Research Cancer Center Clinical Clinical Decision Support Systems Clinical Trials Computer software Computing Methodologies Crowding Data Data Aggregation Databases Dictionary Disease Exclusion Criteria Gene Expression Gene Mutation Genome Goals Gold Health Informatics Information Retrieval Investments Letters Literature Malignant Neoplasms Maps Meta-Analysis Methods Molecular Molecular Profiling Molecular Target Mutation National Cancer Institute Natural Language Processing Oncologist Online Systems Outcome Patients Peer Review Pharmaceutical Preparations Pharmacotherapy Phosphorylation Process PubMed Publications Recording of previous events Reporting Research Research Design Research Personnel Software Validation Source Structure System Systems Biology Testing Therapeutic Time United States National Institutes of Health abstracting base crowdsourcing data to knowledge data wrangling design improved inclusion criteria innovation interest knowledge base meetings novel novel strategies personalized cancer care personalized cancer therapy personalized medicine programs protein expression search engine software development symposium targeted treatment tool user friendly software verification and validation

Project Abstract

DESCRIPTION (provided by applicant): The velocity, variety, volume and veracity of data from relevant information sources make it extremely challenging for oncologists to collect and review pertinent data that can support routine personalized treatment for their patients. There is an urgent need to develop data wrangling approaches, including Natural Language Processing and information retrieval methods, to extract and curate personalized-therapy related publications and clinical trials. Once curated, the structured data can be used by biomedical researchers to generate novel scientific hypotheses, design new studies, obtain a better understanding of biological mechanisms of disease, perform meta-analyses, and create clinical decision support systems. There is an urgent need to develop improved search interfaces specific to the field of personalized therapy, including ways for end users to display, rank, and save results. While several database and web-based keyword search engine algorithms exist, there is a lack of tools that meet the unique challenges of personalized medicine. There is also an urgent need to develop software that allows for verification and validation of information extracted and ranked through computational methods, using subject matter expertise to improve the gold standard corpus that can be used for biomedical research into personalized therapies. To address these issues, we will build an innovative software stack (MACE2K) to adapt and extend widely tested BioCreative natural language processing (NLP) tools to automatically retrieve and pre-process targeted therapy information from ClinicalTrials.gov, PubMed abstracts, open access articles, and conference proceedings. We will build an entity extraction cartridge to accurately parse gene mutations, translocations, gene expression, protein expression, and protein phosphorylation.
A marker disambiguation cartridge will be built to assess trial inclusion or exclusion criteria and to determine marker-related primary endpoints. We will include a ranking cartridge that uses the disambiguated information on markers, drugs and trials to provide a rigorous scoring of trials and studies according to their relevance for personalized medicine. A novel gamification cartridge will be built to allow subject matter experts to verify and validate the information corpus. Our research leverages the National Cancer Institute's investments in several programs (many of which we are involved in), including the NCI drug dictionary, National Cancer Informatics Program (NCIP), I-SPY trials, and Center for Cancer Systems Biology (CCSB), to efficiently accomplish our aims.
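
The abstract does not describe how the entity extraction or ranking cartridges are implemented. Purely as an illustration of the idea, the following minimal sketch pulls gene and point-mutation mentions out of free text and ranks documents by how many distinct markers they mention. Everything in it is an assumption made for the example and not part of MACE2K: the names `extract_mentions`, `relevance_score`, and `rank_documents`, the four-gene lexicon, the one-letter amino-acid mutation pattern, and the count-based score.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumption: protein-level point mutations written as one-letter amino acid,
# position, one-letter amino acid (for example "V600E"). Other notations,
# translocations, and expression changes are not covered by this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b([{AMINO_ACIDS}])(\d{{1,4}})([{AMINO_ACIDS}])\b")

# Assumption: a tiny hand-written gene lexicon standing in for a curated dictionary.
GENES = {"BRAF", "EGFR", "KRAS", "ERBB2"}
GENE_TOKEN_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,7}\b")


@dataclass(frozen=True)
class Mention:
    kind: str   # "gene" or "mutation"
    text: str   # surface form as it appears in the document
    start: int  # character offset of the mention


def extract_mentions(text: str) -> list[Mention]:
    """Return gene and mutation mentions in order of appearance."""
    mentions = [
        Mention("mutation", m.group(0), m.start())
        for m in MUTATION_RE.finditer(text)
    ]
    mentions += [
        Mention("gene", m.group(0), m.start())
        for m in GENE_TOKEN_RE.finditer(text)
        if m.group(0) in GENES
    ]
    return sorted(mentions, key=lambda mention: mention.start)


def relevance_score(text: str) -> int:
    """Toy score: the number of distinct gene and mutation mentions."""
    return len({(m.kind, m.text) for m in extract_mentions(text)})


def rank_documents(docs: dict[str, str]) -> list[tuple[str, int]]:
    """Sort documents by descending score, breaking ties by document id."""
    scored = [(doc_id, relevance_score(body)) for doc_id, body in docs.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    sample = {
        "trial-1": "Patients with BRAF V600E melanoma received vemurafenib.",
        "trial-2": "No molecular markers were recorded.",
    }
    for doc_id, score in rank_documents(sample):
        print(doc_id, score)
```

A pattern-and-lexicon approach like this is only a baseline. It cannot tell a mutation used as an inclusion criterion from one used as an exclusion criterion, which is the job the abstract assigns to the marker disambiguation cartridge.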
