Semantic Text Analytics for Quality Controlled Extraction of Clinical Phenotype Information in Healthcare Integrated Biobanking STACI2B2

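Basic Information

Grant number:
315098900
Principal investigator:
Professor Dr. Udo Hahn
Amount:
--
Host institution:
Institut für Germanistische Sprachwissenschaft
Host institution country:
Germany
Project type:
Research Grants
Fiscal year:
2016
Funding country:
Germany
Project period:
2015-12-31 to 2022-12-31
Project status:
Completed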

Source:
https://gepris.dfg.de/gepris/projekt/315098900?language=en
Keywords:
Semantic Text Analytics Quality Controlled

Project Abstract

The growing availability of high-quality biomaterials is a prerequisite for sustainable and reproducible results in translational biomedicine. This observation holds not only for exploratory research but also, increasingly, for validating research outcomes. However, some skepticism has already been expressed as to whether the plethora of results from preclinical studies holds its promise when transferred into clinical practice.

Consider, as an example, ongoing research on biomarkers. There is an apparent discrepancy between the multitude of studies on novel biomarkers and the number of clinically validated applications. One major problem is the lack of concern for quality differences in ancillary samples and the insufficient validation of potential markers based on comparison collectives with well-defined diseases and comorbidities differing from the target disease.

Whereas in the past the infrastructure for high-quality collection and warehousing of such biosamples has been established at many clinical sites by building and maintaining professional biobanks, what is still lacking are routine workflows to sample valid phenotype data, to determine valid comparison collectives and to properly select samples for high-quality biobanking. In our project, we propose to extract such information from clinical documents using methods from automatic natural language processing. We plan to build a text analytics pipeline using semi-supervised machine learning techniques to harvest medically relevant named entities (such as diseases, drugs, diagnoses) and relations among these entities (such as the effectiveness or dosage of medications relative to a disease and a patient, lab and test data for diagnosis, etc.) from unstructured clinical documents (such as discharge summaries, radiology or pathology reports, etc.). Automatic text analysis will thus form the basis for computing medically relevant context data from the documents contained in the clinical information system of the university hospital in Jena and will instantaneously feed structured evaluation procedures for the real-time selection of samples of well-defined collectives of patients when they enter routine laboratories. At the same time, residual material not needed for further diagnosis can be utilized for building up a repository of comparison samples.

Such an information extraction system for German-language clinical documents and its integration into routine clinical workflows is currently not available in any German hospital. Moreover, we posit that such a combined effort will have far-reaching implications for future progress in translational medicine that go beyond the exemplary application to determine validated phenotype data, to select well-defined collectives of patients and to produce high-quality biomaterials.
