Named Entity Recognition and Relationship Extraction in Biomedicine

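Basic Information

  • Grant number:
    9796762
  • Principal investigator:
  • Amount:
    $2.2549 million
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
  • Funding country:
    United States
  • Project period:
  • Project status:
    Ongoing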

Project Abstract

Mining useful knowledge from the biomedical literature holds potential for improving literature search, automating biological data curation, and supporting many other scientific tasks. We have therefore focused on recognizing various types of biological entities in free text, such as genes/proteins, diseases/conditions, and drugs/chemicals, and on extracting the relationships among them.
Synonyms pose another challenge for high-quality relevance searches. This is a problem for ordinary words, but it is even more difficult for entities that can be named in a number of different ways. LitVar addresses this problem for genetic variants. For example, searching for any one of A146T, c.436G>A, or rs121913527 also finds instances of the other two. The goal is to extend this ability to other entity types.
We participated in the CHEMPROT track at BioCreative VI, which aims to assess the state of the art in automatically extracting chemical-protein relations from running text (PubMed abstracts). We proposed an ensemble of three systems: a support vector machine, a convolutional neural network, and a recurrent neural network. Their outputs are combined using majority voting or stacking for final predictions. Our system obtained a precision of 0.7266 and a recall of 0.5735, for an F-score of 0.6410, the highest performance among all team submissions during the challenge.
In addition to tackling relation extraction tasks with supervised machine-learning methods, we proposed a novel adversarial learning algorithm for unsupervised domain adaptation tasks where no labeled data are available in the target domain. We show that domain-invariant features can be learned in the latest neural networks such that classifiers trained for one relation type (protein-protein) can be repurposed for others (drug-drug). Compared to prior convolutional and recurrent NN-based relation classification methods without domain adaptation, we achieve improvements as high as 30% in F1-score.
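The ensemble combination and the reported F-score in the CHEMPROT paragraph above can be illustrated with a minimal sketch. It is not the system described in the abstract: the function names, the relation labels, and the tie-breaking rule are assumptions made for illustration. Only the precision and recall values come from the text, and the F-score is taken to be the standard harmonic mean of the two.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most systems.

    Assumption: on a tie, the label seen first among the tied ones wins.
    The abstract does not say how ties were resolved.
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three classifiers for one chemical-protein pair.
votes = ["inhibitor", "activator", "inhibitor"]
print(majority_vote(votes))  # inhibitor

# Precision and recall as reported in the abstract.
print(f"{f_score(0.7266, 0.5735):.4f}")  # 0.6410
```

Plugging the reported precision and recall into the harmonic mean reproduces the stated F-score of 0.6410. Stacking, the other combination method mentioned, would instead train a second-level classifier on the three systems' outputs and is not shown here.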
To further assist NLP tasks without pre-existing training data, we developed ezTag, a web-based annotation tool that allows users to perform annotation and provide training data with humans in the loop. ezTag supports both abstracts in PubMed and full-text articles in PubMed Central.
Negative and uncertain medical findings are frequent in radiology reports, but discriminating them from positive findings remains challenging for information extraction. Here, we propose a new algorithm, NegBio, to detect negative and uncertain findings in radiology reports. Unlike previous rule-based methods, NegBio utilizes patterns on universal dependencies to identify the scope of triggers that are indicative of negation or uncertainty. We evaluated NegBio on four datasets: two public benchmarking corpora of radiology reports, a new radiology corpus that we annotated for this work, and a public corpus of general clinical texts. Evaluation on these datasets demonstrates that NegBio is highly accurate at detecting negative and uncertain findings and compares favorably to the current state of the art.
One promising application area for text mining research is assisting manual literature curation, a highly time-consuming and labor-intensive process. In this regard, we applied automated deep learning techniques to the literature triage process of UniProtKB/Swiss-Prot and of the NHGRI-EBI GWAS Catalog for genomic variation, in collaboration with their database curators. Both manual curation teams confirmed that our method achieved higher precision than their previous query-based triage methods without compromising recall. Both results show that our method is more efficient and can replace the traditional query-based triage methods of manually curated databases. Our method can give human curators more time to focus on more challenging tasks, such as actual curation and the discovery of novel papers and experimental techniques to consider for inclusion.
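The idea of using dependency patterns to find the scope of a negation trigger, as described for NegBio above, can be sketched on a toy example. This is not NegBio's code or its actual pattern set. The `Token` class, the trigger list, the single pattern, and the hand-written parse of the example sentence are all assumptions for illustration. A real pipeline would obtain the dependency graph from a parser.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    text: str
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # Universal Dependencies relation label


NEGATION_TRIGGERS = {"no"}  # hypothetical, deliberately tiny trigger list


def negated_tokens(sentence):
    """Return the texts of tokens that fall in the scope of a negation trigger.

    Toy pattern (an assumption, not a NegBio rule): a trigger attached to
    its head as `det` negates that head and every token conjoined (`conj`)
    to the same head.
    """
    negated = set()
    for tok in sentence:
        if tok.text.lower() in NEGATION_TRIGGERS and tok.deprel == "det":
            negated.add(tok.head)
            negated.update(
                t.idx for t in sentence
                if t.head == tok.head and t.deprel == "conj"
            )
    return [t.text for t in sentence if t.idx in negated]


# Hand-written parse of "No pleural effusion or pneumothorax".
sentence = [
    Token(1, "No", 3, "det"),
    Token(2, "pleural", 3, "amod"),
    Token(3, "effusion", 0, "root"),
    Token(4, "or", 5, "cc"),
    Token(5, "pneumothorax", 3, "conj"),
]
print(negated_tokens(sentence))  # ['effusion', 'pneumothorax']
```

Because the pattern follows the `conj` edge, the scope reaches "pneumothorax" even though the trigger is not adjacent to it, which is the kind of case a fixed word-window rule can miss.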
Deep learning, a class of machine learning algorithms, has shown impressive results in several of our recent FY18 studies, as described above. In addition to its applications in natural language processing, we have also seen its success in our medical image analysis work, such as processing chest X-ray images and color fundus photographs.

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Zhiyong Lu

Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    9362446
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    9564626
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Machine Learning and Natural Language Processing for Biomedical Applications
  • Grant number:
    10927050
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    10007525
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
  • Grant number:
    8149607
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    8558092
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    8344934
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    8943212
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    8943240
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    8558091
  • Fiscal year:
  • Funding amount:
    $2.2549 million
  • Project type:

Similar Overseas Grants

Approximate algorithms and architectures for area efficient system design
  • Grant number:
    LP170100311
  • Fiscal year:
    2018
  • Funding amount:
    $2.2549 million
  • Project type:
    Linkage Projects
AMPS: Rank Minimization Algorithms for Wide-Area Phasor Measurement Data Processing
  • Grant number:
    1736326
  • Fiscal year:
    2017
  • Funding amount:
    $2.2549 million
  • Project type:
    Standard Grant
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
  • Grant number:
    1686-2013
  • Fiscal year:
    2017
  • Funding amount:
    $2.2549 million
  • Project type:
    Discovery Grants Program - Individual
Rigorous simulation of speckle fields caused by large area rough surfaces using fast algorithms based on higher order boundary element methods
  • Grant number:
    375876714
  • Fiscal year:
    2017
  • Funding amount:
    $2.2549 million
  • Project type:
    Research Grants
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
  • Grant number:
    1686-2013
  • Fiscal year:
    2016
  • Funding amount:
    $2.2549 million
  • Project type:
    Discovery Grants Program - Individual
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
  • Grant number:
    1686-2013
  • Fiscal year:
    2015
  • Funding amount:
    $2.2549 million
  • Project type:
    Discovery Grants Program - Individual
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
  • Grant number:
    1686-2013
  • Fiscal year:
    2014
  • Funding amount:
    $2.2549 million
  • Project type:
    Discovery Grants Program - Individual
AREA: Optimizing gene expression with mRNA free energy modeling and algorithms
  • Grant number:
    8689532
  • Fiscal year:
    2014
  • Funding amount:
    $2.2549 million
  • Project type:
CPS: Synergy: Collaborative Research: Distributed Asynchronous Algorithms and Software Systems for Wide-Area Monitoring of Power Systems
  • Grant number:
    1329780
  • Fiscal year:
    2013
  • Funding amount:
    $2.2549 million
  • Project type:
    Standard Grant
CPS: Synergy: Collaborative Research: Distributed Asynchronous Algorithms and Software Systems for Wide-Area Mentoring of Power Systems
  • Grant number:
    1329745
  • Fiscal year:
    2013
  • Funding amount:
    $2.2549 million
  • Project type:
    Standard Grant