Named Entity Recognition and Relationship Extraction in Biomedicine

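Basic information

  • Grant number:
    8558092
  • Principal investigator:
  • Amount:
    $976,100
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
  • Funding country:
    United States
  • Start and end dates:
  • Project status:
    Not yet completed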

Project abstract

Mining useful knowledge from the biomedical literature holds potential for improving literature search, automating biological data curation, and supporting many other scientific tasks. Hence, it is important to be able to recognize various types of biological entities in free text, such as genes/proteins, diseases/conditions, and drugs/chemicals. Indeed, our previous PubMed log analysis revealed that people search for certain biomedical concepts more often than others and that strong associations exist between different concepts. For example, a disease name often co-occurs with gene/protein and drug names. Our own research in the past has mostly focused on identifying genes and species in PubMed citations. In 2011-2012, while continuing our efforts to improve gene name recognition, we also turned our attention to disease name detection. Like gene names, disease names are irregular and ambiguous, which makes them difficult to identify through simple dictionary look-up and makes their recognition an interesting task for the text-mining community. However, owing to the lack of adequate training data, little work has focused on disease name identification. To this end, we created a large-scale disease corpus consisting of 6,900 disease names in 793 PubMed abstracts. Developed by a team of 12 annotators (two people per annotation), our corpus contains rich annotations for every disease occurrence in these abstracts. Furthermore, disease names are categorized into four distinct groups: Specific Disease, Disease Class, Composite Mention and Disease Modifier. When our corpus was used as gold-standard data for training state-of-the-art machine-learning algorithms, performance was significantly higher than with an existing corpus that has limited annotations. These characteristics make our disease name corpus a valuable resource for mining disease-related information from biomedical text.
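As a minimal sketch of why simple dictionary look-up falls short for disease names, the Python snippet below matches a tiny made-up lexicon against one sentence. The lexicon, the example sentence, and the names DiseaseCategory, Mention and dictionary_lookup are illustrative assumptions and are not taken from the corpus or its annotation tools; only the four category names come from the text above.

```python
import re
from dataclasses import dataclass
from enum import Enum


class DiseaseCategory(Enum):
    # The four annotation groups named in the abstract.
    SPECIFIC_DISEASE = "Specific Disease"
    DISEASE_CLASS = "Disease Class"
    COMPOSITE_MENTION = "Composite Mention"
    DISEASE_MODIFIER = "Disease Modifier"


@dataclass(frozen=True)
class Mention:
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str
    category: DiseaseCategory


# Hypothetical toy lexicon; a real one would be far larger.
LEXICON = {
    "breast cancer": DiseaseCategory.SPECIFIC_DISEASE,
    "cancer": DiseaseCategory.DISEASE_CLASS,
}


def dictionary_lookup(text, lexicon=LEXICON):
    """Return longest-first, non-overlapping exact matches (case-insensitive)."""
    mentions = []
    taken = [False] * len(text)  # marks characters already covered by a match
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                mentions.append(
                    Mention(m.start(), m.end(), m.group(0), lexicon[term])
                )
    return sorted(mentions, key=lambda mention: mention.start)


sentence = "Mutations were found in breast and ovarian cancer patients."
found = dictionary_lookup(sentence)
for mention in found:
    print(mention.text, "->", mention.category.value)
```

On this sentence the look-up reports only the word "cancer" and misses the coordinated phrase "breast and ovarian cancer", which under the scheme above would presumably be annotated as a Composite Mention. Cases like this are one reason annotated training data is needed.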
Following named entity recognition, we also continued our research from previous years on automatically identifying relationships between biological entities, as part of an effort to build an end-to-end system that includes both entity recognition and relationship extraction. This year, our research emphasized extracting pharmacogenomics (PGx) information from free text. Specifically, we developed a systematic approach to automatically identify PGx relationships between genes, drugs and diseases from trial records in ClinicalTrials.gov. In our evaluation, we found that our extracted relationships overlap significantly with the literature-curated knowledge in a PGx database and that most relationships appear on average 5 years earlier in clinical trials than in their corresponding publications, suggesting that clinical trials may be valuable both for validating known PGx information and for capturing new PGx information in a more timely manner. Furthermore, two human reviewers judged a portion of the computer-generated relationships and found an overall accuracy of 74% for our text-mining approach. This work has practical implications for enriching existing knowledge of PGx gene-drug-disease relationships and for suggesting crosslinks between ClinicalTrials.gov and other PGx knowledge bases. As mentioned earlier, one promising application area for text-mining research is assisting manual literature curation, a highly time-consuming and labor-intensive process. In this regard, we conducted two separate investigations: one aimed at understanding the needs of the curation community, and the other at directly improving links between the literature and biological data.
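The two evaluation figures reported for the PGx work (reviewer-judged accuracy and the lead time of trials over publications) can be illustrated with a short Python sketch. The Relationship record, its field names, and every value in the sample list are made-up placeholders rather than data from the study, and treating the two reviewers' judgments as a single consensus flag is an assumption, since the abstract does not say how they were combined.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Relationship:
    gene: str
    drug: str
    disease: str
    trial_year: int        # year the relationship first appears in a trial record
    publication_year: int  # year of the corresponding publication
    judged_correct: bool   # assumed consensus judgment of the human reviewers


def accuracy(relationships):
    """Fraction of reviewed relationships judged correct."""
    return sum(r.judged_correct for r in relationships) / len(relationships)


def mean_lead_time(relationships):
    """Average number of years by which the trial precedes the publication."""
    return mean(r.publication_year - r.trial_year for r in relationships)


# Entirely made-up placeholder records, for illustration only.
sample = [
    Relationship("GENE_A", "DRUG_X", "DISEASE_1", 2003, 2009, True),
    Relationship("GENE_B", "DRUG_Y", "DISEASE_2", 2005, 2009, True),
    Relationship("GENE_C", "DRUG_Z", "DISEASE_3", 2006, 2011, False),
    Relationship("GENE_D", "DRUG_X", "DISEASE_2", 2004, 2009, True),
]

print(f"accuracy: {accuracy(sample):.2f}")
print(f"mean lead time: {mean_lead_time(sample):.1f} years")
```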
Together with colleagues outside the NIH, we organized the BioCreative 2012 workshop on Interactive Text Mining in the Biocuration Workflow, an international event that brought together the biocuration and text-mining communities to develop and evaluate interactive text-mining tools and systems with improved utility and usability in the biocuration workflow. Specifically, we chaired Workshop Track II, entitled Biocuration Workflows and Text Mining, for which we invited written descriptions of curation workflows from expert-curated databases. We received seven qualified contributions, primarily from model organism databases such as FlyBase. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used, and the current and desired uses of text mining for biocuration. Compared with a similar study in 2009, our 2012 results show that many more databases now use text mining in parts of their curation workflows. In addition, the Track II participants identified the text-mining aids most desired by biocurators: finding gene names and symbols (gene indexing), prioritizing documents for curation (document triage), and assigning ontology concepts.
Our second curation-oriented text-mining study focused on directly improving links between the literature and biological data. In today's biomedical research, high-throughput experiments and bioinformatics techniques are generating an exploding volume of data that is becoming overwhelming for the biologists and other researchers who need to access, analyze and process it. Much of the available data is deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature relies mainly on manual labor, which makes it a time-consuming and daunting task. We therefore analyzed the current state of link curation between GEO, PDB and MEDLINE. We found that link curation is heterogeneous, depending on the sources and databases involved, and that overlap between sources is low: less than 50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations for improving the coverage of curated links and the consistency of information available from different databases while maintaining high-quality curation.
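One way to picture the overlap between link sources is to treat each source as a set of (database entry, article) pairs and compare the sets. The abstract does not state how overlap was measured, so the Jaccard-style ratio below is an assumption, and the function name and all identifiers are hypothetical placeholders rather than real GEO, PDB or MEDLINE records.

```python
def overlap_fraction(links_a, links_b):
    """Share of all distinct links that both sources contain (Jaccard index)."""
    union = links_a | links_b
    if not union:
        return 0.0
    return len(links_a & links_b) / len(union)


# Hypothetical (database entry ID, article ID) pairs; not real identifiers.
links_from_database = {
    ("ENTRY_1", "ART_10"),
    ("ENTRY_2", "ART_11"),
    ("ENTRY_3", "ART_12"),
}
links_from_literature = {
    ("ENTRY_2", "ART_11"),
    ("ENTRY_3", "ART_12"),
    ("ENTRY_4", "ART_13"),
    ("ENTRY_5", "ART_14"),
}

# Links found by both sources, and links that only one source records.
shared = links_from_database & links_from_literature
only_in_database = links_from_database - links_from_literature
only_in_literature = links_from_literature - links_from_database

print(f"overlap: {overlap_fraction(links_from_database, links_from_literature):.0%}")
print(f"links missing from the database side: {sorted(only_in_literature)}")
```

Links that appear in only one source are the natural candidates to show a curator, which is the kind of evidence the text-mining tools mentioned above are meant to surface.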

Project outputs

Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Systematic identification of pharmacogenomics information from clinical trials.
  • DOI:
    10.1016/j.jbi.2012.04.005
  • Publication date:
    2012-10
  • Journal:
  • Impact factor:
    4.5
  • Authors:
    Li, Jiao; Lu, Zhiyong
  • Corresponding author:
    Lu, Zhiyong
A textual representation scheme for identifying clinical relationships in patient records.
Author keywords in biomedical journal articles.
A context-blocks model for identifying clinical relationships in patient records.
  • DOI:
    10.1186/1471-2105-12-s3-s3
  • Publication date:
    2011-06-09
  • Journal:
  • Impact factor:
    3
  • Authors:
    Islamaj Doğan R; Névéol A; Lu Z
  • Corresponding author:
    Lu Z
SR4GN: a species recognition software tool for gene normalization.
  • DOI:
    10.1371/journal.pone.0038460
  • Publication date:
    2012
  • Journal:
  • Impact factor:
    3.7
  • Authors:
    Wei CH; Kao HY; Lu Z
  • Corresponding author:
    Lu Z

Other grants by Zhiyong Lu

Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    9362446
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    9564626
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Machine Learning and Natural Language Processing for Biomedical Applications
  • Grant number:
    10927050
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    10007525
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
  • Grant number:
    8149607
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    9796762
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    8344934
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    8943212
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    8943240
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    8558091
  • Fiscal year:
  • Funding amount:
    $976,100
  • Project category:

Similar overseas grants

Sediment Drilling Facility for environmental and genetic archives
  • Grant number:
    LE240100064
  • Fiscal year:
    2024
  • Funding amount:
    $976,100
  • Project category:
    Linkage Infrastructure, Equipment and Facilities
Aerial Archives of Race and American-Occupied Japan
  • Grant number:
    24K03721
  • Fiscal year:
    2024
  • Funding amount:
    $976,100
  • Project category:
    Grant-in-Aid for Scientific Research (C)
CAREER: Understanding biosphere-geosphere coevolution through carbonate-associated phosphate, community archives, and open-access education in rural schools
  • Grant number:
    2338055
  • Fiscal year:
    2024
  • Funding amount:
    $976,100
  • Project category:
    Continuing Grant
Designing a Bridging Model Using Learning Content Information LOD to Link School Education and Digital Archives
  • Grant number:
    23H03695
  • Fiscal year:
    2023
  • Funding amount:
    $976,100
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Doris Lessing's Archives: Communism, Decolonisation and Literary Practice
  • Grant number:
    2888789
  • Fiscal year:
    2023
  • Funding amount:
    $976,100
  • Project category:
    Studentship
Building a sustainable future for anthropology's archives: Researching primary source data lifecycles, infrastructures, and reuse
  • Grant number:
    2314762
  • Fiscal year:
    2023
  • Funding amount:
    $976,100
  • Project category:
    Standard Grant
Reading Writing Lives: Publishing & Preserving Australian Literary Archives
  • Grant number:
    DP230101797
  • Fiscal year:
    2023
  • Funding amount:
    $976,100
  • Project category:
    Discovery Projects
Integrated High-Definition Visualization of Digital Archives for Borobudur Temple
  • Grant number:
    22KJ3026
  • Fiscal year:
    2023
  • Funding amount:
    $976,100
  • Project category:
    Grant-in-Aid for JSPS Fellows
Research on multilingual data integration for digital archives of Japanese culture
  • Grant number:
    23K11780
  • Fiscal year:
    2023
  • Funding amount:
    $976,100
  • Project category:
    Grant-in-Aid for Scientific Research (C)
A Preliminary Study for Constructing International Network of Image Archives on Afghan Cultural Heritages
  • Grant number:
    23K00915
  • Fiscal year:
    2023
  • Funding amount:
    $976,100
  • Project category:
    Grant-in-Aid for Scientific Research (C)