Extraction of biomedical knowledge from literature and its systematization


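Basic information

  • Grant number:
    12208001
  • Principal investigator:
  • Amount:
    $1.1725 million
  • Host institution:
  • Host institution country:
    Japan
  • Project category:
    Grant-in-Aid for Scientific Research on Priority Areas
  • Fiscal year:
    2000
  • Funding country:
    Japan
  • Project period:
    2000 to 2004
  • Project status:
    Completed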

Project abstract

To understand living systems systematically from the flood of biological data, such as genome sequences, gene expression data, and molecular interactions, it is indispensable to develop databases of gene and protein interactions and functions extracted from the literature. From this perspective, we have tackled two challenges: 1) automatically extracting knowledge of biological functions from the literature, and 2) representing and using the extracted knowledge on computers. Brief descriptions of our efforts follow.

a) We developed a knowledge extraction system. We have largely established a method for extracting information on interactions among genes, proteins, and chemical compounds from the literature. Our system achieved a recall of about 50% and a precision of about 90%.

b) We developed dictionaries of gene names and gene family names for identifying those names in the literature. GENA, one of the dictionaries, stores about 880,000 gene names and, depending on the organism, covers 90-95% of all the genes appearing in the literature. Using the dictionaries and the extraction system described above, we developed and published an interaction database called PRIME and a dictionary of biological functional terms. PRIME stores about three million interactions for six eukaryotes, including human and rat.

c) We prepared a corpus and an ontology for knowledge extraction. Developing and evaluating a knowledge extraction system requires a tagged corpus and an ontology that defines domain-specific terms. We therefore developed and published the GENIA corpus, which consists of 2,000 MEDLINE abstracts whose terms are annotated with semantic and part-of-speech tags. In addition, we developed the GENIA ontology, which is used to add semantic tags to terms in the literature.
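As a rough illustration of what the reported recall and precision figures mean, the sketch below computes both measures for a set of extracted interactions against a reference set. The function name `precision_recall` and the toy data are assumptions made for this example; they are not taken from the project's system or from PRIME.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: 18 reference interactions identified by integer IDs.
gold = set(range(18))
# The extractor finds 9 of them and reports 1 spurious interaction (ID 100).
extracted = set(range(9)) | {100}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these toy numbers, 9 of 10 extracted interactions are correct and 9 of 18 reference interactions are found, which mirrors a system that rarely reports a wrong interaction but misses about half of those present in the text.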

Project outcomes

Journal articles (285)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
An Integrative Model for Representation of Signaling Pathways on the Basis of Device Ontology
A Machine Learning Approach to Acronym Generation
  • DOI:
    10.3115/1641484.1641488
  • Publication date:
    2005-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yoshimasa Tsuruoka;S. Ananiadou;Junichi Tsujii
  • Corresponding authors:
    Yoshimasa Tsuruoka;S. Ananiadou;Junichi Tsujii
Assessment of prediction accuracy of protein function from protein-protein interaction data
  • DOI:
    10.1002/yea.706
  • Publication date:
    2001-04-01
  • Journal:
  • Impact factor:
    2.6
  • Authors:
    Hishigaki, H;Nakai, K;Takagi, T
  • Corresponding author:
    Takagi, T
ALICE: An algorithm to extract abbreviations from MEDLINE
JSNP: a database of common gene variations in the Japanese population
  • DOI:
    10.1093/nar/30.1.158
  • Publication date:
    2002-01-01
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Hirakawa, M;Tanaka, T;Nakamura, Y
  • Corresponding author:
    Nakamura, Y

Other grants of TAKAGI Toshihisa

Reconstruction and Analysis of Life Systems Using Knowledge-Processing Technology
  • Grant number:
    17017002
  • Fiscal year:
    2005
  • Amount:
    $1.1725 million
  • Project category:
    Grant-in-Aid for Scientific Research on Priority Areas
Support for data analysis and sharing
  • Grant number:
    17020001
  • Fiscal year:
    2005
  • Amount:
    $1.1725 million
  • Project category:
    Grant-in-Aid for Scientific Research on Priority Areas
Systems genomics towards system-level understanding of life
  • Grant number:
    16063101
  • Fiscal year:
    2004
  • Amount:
    $1.1725 million
  • Project category:
    Grant-in-Aid for Scientific Research on Priority Areas
Genome Information Science
  • Grant number:
    12207001
  • Fiscal year:
    2000
  • Amount:
    $1.1725 million
  • Project category:
    Grant-in-Aid for Scientific Research on Priority Areas

Similar overseas grants

III: Small: Collaborative Research: Supporting Efficient Discrete Box Queries for Sequence Analysis on Large Scale Genome Databases
  • Grant number:
    1319909
  • Fiscal year:
    2013
  • Amount:
    $1.1725 million
  • Project category:
    Standard Grant
III: Small: Collaborative Research: Supporting Efficient Discrete Box Queries for Sequence Analysis on Large Scale Genome Databases
  • Grant number:
    1320078
  • Fiscal year:
    2013
  • Amount:
    $1.1725 million
  • Project category:
    Standard Grant
Estimation of an alternative biosynthetic pathway for co-factors by close investigation of genome databases.
  • Grant number:
    24651235
  • Fiscal year:
    2012
  • Amount:
    $1.1725 million
  • Project category:
    Grant-in-Aid for Challenging Exploratory Research
The MetaCyc & BioCyc Pathway/Genome Databases (SRI Proposal ECU 14-630)
  • Grant number:
    8886807
  • Fiscal year:
    2007
  • Amount:
    $1.1725 million
  • Project category:
The MetaCyc & BioCyc Pathway/Genome Databases (SRI Proposal ECU 14-630)
  • Grant number:
    9066710
  • Fiscal year:
    2007
  • Amount:
    $1.1725 million
  • Project category:
The MetaCyc and BioCyc Pathway/Genome Databases [SRI Proposal ECU 10-626]
  • Grant number:
    8109015
  • Fiscal year:
    2007
  • Amount:
    $1.1725 million
  • Project category:
The MetaCyc and BioCyc Pathway/Genome Databases
  • Grant number:
    7810709
  • Fiscal year:
    2007
  • Amount:
    $1.1725 million
  • Project category:
BULK-LOADING & PERFORMANCE STUDIES OF THE ND-TREE FOR LARGE GENOME DATABASES
  • Grant number:
    7610287
  • Fiscal year:
    2007
  • Amount:
    $1.1725 million
  • Project category:
The MetaCyc & BioCyc Pathway/Genome Databases
  • Grant number:
    10242121
  • Fiscal year:
    2007
  • Amount:
    $1.1725 million
  • Project category:
The MetaCyc and BioCyc Pathway/Genome Databases
  • Grant number:
    7450885
  • Fiscal year:
    2007
  • Amount:
    $1.1725 million
  • Project category: