Textpresso, information retrieval and extraction system for biological literature

Basic information

Project abstract

DESCRIPTION (provided by applicant): An information retrieval and extraction system that processes the full text of biological papers will be developed. A prototype system has been in operation at WormBase for over a year, used by C. elegans researchers as well as WormBase biological curators, and has recently been implemented for yeast at SGD. The system, called Textpresso, separates text into sentences, labels words and phrases according to an ontology (an organized lexicon), and allows queries to be performed on a database of labeled sentences. The current ontology comprises 37 categories of terms, such as "gene," "regulation," and "method." Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically identifying sentences nearly as well as expert curators; in searches for two uniquely named genes and an interaction term, the ontology confers a threefold increase in search efficiency. This system will be further developed in three ways. First, the core system will be refined and altered to allow expansion to multiple domains of interest, e.g., model organisms and human disease. Simple modifications to the system and website functionality will be made, including synonyms, search phrases, and case sensitivity. A software package for local installation will be supported. The project team will maintain the Textpresso site (www.textpresso.org), which will include C. elegans and pilot systems, but the software package will be available for installation of Textpresso at local sites, e.g., SGD and FlyBase. Second, the ontology will be structured somewhat more deeply and lexica expanded for organism- and field-specific terms. Third, algorithms for information extraction will be implemented. One approach will be the implementation of similarity measures using categories (high-level nodes) of the Textpresso ontology to reduce the dimensionality of associated vector spaces. A second approach will be the development of hidden Markov models to fill slots of a fact template based on the marked-up text. Information extracted will be presented to the user or expert curator.

Public Description: The quality and pace of research depend upon rapid access to published information. This project will provide researchers with a search engine that rapidly gives them the detailed, technical information they want by indexing the complete text of research articles.
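
The pipeline described in the abstract (sentence splitting, ontology labeling, category-based queries, and similarity over category vectors) can be illustrated with a minimal Python sketch. This is not Textpresso's implementation: the lexicon entries, the three categories used, the function names, and the naive regular-expression tokenizer are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon mapping terms to ontology categories.
# The abstract mentions 37 categories; only three are imitated here.
LEXICON = {
    "lin-12": "gene",
    "glp-1": "gene",
    "represses": "regulation",
    "activates": "regulation",
    "rnai": "method",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentence(sentence):
    """Return (token, category) pairs for tokens that appear in the lexicon."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence.lower())
    return [(t, LEXICON[t]) for t in tokens if t in LEXICON]


def query(sentences, required):
    """Keep sentences containing at least the required number of terms per category."""
    hits = []
    for s in sentences:
        counts = Counter(cat for _, cat in label_sentence(s))
        if all(counts[cat] >= n for cat, n in required.items()):
            hits.append(s)
    return hits


def category_vector(sentence, categories):
    """Count labeled terms per category: one dimension per category, not per word."""
    counts = Counter(cat for _, cat in label_sentence(sentence))
    return [counts[cat] for cat in categories]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    text = ("lin-12 represses glp-1 in the gonad. "
            "RNAi was used to test this. The weather was nice.")
    sentences = split_sentences(text)
    # Two genes plus one interaction (regulation) term, as in the abstract's example search.
    print(query(sentences, {"gene": 2, "regulation": 1}))
```

Representing each sentence by its category counts rather than by individual words keeps the vectors as short as the number of categories, which is the dimensionality reduction the abstract alludes to. The hidden Markov model approach is not sketched here.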

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other grants by PAUL Warren STERNBERG

Curation at scale: Integrating AI into community curation
  • Grant number: 10621338
  • Fiscal year: 2021
  • Funding amount: $291,300
Curation at scale: Integrating AI into community curation
  • Grant number: 10344771
  • Fiscal year: 2021
  • Funding amount: $291,300
Bipartite gene expression system for C. elegans genetic and neural circuit analysis
  • Grant number: 9437389
  • Fiscal year: 2017
  • Funding amount: $291,300
Genetics 2012: Model Organism to Human Cancer
  • Grant number: 8319996
  • Fiscal year: 2012
  • Funding amount: $291,300
C. elegans transcriptional regulatory elements
  • Grant number: 8064423
  • Fiscal year: 2010
  • Funding amount: $291,300
C. elegans transcriptional regulatory elements
  • Grant number: 8258290
  • Fiscal year: 2010
  • Funding amount: $291,300
C. elegans transcriptional regulatory elements
  • Grant number: 8460166
  • Fiscal year: 2010
  • Funding amount: $291,300
C. elegans transcriptional regulatory elements
  • Grant number: 7785896
  • Fiscal year: 2010
  • Funding amount: $291,300
Textpresso, information retrieval and extraction system for biological literature
  • Grant number: 7347569
  • Fiscal year: 2006
  • Funding amount: $291,300
Textpresso, an information retrieval and extraction system for biological literat
  • Grant number: 7047977
  • Fiscal year: 2006
  • Funding amount: $291,300

Similar overseas grants

How novices write code: discovering best practices and how they can be adopted
  • Grant number: 2315783
  • Fiscal year: 2023
  • Funding amount: $291,300
  • Project type: Standard Grant
One or Several Mothers: The Adopted Child as Critical and Clinical Subject
  • Grant number: 2719534
  • Fiscal year: 2022
  • Funding amount: $291,300
  • Project type: Studentship
A material investigation of the ceramic shards excavated from the Omuro Ninsei kiln site: Production techniques adopted by Nonomura Ninsei.
  • Grant number: 20K01113
  • Fiscal year: 2020
  • Funding amount: $291,300
  • Project type: Grant-in-Aid for Scientific Research (C)
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number: 2633211
  • Fiscal year: 2020
  • Funding amount: $291,300
  • Project type: Studentship
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number: 2436895
  • Fiscal year: 2020
  • Funding amount: $291,300
  • Project type: Studentship
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number: 2633207
  • Fiscal year: 2020
  • Funding amount: $291,300
  • Project type: Studentship
A Study on Mutual Funds Adopted for Individual Defined Contribution Pension Plans
  • Grant number: 19K01745
  • Fiscal year: 2019
  • Funding amount: $291,300
  • Project type: Grant-in-Aid for Scientific Research (C)
The limits of development: State structural policy, comparing systems adopted in two European mountain regions (1945-1989)
  • Grant number: 426559561
  • Fiscal year: 2019
  • Funding amount: $291,300
  • Project type: Research Grants
Securing a Sense of Safety for Adopted Children in Middle Childhood
  • Grant number: 2236701
  • Fiscal year: 2019
  • Funding amount: $291,300
  • Project type: Studentship
Structural and functional analyses of a bacterial protein translocation domain that has adopted diverse pathogenic effector functions within host cells
  • Grant number: 415543446
  • Fiscal year: 2019
  • Funding amount: $291,300
  • Project type: Research Fellowships