High Performance Text Mining for Translator

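Basic information

  • Grant number:
    10334356
  • Principal investigator:
  • Amount:
    $471,200
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-01-23 to 2022-01-22
  • Project status:
    Completed

Project abstract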

We propose to build a knowledge provider that will seek out, integrate and provide AI-ready, BioLink-compatible models via high-performance text-mining of the biomedical literature. Problems with Translator’s current mining of the biomedical literature that we intend to solve include: (1) weaknesses in framework extensibility and benchmarking that make integrating and validating new text-mining approaches difficult; (2) problematic licensing of software, terminologies and other resources that do not adequately support FAIR (and TLC) best practices; (3) processing only PubMed titles and abstracts, not full-text publications; (4) Translator’s use of older NLP technology with relatively poor performance; (5) lack of a mechanism for community feedback regarding errors and other problems; (6) lack of continuous updates to add knowledge from new publications; (7) output knowledge representation that is simplistic and vague, failing to reflect the richness of what is expressed in scientific documents.

Plan for implementation: Our team has a long history of productive NLP research, successful open source software projects, effective benchmarking and broad community engagement. We will build on the results of NLM-funded work in information extraction, our gold-standard Colorado Richly Annotated Full Text (CRAFT) corpus, a recent BioNLP Open Shared Task (BioNLP-OST) that we organized, and recent advances in state-of-the-art NLP.

For Segment 1, we will: (1) Demonstrate BioStacks, an extensible, cloud-based text-mining framework that produces knowledge graphs grounded in the Open Biomedical Ontologies (OBOs). This BioStacks demo will include a state-of-the-art OBO concept recognizer for multiple ontologies, a state-of-the-art semantic relationship prediction tool, and a state-of-the-art structural analysis tool. All generated assertions will have provenance metadata linking the assertion to a particular text span in a document specified by PMCID. (2) Demonstrate CRAFTST, a cloud-based text-mining evaluation system that evaluates the performance of text-mining systems against the CRAFT gold standard. (3) Demonstrate an adaptive machine learning process illustrating how to efficiently create tools to extract BioLink association types.

For Segment 2, we propose to extend the text-mining and evaluation frameworks to align with BioLink and the Translator community, improve text-mining quality and expand the collection of source documents mined. Specifically, we propose to target 10 long-term milestones: (1) Align CRAFT to BioLink. (2) Develop new tools for extracting associations from text. (3) Develop and manage a community engagement process on text-mining for Translator. (4) Extend benchmarking. (5) Improve recall. (6) Improve precision. (7) Improve computational efficiency. (8) Expand BioStacks to include all available full-text biomedical journal articles. (9) Expand document collections to include patents and regulatory filings. (10) Develop a scientist-based movement to improve document access for text-mining from non-open publishers.

The types of questions the resulting knowledge graph can be used to address are extremely broad, as it is generated by mining a large part of the biomedical literature. Questions that can be answered include those about specific assertions (e.g. is this drug an agonist-activator of this protein?), general relations (are these two proteins often mentioned together?), and documents (which publications mention this gene, mutation and drug?).

Integration: We are long-time contributors to the open-science community and have longstanding collaborations with existing awardees; we were participants in the NIH Data Commons Pilot. We propose to align the output of text-mining tools to the BioLink model via OBO terms. We propose to implement our frameworks in NIH Cloud Computing environments. We propose to adopt the CD2H Contributor Attribution Model to foreground community contributions. We plan to coordinate with the NLM’s nascent benchmarking activities and the SmartAPI effort to build Translator standard interfaces.

Challenges and gaps: High-performance mining of rich, contextualized knowledge from the literature remains a difficult task, and is unlikely to be solved in the next five years. Many important publications remain inaccessible to text-mining due to restrictive licensing.
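As a rough illustration of two ideas in the abstract, assertions that carry provenance (a text span in a document identified by PMCID) and evaluation against a gold standard in terms of precision and recall, the following Python sketch may help. Everything in it is an assumption made for illustration: the `TextSpan` and `Assertion` classes, the `precision_recall_f1` function, the `EX:` identifiers and the PMCIDs are hypothetical placeholders, and exact-match scoring is a simplification. The abstract does not describe how BioStacks or CRAFTST actually represent or score assertions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpan:
    """Provenance: where in a document an assertion was found (hypothetical schema)."""
    pmcid: str   # PubMed Central identifier of the source document
    start: int   # character offset where the supporting text begins
    end: int     # character offset just past the end of the supporting text


@dataclass(frozen=True)
class Assertion:
    """A subject-predicate-object triple with its provenance (hypothetical schema)."""
    subject: str    # ontology term identifier (placeholder values below)
    predicate: str  # association type (placeholder values below)
    obj: str        # ontology term identifier (placeholder values below)
    span: TextSpan


def precision_recall_f1(predicted, gold):
    """Score predicted assertions against a gold standard by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Placeholder identifiers, not real ontology terms or documents.
gold = [
    Assertion("EX:drug1", "EX:activates", "EX:protein1",
              TextSpan("PMC0000001", 120, 168)),
    Assertion("EX:gene1", "EX:associated_with", "EX:disease1",
              TextSpan("PMC0000001", 300, 352)),
]
predicted = [
    gold[0],  # one assertion matches the gold standard exactly
    Assertion("EX:drug1", "EX:inhibits", "EX:protein2",
              TextSpan("PMC0000002", 10, 44)),
]

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.50 recall=0.50 f1=0.50
```

Making the classes frozen dataclasses keeps them hashable, so predicted and gold assertions can be compared with plain set operations. A real evaluation would likely need looser matching, for example tolerating small differences in span boundaries.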

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by LAWRENCE E HUNTER

Scientific Questions: A New Target for Biomedical NLP
  • Grant number:
    10223438
  • Fiscal year:
    2020
  • Award amount:
    $471,200
  • Project type:
Scientific Questions: A New Target for Biomedical NLP
  • Grant number:
    10454968
  • Fiscal year:
    2020
  • Award amount:
    $471,200
  • Project type:
High Performance Text Mining for Translator
  • Grant number:
    10548337
  • Fiscal year:
    2020
  • Award amount:
    $471,200
  • Project type:
Colorado Biomedical Informatics Training Program
  • Grant number:
    9526127
  • Fiscal year:
    2017
  • Award amount:
    $471,200
  • Project type:
Automated Literature Mining for Validation of High-Throughput Function Prediction
  • Grant number:
    7843633
  • Fiscal year:
    2009
  • Award amount:
    $471,200
  • Project type:
Construction of a Full Text Corpus for Biomedical Text Mining
  • Grant number:
    7872692
  • Fiscal year:
    2009
  • Award amount:
    $471,200
  • Project type:
Computational Bioscience Program Training Grant
  • Grant number:
    7824978
  • Fiscal year:
    2009
  • Award amount:
    $471,200
  • Project type:
Computational Bioscience Program Training Grant
  • Grant number:
    7877947
  • Fiscal year:
    2007
  • Award amount:
    $471,200
  • Project type:
Colorado Biomedical Informatics Training Program
  • Grant number:
    8261523
  • Fiscal year:
    2007
  • Award amount:
    $471,200
  • Project type:
Ontologies and Biomedical Language Processing
  • Grant number:
    7364235
  • Fiscal year:
    2007
  • Award amount:
    $471,200
  • Project type:

Similar overseas grants

How novices write code: discovering best practices and how they can be adopted
  • Grant number:
    2315783
  • Fiscal year:
    2023
  • Award amount:
    $471,200
  • Project type:
    Standard Grant
One or Several Mothers: The Adopted Child as Critical and Clinical Subject
  • Grant number:
    2719534
  • Fiscal year:
    2022
  • Award amount:
    $471,200
  • Project type:
    Studentship
A material investigation of the ceramic shards excavated from the Omuro Ninsei kiln site: Production techniques adopted by Nonomura Ninsei.
  • Grant number:
    20K01113
  • Fiscal year:
    2020
  • Award amount:
    $471,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number:
    2633211
  • Fiscal year:
    2020
  • Award amount:
    $471,200
  • Project type:
    Studentship
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number:
    2436895
  • Fiscal year:
    2020
  • Award amount:
    $471,200
  • Project type:
    Studentship
A comparative study of disabled children and their adopted maternal figures in French and English Romantic Literature
  • Grant number:
    2633207
  • Fiscal year:
    2020
  • Award amount:
    $471,200
  • Project type:
    Studentship
A Study on Mutual Funds Adopted for Individual Defined Contribution Pension Plans
  • Grant number:
    19K01745
  • Fiscal year:
    2019
  • Award amount:
    $471,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
The limits of development: State structural policy, comparing systems adopted in two European mountain regions (1945-1989)
  • Grant number:
    426559561
  • Fiscal year:
    2019
  • Award amount:
    $471,200
  • Project type:
    Research Grants
Securing a Sense of Safety for Adopted Children in Middle Childhood
  • Grant number:
    2236701
  • Fiscal year:
    2019
  • Award amount:
    $471,200
  • Project type:
    Studentship
Structural and functional analyses of a bacterial protein translocation domain that has adopted diverse pathogenic effector functions within host cells
  • Grant number:
    415543446
  • Fiscal year:
    2019
  • Award amount:
    $471,200
  • Project type:
    Research Fellowships