Incorporating Image-based Features into Biomedical Document Classification

Basic Information

  • Award number:
    9762175
  • Amount:
    $463,000
  • Host institution country:
    United States
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Project period:
    2017-09-14 to 2021-08-31
  • Project status:
    Completed

Project Abstract

The proposed research aims to develop and advance tools that use image data appearing in scientific publications, in addition to text, to support effective, targeted access to the biomedical literature. The biomedical literature grows by more than one million new publications per year. Identifying relevant information requires scientists and physicians to scan through a myriad of papers daily. For scientific database curators (bio-curators at organizations such as Jackson Labs or UniProt), the task is particularly onerous: they must identify the articles most significant to the database, locate within them high-quality evidence concerning diseases, genes/proteins, and mutations, and curate the findings into database entries along with references to the relevant evidence in the articles. Notably, much of the evidence within publications lies in figures; images are therefore rich and essential indicators of relevance.

While biomedical text-mining tools are being developed to expedite the search for information within publications, several competitive shared tasks have underscored the need for more effective tools to overcome the bottleneck in bio-curation and scientific discovery. Moreover, bio-curators point out the importance of images as a key information source. Although image analysis is an active research field, most current work on biomedical image processing focuses on image identification, understanding, and indexing, not on using images as aids to document analysis. Similarly, most work on biomedical literature mining focuses on text alone. Thus, little has been done so far to use images within publications, in addition to text, even though they provide important cues about the relevance of the information embedded in articles.
Our premise, supported by bio-curators' experience, is that information derived from images can (and should) be directly incorporated into biomedical document retrieval and classification, and that doing so will improve the accurate identification of relevant articles (for a given user's needs) while pinpointing significant evidence within them. We will comprehensively identify, develop, and compare informative image features; develop methods and tools for representing both images and documents based on such features; and introduce means to effectively integrate image-based data into the text-based document classification process. The work will comprise the following fundamental tasks: A) building robust tools for harvesting images from PDF articles and segmenting compound figures into individual image panels; B) identifying and investigating highly informative features for biomedical image representation, and categorizing biomedical images into significant types and classes; C) effectively representing documents using both text and images, and integrating text-based and image-based classifiers. We anchor our research in genuine needs, secure access to substantial image data, and strive for broad applicability of the results by working within several broad and diverse curation areas at the institutes with which we collaborate: evidence for gene expression and phenotypes in mouse (Jackson Labs) and worm (WormBase), and experimental evidence for protein-protein interactions (Protein Information Resource). The project will result in new methods and tools that take advantage of both image and text data, facilitating more effective and focused retrieval and mining, and thus better supporting bio-curation and data-intensive biomedical discovery.
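Task C calls for integrating text-based and image-based classifiers but does not say how. One common option is late fusion, where each classifier outputs a relevance probability and the two scores are combined. The sketch below is illustrative only and is not the project's method: the weighted average, the max pooling over panel-level probabilities, the default text weight of 0.6, the 0.5 decision threshold, and the function names `image_score`, `fused_relevance`, and `is_relevant` are all hypothetical assumptions.

```python
from typing import Sequence


def image_score(panel_probs: Sequence[float]) -> float:
    """Document-level image score via max pooling over panel probabilities.

    Assumption: one highly relevant figure panel is enough to flag a document.
    """
    return max(panel_probs)


def fused_relevance(
    text_prob: float,
    panel_probs: Sequence[float],
    text_weight: float = 0.6,  # hypothetical weight; would be tuned on held-out data
) -> float:
    """Late fusion: weighted average of text and image relevance probabilities."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    if not panel_probs:
        # No figures harvested from this article: fall back to text alone.
        return text_prob
    return text_weight * text_prob + (1.0 - text_weight) * image_score(panel_probs)


def is_relevant(
    text_prob: float,
    panel_probs: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
    text_weight: float = 0.6,
) -> bool:
    """Binary relevance decision on the fused score."""
    return fused_relevance(text_prob, panel_probs, text_weight) >= threshold


if __name__ == "__main__":
    # Text alone is borderline, but one figure panel looks highly relevant.
    print(round(fused_relevance(0.4, [0.2, 0.9]), 2))  # 0.6
    print(is_relevant(0.4, [0.2, 0.9]))  # True
```

Max pooling mirrors the premise that much curation evidence sits in individual figures; mean pooling or a learned meta-classifier (stacking) over both scores would be equally reasonable alternatives.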

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Georgeta-Elisabeta Marai

Other grants by Georgeta-Elisabeta Marai

Incorporating Image-based Features into Biomedical Document Classification
  • Award number:
    9457095
  • Fiscal year:
    2017
  • Funding amount:
    $463,000

Similar NSFC Grants

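Mechanism by which the Fusarium proliferatum nitrogen metabolism regulator AreA mediates fumonisin FB1 biosynthesis
  • Award number:
    2021JJ40433
  • Approval year:
    2021
  • Funding amount:
    CNY 0
  • Project category:
    Provincial/municipal project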
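Mechanism of enhanced sugarcane disease resistance through host-induced silencing of the AreA and CYP51 genes of the pokkah boeng pathogen
  • Award number:
    32001603
  • Approval year:
    2020
  • Funding amount:
    CNY 240,000
  • Project category:
    Young Scientists Fund project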
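Porting, improvement, and application of the AREA international economic model
  • Award number:
    18870435
  • Approval year:
    1988
  • Funding amount:
    CNY 20,000
  • Project category:
    General Program project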

Similar Overseas Grants

Onboarding Rural Area Mathematics and Physical Science Scholars
  • Award number:
    2322614
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Standard Grant
TRACK-UK: Synthesized Census and Small Area Statistics for Transport and Energy
  • Award number:
    ES/Z50290X/1
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Research Grant
Wide-area low-cost sustainable ocean temperature and velocity structure extraction using distributed fibre optic sensing within legacy seafloor cables
  • Award number:
    NE/Y003365/1
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Research Grant
Point-scanning confocal with area detector
  • Award number:
    534092360
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Major Research Instrumentation
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
  • Award number:
    2326714
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Standard Grant
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
  • Award number:
    2326713
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Standard Grant
Unlicensed Low-Power Wide Area Networks for Location-based Services
  • Award number:
    24K20765
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Grant-in-Aid for Early-Career Scientists
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
  • Award number:
    2427233
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Standard Grant
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
  • Award number:
    2427232
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Standard Grant
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
  • Award number:
    2427231
  • Fiscal year:
    2024
  • Funding amount:
    $463,000
  • Project category:
    Standard Grant