Incorporating Image-based Features into Biomedical Document Classification
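Basic Information
- Grant number: 9457095
- Principal investigator:
- Amount: $488,200
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-09-14 to 2021-08-31
- Project status: Completed
- Source: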
- Keywords: Address; Area; Biological Phenomena; Categories; Cereals; Classification; Collaborations; Computer-Assisted Image Analysis; Cues; Data; Data Set; Databases; Development; Disease; Failure; Fluorescence Microscopy; Foundations; Gel; Gene Expression; Gene Mutation; Gene Proteins; Genomics; Geometry; Goals; Gray unit of radiation dose; Harvest; Image; Image Analysis; Individual; Informatics; Information Resources; Institutes; Investigation; Letters; Literature; Medical; Methods; Mining; Modality; Modeling; Mus; Mutation; Outcomes Research; Paper; Phenotype; Physicians; Positioning Attribute; Process; Proteins; Proteomics; PubMed; Publications; Publishing; Research; Resource Informatics; Retrieval; Role; Scanning; Scheme; Scientist; Secure; Shapes; Solid; Source; Speed; System; Text; Texture; Training; Work; base; bioimaging; biomedical scientist; decision research; evaluation/testing; experience; experimental study; image processing; improved; indexing; mouse genome; new therapeutic target; novel; novel therapeutics; protein protein interaction; protein structure; text searching; tool
Project Abstract
The proposed research aims to develop and advance tools for using image data appearing in scientific publications, in addition to text, in order to support beneficial, targeted access to the biomedical literature. The number of biomedical publications grows at a rate of over one million new publications per year. Identifying relevant information requires scientists and physicians to scan daily through a myriad of papers. For scientific database curators (bio-curators, in organizations such as Jackson Labs or UniProt), the task is particularly onerous, as they must identify articles most significant to the database, locate within them high-quality evidence concerning disease, genes/proteins and mutations, and curate the findings in database entries along with references to relevant evidence in the articles. Notably, much of the evidence within publications lies in figures. Thus, images are rich and essential indicators of relevance.
While biomedical text mining tools are being developed to expedite the search for information within publications, several competitive shared tasks have underscored the need for more effective tools to overcome the bottleneck in bio-curation and scientific discovery. Moreover, bio-curators point out the importance of images as a key information source. While image analysis is an active research field, most current work on biomedical image processing focuses on image identification, understanding and indexing, not on images as aids to document analysis. Similarly, most work on biomedical literature mining focuses on text alone. Thus, little has been done so far to utilize, in addition to text, images within publications that provide important cues about the relevance of the information embedded in articles.
Our premise, supported by bio-curators' experience, is that information derived from images can (and should) be directly incorporated into biomedical document retrieval and classification, and will improve accurate identification of relevant articles (for a given user's needs) while pinpointing significant evidence within them. We will comprehensively identify, develop and compare informative image features, develop methods and tools for representing both images and documents based on such features, and introduce means to effectively integrate image-based data into the text-based document classification process. The work will comprise the following fundamental tasks: A) Building robust tools for harvesting images from PDF articles and segmenting compound figures into individual image panels; B) Identification and investigation of highly informative features for biomedical image representation, and categorization of biomedical images into significant types and classes; C) Effective representation of documents using text and image, and integration of text-based and image-based classifiers. We anchor our research in genuine needs, secure access to a large body of image data, and strive for broad applicability of the results, by working within several broad and diverse curation areas within institutes with which we collaborate: evidence for gene expression and phenotypes in mouse (Jackson Labs) and in worm (WormBase), and experimental evidence for protein-protein interaction (Protein Information Resource). The work on this project will result in new methods and tools that take advantage of both image and text data, facilitating more effective and focused retrieval and mining, thus better supporting bio-curation and data-intensive biomedical discovery.
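Task C above mentions integrating text-based and image-based classifiers but does not say how. One common option is late fusion, where each classifier scores the document separately and the scores are then combined. The following is only a minimal sketch under that assumption: the function names `late_fusion` and `is_relevant`, the mean pooling over figure panels, the 0.5 weight, and the 0.5 threshold are all hypothetical choices for illustration, not the project's method.

```python
from typing import Sequence


def late_fusion(text_prob: float,
                panel_probs: Sequence[float],
                text_weight: float = 0.5) -> float:
    """Combine a text-classifier score with per-panel image-classifier scores.

    Assumptions (illustrative only): every score is a probability of
    relevance in [0, 1], panel scores are mean-pooled into one image score,
    and the two scores are mixed by a fixed weight.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    if not panel_probs:
        # No figure panels were harvested: fall back to the text score alone.
        return text_prob
    image_prob = sum(panel_probs) / len(panel_probs)
    return text_weight * text_prob + (1.0 - text_weight) * image_prob


def is_relevant(text_prob: float,
                panel_probs: Sequence[float],
                threshold: float = 0.5,
                text_weight: float = 0.5) -> bool:
    """Flag a document as relevant when the fused score reaches the threshold."""
    return late_fusion(text_prob, panel_probs, text_weight) >= threshold


if __name__ == "__main__":
    # A document whose text looks marginal but whose two panels look relevant.
    print(is_relevant(0.4, [0.8, 0.6]))
```

In this sketch a document with a marginal text score can still be flagged when its figure panels score highly, which is the kind of effect the abstract argues for. In practice the weight and threshold would be tuned on curated data.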
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Grants Held by Georgeta-Elisabeta Marai
Incorporating Image-based Features into Biomedical Document Classification
- Grant number: 9762175
- Fiscal year: 2017
- Funding amount: $488,200
- Project type:
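Similar NSFC Grants (National Natural Science Foundation of China)
Mechanism by which the nitrogen metabolism regulator AreA mediates fumonisin FB1 biosynthesis in Fusarium proliferatum
- Grant number: 2021JJ40433
- Approval year: 2021
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Mechanism by which host-induced silencing of the AreA and CYP51 genes of the pokkah boeng pathogen enhances sugarcane disease resistance
- Grant number: 32001603
- Approval year: 2020
- Funding amount: CNY 240,000
- Project type: Young Scientists Fund
Porting, improvement, and application of the AREA international economic model
- Grant number: 18870435
- Approval year: 1988
- Funding amount: CNY 20,000
- Project type: General Program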
Similar Overseas Grants
Onboarding Rural Area Mathematics and Physical Science Scholars
- Grant number: 2322614
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Standard Grant
Point-scanning confocal with area detector
- Grant number: 534092360
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Major Research Instrumentation
TRACK-UK: Synthesized Census and Small Area Statistics for Transport and Energy
- Grant number: ES/Z50290X/1
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Research Grant
Wide-area low-cost sustainable ocean temperature and velocity structure extraction using distributed fibre optic sensing within legacy seafloor cables
- Grant number: NE/Y003365/1
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Research Grant
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Grant number: 2326714
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Standard Grant
Collaborative Research: Scalable Manufacturing of Large-Area Thin Films of Metal-Organic Frameworks for Separations Applications
- Grant number: 2326713
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Standard Grant
Unlicensed Low-Power Wide Area Networks for Location-based Services
- Grant number: 24K20765
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Grant-in-Aid for Early-Career Scientists
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Grant number: 2427233
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Standard Grant
Postdoctoral Fellowship: OPP-PRF: Tracking Long-Term Changes in Lake Area across the Arctic
- Grant number: 2317873
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Standard Grant
RAPID: Collaborative Research: Multifaceted Data Collection on the Aftermath of the March 26, 2024 Francis Scott Key Bridge Collapse in the DC-Maryland-Virginia Area
- Grant number: 2427232
- Fiscal year: 2024
- Funding amount: $488,200
- Project type: Standard Grant


















