Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
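Basic Information
- Award number: 2027228
- Principal investigator:
- Amount: $99,700
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Duration: 2021-01-15 to 2023-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract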
National efforts to digitize natural history collections have transformed previously siloed, unstandardized resources into a networked, openly available information nexus that can be used to meet grand scientific and societal challenges. Despite these enormous strides, major bottlenecks in the digitization process remain, especially in areas where automation has been most challenging. In particular, capturing analog specimen data in digital format and converting text descriptions of collecting locations into mappable geocoordinates have remained boutique efforts. Because of these bottlenecks, as many as 91% of digitized specimens are missing key elements, which hampers the ability to use these specimen records effectively. This project will develop key workflows to dramatically increase the speed at which specimen data can be captured and made broadly available to data providers and consumers. These workflows include novel approaches that use both computer and human intelligence to advance our ability to capture specimen information. One key workflow focuses on the automated conversion of imaged specimen labels into properly formatted, usable digital text. Critical to the success of this workflow are human validation checkpoints, which will be implemented using a popular citizen science platform, Notes from Nature. A second workflow focuses on new tools that draw on previous georeferencing efforts, in which mappable coordinates were assigned based on specimen collection locations, to automatically add such mapping information to specimens missing those data. Finally, this effort will create tools for easy movement of these new data into and out of commonly used databases, making the data immediately available to museum providers and researchers alike. This effort will connect public participation in science to these novel tools and technologies. Further, it will train diverse graduate and undergraduate students in bioinformatics and museum science.
This effort has three design goals that together will dramatically reduce the digitization gap in museum specimen data. The first design goal will combine machine learning methods with public participation in scientific research (PPSR) via the successful Notes from Nature (NfN) project to speed up label digitization and facilitate obtaining locality data. A key part of this goal uses supervised machine learning and optical character recognition (OCR) where possible, but also keeps “humans in the loop” by using the NfN platform to gather fast, quality feedback from human volunteers at key points. This approach also provides a means to create the high-quality training datasets needed to improve automation steps, ultimately further reducing human effort. The second design goal will integrate locality data interpretation through GEOLocate with a Biodiversity Enhanced Locality Service (BELS), which will make it possible to look up pre-existing localities that have been georeferenced using best practices. The third design goal is to connect these workflows and services to Symbiota, a community digitization hub, to allow content to flow easily to and from digitization networks. Providers will be able to easily access new data along with associated metadata about processing steps, all returned using established standards and best practices. The key to this effort will be engagement with the community, including researchers, collections staff, and Zooniverse volunteers. Engagement will focus on virtual training and working with an advisory committee in order to grow capacity and community involvement.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
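The first two design goals in the abstract (human validation checkpoints for OCR output, and lookup of localities that have already been georeferenced) can be illustrated with a minimal sketch. This is not the project's implementation and does not call Notes from Nature, GEOLocate, or BELS; every name, the 0.90 confidence threshold, and the sample locality and coordinates are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical cutoff: transcriptions scored below this go to volunteers.
REVIEW_THRESHOLD = 0.90


@dataclass
class LabelTranscription:
    specimen_id: str
    text: str          # OCR output for the imaged label
    confidence: float  # assumed 0..1 score reported by the OCR step


def route(label: LabelTranscription, threshold: float = REVIEW_THRESHOLD) -> str:
    """Send confident transcriptions straight through; queue the rest for people."""
    return "auto" if label.confidence >= threshold else "human_review"


def normalize_locality(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(cleaned.split())


def lookup_coordinates(
    locality: str,
    georeferenced: Dict[str, Tuple[float, float]],
) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) if this locality string was georeferenced before."""
    return georeferenced.get(normalize_locality(locality))


# Made-up table of previously georeferenced localities, keyed by normalized text.
known = {
    normalize_locality("Example Creek, 2 mi. N of Sampletown"): (10.0, 20.0),
}

label = LabelTranscription("specimen-1", "EXAMPLE CREEK 2 mi N of Sampletown", 0.97)
if route(label) == "auto":
    coords = lookup_coordinates(label.text, known)
else:
    coords = None  # would wait for volunteer validation before georeferencing
```

Exact matching on a normalized string is the simplest possible lookup; a real service would likely need fuzzier matching and would carry georeferencing uncertainty alongside the coordinates.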
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Nelson Rios
Propithecus verreauxi demography spanning 40 years at Bezà Mahafaly Special Reserve, southwest Madagascar
- DOI: 10.1038/s41597-024-04230-y
- Publication date: 2025-01-24
- Journal:
- Impact factor: 6.900
- Authors: Joelisoa Ratsirarson; Jeannin Ranaivonasy; Richard Lawler; Isabella Fiorentino; Nelson Rios; Alison Richard
- Corresponding author: Alison Richard
Native Andean potatoes (Solanum tuberosum L.): Phytonutrients in Peel, Pulp and Potato Cooking Water
- DOI: 10.3923/ajsr.2020.44.49
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Carmen Rojas; Victor Vasquez; V. Ninaquispe; Julio Cesar Rojas; Nelson Rios; Pedro Lujan; Jesus Obregon
- Corresponding author: Jesus Obregon
Other grants by Nelson Rios
Collaborative Research: LightningBug, An Integrated Pipeline to Overcome The Biodiversity Digitization Gap
- Award number: 2104149
- Fiscal year: 2021
- Funding amount: $99,700
- Award type: Continuing Grant
ABI Sustaining: Geolocate for the Biodiversity Research Community
- Award number: 1759959
- Fiscal year: 2018
- Funding amount: $99,700
- Award type: Standard Grant
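Similar NSFC (National Natural Science Foundation of China) grants
Research and Application of Key Technologies for Ultra-Precision Machining and Inspection of Complex Electronic Products
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Synthetic Biology-Based Optimization of Animal Chassis Breeds and Pilot-Scale Application Research
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Clinical Study Using Integrated Omics Techniques to Explore Bixie Fenqing San Combined with Chemotherapy for Advanced Pancreatic Cancer
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Anti-Lung-Cancer Effects and Mechanisms of a Multi-Target Preparation of Extracts from Jiulixiang (Murraya) and Other Herbs
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Clinical Study of a Platelet-Raising Formula for Primary Immune Thrombocytopenia
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Value of Microwave Thermotherapy at the Baliao Acupoints in Treating Overactive Bladder in Women
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Clinical Study of Traditional Chinese Medicine Pattern-Based Treatment of Diabetic Retinopathy, Based on the miR-455-5p-Mediated Oxidative Stress Mechanism
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Evaluation of Active Constituents and Extraction Process of Yigong San Based on UPLC-Q-TOF-MS/MS Analysis
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Efficacy and Safety of Noninvasive Electroacupuncture in Children with Spastic Diplegic Cerebral Palsy: A Randomized, Single-Blind, Prospective Cohort Study
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project
Comparative Study of Spring-Pressure Manipulation and Extracorporeal Shock Wave Therapy for Lateral Epicondylitis of the Humerus
- Award number:
- Approval year: 2025
- Funding amount: 0.0 (in units of 10,000 CNY)
- Project type: Provincial/municipal-level project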
Similar overseas grants
Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
- Award number: 2027241
- Fiscal year: 2021
- Funding amount: $99,700
- Award type: Standard Grant
Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
- Award number: 2027234
- Fiscal year: 2021
- Funding amount: $99,700
- Award type: Standard Grant
Collaborative Research: CIBR: Incorporating Crystallography and Cryo-EM Tools in Foldit
- Award number: 2051305
- Fiscal year: 2021
- Funding amount: $99,700
- Award type: Standard Grant
Collaborative Research: CIBR: The OpenBehavior Project
- Award number: 1948181
- Fiscal year: 2021
- Funding amount: $99,700
- Award type: Continuing Grant
Collaborative Research: CIBR: Incorporating Crystallography and Cryo-EM tools into Foldit
- Award number: 2051282
- Fiscal year: 2021
- Funding amount: $99,700
- Award type: Standard Grant
Collaborative Research: CIBR: Building Capacity for Data-driven Neuroscience Research
- Award number: 1935771
- Fiscal year: 2020
- Funding amount: $99,700
- Award type: Standard Grant
Collaborative Research: CIBR: VectorByte: A Global Informatics Platform for studying the Ecology of Vector-Borne Diseases
- Award number: 2016282
- Fiscal year: 2020
- Funding amount: $99,700
- Award type: Continuing Grant
Collaborative Research: CIBR: Computational resources for modeling and analysis of realistic cell membranes
- Award number: 2011234
- Fiscal year: 2020
- Funding amount: $99,700
- Award type: Standard Grant
Collaborative Research: CIBR: VectorByte: A Global Informatics Platform for studying the Ecology of Vector-Borne Diseases
- Award number: 2016265
- Fiscal year: 2020
- Funding amount: $99,700
- Award type: Continuing Grant
Collaborative research: CIBR: Computational resources for modeling and analysis of realistic cell membranes
- Award number: 2010851
- Fiscal year: 2020
- Funding amount: $99,700
- Award type: Standard Grant