Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
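Basic information
- Award number: 2027234
- Principal investigator:
- Amount: $292,400
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-01-15 to 2024-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract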
National efforts to digitize natural history collections have transformed previously siloed, unstandardized resources into a networked, openly available information nexus that can be used to meet grand scientific and societal challenges. Despite these enormous strides, major bottlenecks in this digitization process still exist, especially in areas where automation approaches have been most challenging. In particular, capturing analog specimen data into digital format and converting text descriptions of collecting locations into mappable geocoordinates have remained boutique efforts. Because of these bottlenecks, as many as 91% of digitized specimens are missing key elements, which hampers the ability to use these specimen records effectively. This project will develop key workflows to dramatically increase the speed at which specimen data can be captured and made broadly available to data providers and consumers. These workflows include novel approaches that use both computer and human intelligence to advance our ability to capture specimen information. One key workflow focuses on the challenge of automatically converting imaged specimen labels into properly formatted and usable digital text. Critical to the success of this workflow are human validation checkpoints that will be implemented using a popular citizen science platform, Notes from Nature. A second workflow focuses on new tools that take advantage of previous efforts to assign mappable coordinates based on specimen collection location, in order to automatically add such mapping information for specimens missing those data. Finally, this effort will create tools for easily moving these new data into and out of commonly used databases, making the data immediately available to museum providers and researchers alike. This effort will connect public participation in science to these novel tools and technologies.
Further, it will train diverse graduate students and undergraduate students in bioinformatics and museum science.
This effort has three design goals that together will dramatically reduce the digitization gap in museum specimen data. The first design goal will combine machine learning methods with public participation in scientific research (PPSR) via the successful Notes from Nature (NfN) project to speed up label digitization and facilitate obtaining locality data. A key part of the first design goal utilizes supervised machine learning approaches and optical character recognition (OCR) when possible but also includes “humans in the loop” using the NfN platform to gather fast quality feedback from human volunteers at key points. This approach also provides a means to create high-quality training datasets needed for improving automation steps, ultimately further reducing human effort. The second design goal will integrate locality data interpretation through GEOLocate with a Biodiversity Enhanced Locality Service (BELS), which will make it possible to look up pre-existing localities that have been georeferenced using best practices. A third goal is to connect these workflows and services to Symbiota, a community digitization hub, to allow easy inflow and outflow of content back to digitization networks. Providers will be able to easily access new data along with associated metadata about processing steps, all returned using established standards and best practices. The key to this effort will be engagement with the community, including researchers, collections staff, and Zooniverse volunteers. Engagement will focus on virtual training and working with an advisory committee in order to grow capacity and community involvement.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Robert Guralnick
Modular characters, Hall subgroups, and normal complements
- DOI: 10.1007/s13398-024-01690-0
- Published: 2024-12-27
- Journal:
- Impact factor: 1.600
- Authors: Robert Guralnick; Gabriel Navarro
- Corresponding author: Gabriel Navarro
Reimagining species on the move across space and time
- DOI: 10.1016/j.tree.2025.03.015
- Published: 2025-07-01
- Journal:
- Impact factor: 17.300
- Authors: Alexa L. Fredston; Morgan W. Tingley; Montague H.C. Neate-Clegg; Luke J. Evans; Laura H. Antão; Natalie C. Ban; I-Ching Chen; Yi-Wen Chen; Lise Comte; David P. Edwards; Birgitta Evengard; Belen Fadrique; Sophie H. Falkeis; Robert Guralnick; David H. Klinges; Jonas J. Lembrechts; Jonathan Lenoir; Juliano Palacios-Abrantes; Aníbal Pauchard; Gretta Pecl; Brett R. Scheffers
- Corresponding author: Brett R. Scheffers
Primitive monodromy groups of genus at most two
- DOI: 10.1016/j.jalgebra.2014.06.020
- Published: 2014-11-01
- Journal:
- Impact factor:
- Authors: Daniel Frohardt; Robert Guralnick; Kay Magaard
- Corresponding author: Kay Magaard
On rational and concise words
- DOI: 10.1016/j.jalgebra.2015.02.003
- Published: 2015-05-01
- Journal:
- Impact factor:
- Authors: Robert Guralnick; Pavel Shumyatsky
- Corresponding author: Pavel Shumyatsky
The automorphism groups of a family of maximal curves
- DOI: 10.1016/j.jalgebra.2012.03.036
- Published: 2012-07-01
- Journal:
- Impact factor:
- Authors: Robert Guralnick; Beth Malmskog; Rachel Pries
- Corresponding author: Rachel Pries
Other grants by Robert Guralnick
IntBIO Collaborative Research: Assessing drivers of the nitrogen-fixing symbiosis at continental scales
- Award number: 2316267
- Fiscal year: 2023
- Funding amount: $292,400
- Project type: Standard Grant
Collaborative Research: Ranges: Building Capacity to Extend Mammal Specimens from Western North America
- Award number: 2228392
- Fiscal year: 2023
- Funding amount: $292,400
- Project type: Continuing Grant
Collaborative Research: Phenobase: Community, infrastructure, and data for global-scale analyses of plant phenology
- Award number: 2223512
- Fiscal year: 2022
- Funding amount: $292,400
- Project type: Continuing Grant
Collaborative Research: LightningBug, An Integrated Pipeline to Overcome The Biodiversity Digitization Gap
- Award number: 2104152
- Fiscal year: 2021
- Funding amount: $292,400
- Project type: Continuing Grant
Collaborative Research: Origins and drivers of extinction of Caribbean Avifauna
- Award number: 2033905
- Fiscal year: 2021
- Funding amount: $292,400
- Project type: Continuing Grant
Collaborative Research: Genealogy of Odonata (GEODE): Dispersal and color as drivers of 300 million years of global dragonfly evolution
- Award number: 2002457
- Fiscal year: 2020
- Funding amount: $292,400
- Project type: Continuing Grant
IIBR RoL: Collaborative Research: A Rules Of Life Engine (RoLE) Model to Uncover Fundamental Processes Governing Biodiversity
- Award number: 1927286
- Fiscal year: 2019
- Funding amount: $292,400
- Project type: Standard Grant
Cohomology and Representations of Finite and Algebraic Groups with Applications
- Award number: 1901595
- Fiscal year: 2019
- Funding amount: $292,400
- Project type: Continuing Grant
Collaborative Research: ABI Innovation: FuTRES, an Ontology-Based Functional Trait Resource for Paleo- and Neo-biologists
- Award number: 1759898
- Fiscal year: 2018
- Funding amount: $292,400
- Project type: Standard Grant
Cohomology, Representations, and Coverings of Curves
- Award number: 1600056
- Fiscal year: 2016
- Funding amount: $292,400
- Project type: Continuing Grant
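Similar National Natural Science Foundation of China (NSFC) grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Funding amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Funding amount: CNY 240,000
- Project type: Special fund project
Cell Research (细胞研究)
- Award number: 30824808
- Approval year: 2008
- Funding amount: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Funding amount: CNY 450,000
- Project type: General Program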
Similar overseas grants
Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
- Award number: 2027241
- Fiscal year: 2021
- Funding amount: $292,400
- Project type: Standard Grant
Collaborative Research: CIBR: Incorporating Crystallography and Cryo-EM Tools in Foldit
- Award number: 2051305
- Fiscal year: 2021
- Funding amount: $292,400
- Project type: Standard Grant
Collaborative Research: CIBR: The OpenBehavior Project
- Award number: 1948181
- Fiscal year: 2021
- Funding amount: $292,400
- Project type: Continuing Grant
Collaborative Research: CIBR: Incorporating Crystallography and Cryo-EM tools into Foldit
- Award number: 2051282
- Fiscal year: 2021
- Funding amount: $292,400
- Project type: Standard Grant
Collaborative Research: CIBR: Leaping the Specimen Digitization Gap: Connecting Novel Tools, Machine Learning and Public Participation to Label Digitization Efforts
- Award number: 2027228
- Fiscal year: 2021
- Funding amount: $292,400
- Project type: Standard Grant
Collaborative Research: CIBR: Building Capacity for Data-driven Neuroscience Research
- Award number: 1935771
- Fiscal year: 2020
- Funding amount: $292,400
- Project type: Standard Grant
Collaborative Research: CIBR: VectorByte: A Global Informatics Platform for studying the Ecology of Vector-Borne Diseases
- Award number: 2016282
- Fiscal year: 2020
- Funding amount: $292,400
- Project type: Continuing Grant
Collaborative Research: CIBR: Computational resources for modeling and analysis of realistic cell membranes
- Award number: 2011234
- Fiscal year: 2020
- Funding amount: $292,400
- Project type: Standard Grant
Collaborative Research: CIBR: VectorByte: A Global Informatics Platform for studying the Ecology of Vector-Borne Diseases
- Award number: 2016265
- Fiscal year: 2020
- Funding amount: $292,400
- Project type: Continuing Grant
Collaborative research: CIBR: Computational resources for modeling and analysis of realistic cell membranes
- Award number: 2010851
- Fiscal year: 2020
- Funding amount: $292,400
- Project type: Standard Grant