Efficient and Reproducible Image Annotation for Supervised Deep Learning with Small Data
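Basic information
- Grant number: RGPIN-2021-02428
- Principal investigator:
- Amount: $17,500
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2021
- Funding country: Canada
- Start and end dates: 2021-01-01 to 2022-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract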
Machine learning is currently one of the hottest and most quickly evolving technologies. Everyone wants to use "deep learning" to develop systems for everything: face recognition, identity tracking, quality control in manufacturing, tracking objects in videos, diagnosing disease, precision agriculture, etc. Deep learning works best when there are very large numbers of images where the "correct answer" is known from which the machine learning algorithm can learn. It is relatively easy for people to tag an image of a car, a boat, or a plane, or even different types of cars with an appropriate label. It is much more challenging to create large datasets to train systems for diagnosing cancer, or counting the number of flowers on a plant. Labeling medical images requires highly experienced experts. When marking all of the flowers on a plant, it is easy to miss one, or for a flower to be partially occluded by a leaf or another flower, or to be imprecise in specifying a flower's exact location. Moreover, in order to obtain enough annotated images to learn from, typically multiple annotators label a dataset but each image is annotated by only one annotator. In such circumstances we have no way of determining whether different annotators are annotating consistently, that is, we can't quantify the inter-annotator agreement. Annotator disagreement arises from biases in their annotations, and degrades the quality of the training dataset. Inter-rater agreement is affected by the difficulty of the annotation task and the nature of the instructions given to annotators. If we can find ways of obtaining more consistent annotations across multiple annotators, training dataset quality will improve, and hence the performance of the learned system also improves. The proposed research program will study how to obtain better quality annotations from annotators with higher inter- and intra-annotator agreement. 
We will create augmented annotation tools that provide problem-specific semi-automation to assist annotators and quantify the resulting benefits to annotator agreement and trained system performance. We will quantify the relationship between annotator agreement and model performance. We will explore the degree to which contextual factors such as annotation type, instructions given, pressure, and distractions can influence annotator agreement and develop best practices for mitigating their effects. By studying factors that influence annotator agreement and the performance of the systems that are learned from annotated datasets, we will be able to develop new standardized methodologies for training "deep learning" models with limited data. This will allow better prediction of the optimal amount of resources to invest in annotation, reduce the reliance on trial-and-error methods to obtain the best trained system performance, and make successful machine learning less reliant on deep technical expertise.
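As an illustration of what quantifying inter-annotator agreement can look like, the sketch below computes Cohen's kappa for two annotators who labelled the same images. This is only one common agreement statistic; the abstract does not say which measure the program uses. The function name `cohens_kappa` and the label lists are hypothetical, and the example assumes a subset of images that both annotators labelled, which is exactly the overlap the abstract notes is usually missing.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for eight image regions (illustrative data only).
annotator_1 = ["flower", "flower", "leaf", "flower", "leaf", "leaf", "flower", "leaf"]
annotator_2 = ["flower", "leaf", "leaf", "flower", "leaf", "flower", "flower", "leaf"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.5
```

Here the annotators agree on 6 of 8 items (0.75), while their label frequencies alone would produce 0.5 agreement by chance, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.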
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eramian, Mark
Enhancement of Textural Differences Based on Morphological Component Analysis
- DOI: 10.1109/tip.2015.2427514
- Publication date: 2015-09-01
- Journal:
- Impact factor: 10.6
- Authors: Chi, Jianning; Eramian, Mark
- Corresponding author: Eramian, Mark
Iterative image segmentation of plant roots for high-throughput phenotyping.
- DOI: 10.1038/s41598-022-19754-9
- Publication date: 2022-10-04
- Journal:
- Impact factor: 4.6
- Authors: Seidenthal, Kyle; Panjvani, Karim; Chandnani, Rahul; Kochian, Leon; Eramian, Mark
- Corresponding author: Eramian, Mark
Enhancing textural differences using wavelet-based texture characteristics morphological component analysis: A preprocessing method for improving image segmentation
- DOI: 10.1016/j.cviu.2017.01.006
- Publication date: 2017-05-01
- Journal:
- Impact factor: 4.5
- Authors: Chi, Jianning; Eramian, Mark
- Corresponding author: Eramian, Mark
Other grants by Eramian, Mark
Efficient and Reproducible Image Annotation for Supervised Deep Learning with Small Data
- Grant number: RGPIN-2021-02428
- Fiscal year: 2022
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Computer assisted diagnosis using ultrasonography
- Grant number: 262027-2007
- Fiscal year: 2012
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Computer assisted diagnosis using ultrasonography
- Grant number: 262027-2007
- Fiscal year: 2010
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Computer assisted diagnosis using ultrasonography
- Grant number: 262027-2007
- Fiscal year: 2009
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Computer assisted diagnosis using ultrasonography
- Grant number: 262027-2007
- Fiscal year: 2008
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Computer assisted diagnosis using ultrasonography
- Grant number: 262027-2007
- Fiscal year: 2007
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Weighted automata
- Grant number: 262027-2003
- Fiscal year: 2006
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Weighted automata
- Grant number: 262027-2003
- Fiscal year: 2005
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Weighted automata
- Grant number: 262027-2003
- Fiscal year: 2004
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Weighted automata
- Grant number: 262027-2003
- Fiscal year: 2003
- Funding amount: $17,500
- Project category: Discovery Grants Program - Individual
Similar overseas grants
Collaborative Research: GEO OSE Track 2: Project Pythia and Pangeo: Building an inclusive geoscience community through accessible, reusable, and reproducible workflows
- Grant number: 2324304
- Fiscal year: 2024
- Funding amount: $17,500
- Project category: Standard Grant
STARS: Sharing Tools and Artifacts for Reproducible Simulation
- Grant number: MR/Z503915/1
- Fiscal year: 2024
- Funding amount: $17,500
- Project category: Research Grant
Collaborative Research: GEO OSE Track 2: Project Pythia and Pangeo: Building an inclusive geoscience community through accessible, reusable, and reproducible workflows
- Grant number: 2324302
- Fiscal year: 2024
- Funding amount: $17,500
- Project category: Standard Grant
Collaborative Research: GEO OSE Track 1: Facilitating Reproducible Open GeoScience
- Grant number: 2324732
- Fiscal year: 2024
- Funding amount: $17,500
- Project category: Standard Grant
Collaborative Research: GEO OSE Track 2: Project Pythia and Pangeo: Building an inclusive geoscience community through accessible, reusable, and reproducible workflows
- Grant number: 2324303
- Fiscal year: 2024
- Funding amount: $17,500
- Project category: Standard Grant
Collaborative Research: GEO OSE Track 1: Facilitating Reproducible Open GeoScience
- Grant number: 2324733
- Fiscal year: 2024
- Funding amount: $17,500
- Project category: Standard Grant
Generating Reproducible Real-World Evidence with Multi-Source Data to Capture Unstructured Clinical Endpoints for Chronic Diseases
- Grant number: 10797849
- Fiscal year: 2023
- Funding amount: $17,500
- Project category:
Unified, Scalable, and Reproducible Neurostatistical Software
- Grant number: 10725500
- Fiscal year: 2023
- Funding amount: $17,500
- Project category:
Collaborative Research: Synthetic microbial consortia for organismal resilience and reproducible ecosystem services in changing environments
- Grant number: 2300058
- Fiscal year: 2023
- Funding amount: $17,500
- Project category: Standard Grant
Collaborative Research: Synthetic microbial consortia for organismal resilience and reproducible ecosystem services in changing environments
- Grant number: 2300059
- Fiscal year: 2023
- Funding amount: $17,500
- Project category: Standard Grant