
Efficient and Reproducible Image Annotation for Supervised Deep Learning with Small Data


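Basic Information

Grant number:
RGPIN-2021-02428
Principal investigator:
Eramian, Mark
Amount:
$17,500
Host institution:
University of Saskatchewan
Institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Project period:
2022-01-01 to 2023-12-31
Status:
Completed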

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=749810
Keywords:
Efficient Reproducible Image Annotation Supervised

Project Abstract

Machine learning is currently one of the hottest and most rapidly evolving technologies. Everyone wants to use "deep learning" to develop systems for everything: face recognition, identity tracking, quality control in manufacturing, object tracking in videos, disease diagnosis, precision agriculture, and more. Deep learning works best when a very large number of images with known "correct answers" is available for the algorithm to learn from. It is relatively easy for people to tag an image of a car, a boat, or a plane, or even different types of cars, with an appropriate label. It is much more challenging to create large datasets for training systems that diagnose cancer or count the flowers on a plant. Labeling medical images requires highly experienced experts. When marking all of the flowers on a plant, it is easy to miss one, to overlook a flower that is partially occluded by a leaf or another flower, or to specify a flower's exact location imprecisely. Moreover, to obtain enough annotated images to learn from, multiple annotators typically label a dataset, but each image is annotated by only one annotator. In such circumstances we cannot determine whether the annotators are labeling consistently; that is, we cannot quantify inter-annotator agreement. Annotator disagreement arises from biases in individual annotators' work and degrades the quality of the training dataset. Inter-annotator agreement is affected by the difficulty of the annotation task and the nature of the instructions given to annotators. If we can find ways to obtain more consistent annotations across multiple annotators, training dataset quality will improve, and so will the performance of the learned system. The proposed research program will study how to obtain higher-quality annotations with higher inter- and intra-annotator agreement.
We will create augmented annotation tools that provide problem-specific semi-automation to assist annotators, and we will quantify the resulting benefits to annotator agreement and trained-system performance. We will quantify the relationship between annotator agreement and model performance. We will explore the degree to which contextual factors such as annotation type, the instructions given, pressure, and distractions influence annotator agreement, and develop best practices for mitigating their effects. By studying the factors that influence annotator agreement and the performance of systems learned from annotated datasets, we will be able to develop new standardized methodologies for training "deep learning" models with limited data. This will allow better prediction of the optimal amount of resources to invest in annotation, reduce reliance on trial-and-error methods for obtaining the best trained-system performance, and make successful machine learning less dependent on deep technical expertise.


Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)


Other publications by Eramian, Mark

Enhancement of Textural Differences Based on Morphological Component Analysis

DOI:
10.1109/tip.2015.2427514
Publication date:
2015-09-01
Journal:
IEEE TRANSACTIONS ON IMAGE PROCESSING
Impact factor:
10.6
Authors:
Chi, Jianning;Eramian, Mark
Corresponding author:
Eramian, Mark

Iterative image segmentation of plant roots for high-throughput phenotyping.

DOI:
10.1038/s41598-022-19754-9
Publication date:
2022-10-04
Journal:
SCIENTIFIC REPORTS
Impact factor:
4.6
Authors:
Seidenthal, Kyle;Panjvani, Karim;Chandnani, Rahul;Kochian, Leon;Eramian, Mark
Corresponding author:
Eramian, Mark