
Reducing Training Data in Deep Learning


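Basic Information

Grant number:
RGPIN-2019-06222
Principal investigator:
Ling, Charles
Amount:
$40.1 thousand
Host institution:
University of Western Ontario
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2021
Funding country:
Canada
Period:
2021-01-01 to 2022-12-31
Project status:
Completed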

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=738781
Keywords:
Reducing Training Data Deep Learning

Project Abstract

Deep neural networks have been highly successful in supervised learning for a variety of AI-related applications, including automatic speech recognition, image classification and face recognition in computer vision, and natural language translation. However, their success relies on huge volumes of labeled training data, which are time-consuming and expensive to obtain. While data is abundant in today's digital world of the Web, mobile devices, and the Internet of Things, unsupervised learning (which does not need labels) has yet to live up to its promise.

In this research program we plan to study machine learning that requires human capabilities. These learning problems often involve large amounts of unlabeled data and very few labeled examples, and abstract concepts and knowledge are learned and accumulated across many tasks. Human learning is also active and interactive. Progress on these problems would lead to new theory and algorithms that not only significantly reduce the amount of labeled data needed in supervised learning, but also advance our understanding of how machine learning can solve difficult real-world problems.

We propose a novel deep learning framework in which autoencoders and classifiers are coupled and optimized simultaneously to make maximal use of both unlabeled and labeled data. The autoencoder networks are trained on a large set of unlabeled data, but only need to recall enough detail to classify a small number of labeled examples. The proposed research unifies and integrates supervised and unsupervised learning, feature learning, representation learning, lifelong learning, and few-shot learning.

The research proposal consists of two long-term objectives and five short-term objectives, each with clear and feasible methodologies. These will provide ample opportunities for training PhD and MSc students. In total, the proposal will train 4 PhD students and 6 MSc students, as well as one postdoc, over the next 5 years of the proposed research. As deep learning is an extremely popular area of AI that attracts both academia and industry, I expect that the HQP (highly qualified personnel) trained in this research will be in high demand and will make an impact in their future careers in academia and industry.

We expect to make significant contributions not only to academic research in machine learning and deep learning, but also to various real-world applications. We expect that less than 10% of the training data (or the labeling cost) would be needed to train deep neural networks without much affecting predictive accuracy or computational cost. The savings would be very significant in any real-world application of deep learning.
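
The abstract does not specify an architecture or loss function for the coupled autoencoder and classifier, so the following is only an illustrative sketch of the general idea, not the project's method. It assumes a linear encoder and decoder, a logistic classifier that shares the encoder, and a joint objective formed as reconstruction error on all examples plus a weighted cross-entropy on the few labeled ones. All names (`joint_loss_and_grads`, `train`, `lam`) and the synthetic data are hypothetical.

```python
import numpy as np

def joint_loss_and_grads(params, x_all, x_lab, y_lab, lam=1.0):
    """Joint objective: reconstruction on all inputs + lam * cross-entropy on labeled inputs."""
    W_enc, W_dec, w_clf = params
    n, m = len(x_all), len(x_lab)

    # Autoencoder branch: uses every example, no labels required.
    z_all = x_all @ W_enc                      # codes, shape (n, k)
    err = z_all @ W_dec - x_all                # reconstruction error, shape (n, d)
    recon = np.mean(np.sum(err ** 2, axis=1))

    # Classifier branch: shares the encoder, uses only the labeled examples.
    z_lab = x_lab @ W_enc                      # shape (m, k)
    p = 1.0 / (1.0 + np.exp(-(z_lab @ w_clf)))  # predicted P(y = 1), shape (m,)
    eps = 1e-12                                # guards against log(0)
    ce = -np.mean(y_lab * np.log(p + eps) + (1 - y_lab) * np.log(1 - p + eps))

    loss = recon + lam * ce

    # Manual gradients; the encoder receives signal from both branches.
    d_err = 2.0 * err / n
    g_dec = z_all.T @ d_err
    g_enc = x_all.T @ (d_err @ W_dec.T)
    d_logit = lam * (p - y_lab) / m
    g_clf = z_lab.T @ d_logit
    g_enc = g_enc + x_lab.T @ np.outer(d_logit, w_clf)
    return loss, (g_enc, g_dec, g_clf)

def train(x_all, x_lab, y_lab, k=2, steps=200, lr=0.05, lam=1.0, seed=0):
    """Plain gradient descent that updates all three parameter sets at every step."""
    rng = np.random.default_rng(seed)
    d = x_all.shape[1]
    params = [0.1 * rng.standard_normal((d, k)),   # encoder weights
              0.1 * rng.standard_normal((k, d)),   # decoder weights
              np.zeros(k)]                         # classifier weights
    history = []
    for _ in range(steps):
        loss, grads = joint_loss_and_grads(params, x_all, x_lab, y_lab, lam)
        history.append(loss)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, history

# Synthetic data (an assumption for illustration): two Gaussian blobs in 4-D,
# 200 examples in total, of which only the first 10 are treated as labeled.
rng = np.random.default_rng(1)
y_all = np.arange(200) % 2
centers = np.array([[-1.0, -1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]])
x_all = centers[y_all] + 0.3 * rng.standard_normal((200, 4))
x_lab, y_lab = x_all[:10], y_all[:10].astype(float)

params, history = train(x_all, x_lab, y_lab)
print("joint loss: first step", history[0], "last step", history[-1])
```

In this sketch the weight `lam` controls how strongly the labeled examples shape the shared encoder relative to reconstruction; a real system would likely use nonlinear networks and an automatic differentiation library rather than hand-written gradients.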
