
Reducing Training Data in Deep Learning


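Basic information

Grant number:
RGPIN-2019-06222
Principal investigator:
Ling, Charles
Amount:
approx. $40,100
Host institution:
University of Western Ontario
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2019
Funding country:
Canada
Start and end dates:
2019-01-01 to 2020-12-31
Project status:
Completed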

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=690612
Keywords:
Reducing Training Data Deep Learning

Project abstract

Deep neural networks have been highly successful in supervised learning for a variety of AI-related applications, including automatic speech recognition, image classification and face recognition in computer vision, and natural language translation. However, their success relies on huge volumes of labeled training data, which are time-consuming and expensive to obtain. While data is abundant in today's digital world of the Web, mobile devices, and the Internet of Things, unsupervised learning (which does not need labels) has yet to live up to its promise.

In this research program we plan to study machine learning problems that require human-like capabilities. These learning problems often have much unlabeled data and very little labeled data, and abstract concepts and knowledge are learned and accumulated across many tasks. Human learning is also active and interactive. Progress on these problems would lead to new theory and algorithms that not only significantly reduce the amount of labeled data needed in supervised learning, but also advance our understanding of machine learning in solving difficult real-world problems.

We propose a novel deep learning framework in which autoencoders and classifiers are coupled and optimized simultaneously to make maximal use of both unlabeled and labeled data. The autoencoder networks are trained on a large set of unlabeled data, but need to recall only enough detail to classify a small number of labeled examples. The proposed research unifies and integrates supervised and unsupervised learning, feature learning, representation learning, lifelong learning, and few-shot learning.

The research proposal consists of two long-term objectives and five short-term objectives, each with clear and feasible methodologies. These will provide ample opportunities for training PhD and MSc students. In total, the proposal will train 4 PhD students and 6 MSc students, as well as one postdoc, over the next 5 years of the proposed research.

As deep learning in AI is an extremely popular area that attracts both academia and industry, I expect that the highly qualified personnel (HQP) trained in this research will be in high demand and will make an impact in their future research careers in academia and industry.

We expect to make significant contributions not only to the academic research of machine learning and deep learning, but also to various real-world applications. We expect that less than 10% of the training data (or the labeling cost) would be needed to train the deep neural networks without much affecting the predictive accuracy or the computational cost. The savings would be very significant in any real-world application of deep learning.
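
The abstract describes coupling an autoencoder and a classifier and optimizing them simultaneously, but it does not give the objective or the architecture. The following is only a minimal sketch of one common way to set up such a coupling, not the project's actual method. It assumes a single shared encoder, a reconstruction loss over all inputs, and a cross-entropy loss over the few labeled inputs, combined with a weight `lam`. All names (`joint_loss`, `descent_step`, `demo`, `lam`), the toy two-cluster data, and the finite-difference optimizer are hypothetical choices made to keep the example dependency-free.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def encode(params, x):
    """Shared encoder: its output feeds both the decoder and the classifier."""
    return [math.tanh(v) for v in matvec(params["enc"], x)]


def joint_loss(params, unlabeled, labeled, lam=1.0):
    """Assumed objective: mean reconstruction error over all inputs
    plus lam times mean cross-entropy over the labeled inputs."""
    inputs = unlabeled + [x for x, _ in labeled]
    recon = 0.0
    for x in inputs:
        x_hat = matvec(params["dec"], encode(params, x))
        recon += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    recon /= len(inputs)

    ce = 0.0
    for x, y in labeled:
        logits = matvec(params["clf"], encode(params, x))
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        ce += log_z - logits[y]  # negative log-probability of the true class
    ce /= len(labeled)
    return recon + lam * ce


def descent_step(params, unlabeled, labeled, lam, lr=0.1, eps=1e-6):
    """One joint update of encoder, decoder, and classifier weights.
    Gradients are finite differences; the step is halved until the loss drops."""
    base = joint_loss(params, unlabeled, labeled, lam)
    grads = {}
    for name, W in params.items():
        grads[name] = [[0.0] * len(row) for row in W]
        for i, row in enumerate(W):
            for j, old in enumerate(row):
                row[j] = old + eps
                bumped = joint_loss(params, unlabeled, labeled, lam)
                grads[name][i][j] = (bumped - base) / eps
                row[j] = old
    step = lr
    while step > 1e-8:
        trial = {
            name: [[w - step * g for w, g in zip(row, grow)]
                   for row, grow in zip(W, grads[name])]
            for name, W in params.items()
        }
        if joint_loss(trial, unlabeled, labeled, lam) < base:
            return trial
        step /= 2
    return params  # no improving step found; keep the current weights


def demo(steps=30, lam=1.0, seed=0):
    """Toy run: 20 unlabeled and 4 labeled points from two 4-D clusters."""
    rng = random.Random(seed)
    centers = {0: [1.0, 1.0, -1.0, -1.0], 1: [-1.0, -1.0, 1.0, 1.0]}

    def sample(label):
        return [c + rng.gauss(0.0, 0.3) for c in centers[label]]

    def init(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    unlabeled = [sample(i % 2) for i in range(20)]
    labeled = [(sample(k % 2), k % 2) for k in range(4)]
    params = {"enc": init(2, 4), "dec": init(4, 2), "clf": init(2, 2)}

    initial = joint_loss(params, unlabeled, labeled, lam)
    for _ in range(steps):
        params = descent_step(params, unlabeled, labeled, lam)
    return initial, joint_loss(params, unlabeled, labeled, lam)


if __name__ == "__main__":
    print(demo())
```

In this sketch, setting `lam` to 0 reduces the objective to a plain autoencoder, while a large `lam` pushes the shared code to keep only what helps classification. That is one possible reading of the abstract's remark that the autoencoder needs to recall only enough detail to classify the labeled examples.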
