Reducing Training Data in Deep Learning
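Basic information
- Grant number: RGPIN-2019-06222
- Principal investigator:
- Amount: $40,100
- Host institution:
- Host institution country: Canada
- Program type: Discovery Grants Program - Individual
- Fiscal year: 2021
- Funding country: Canada
- Project period: 2021-01-01 to 2022-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract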
Deep neural networks have been highly successful in supervised learning for a variety of AI-related applications. These include automatic speech recognition, image classification and face recognition in computer vision, and natural language translation. However, their success relies on huge volumes of labeled training data, which are time-consuming and expensive to obtain. While data is abundant in today's digital world of the Web, mobile devices, and the Internet of Things, unsupervised learning (which does not need labels) has yet to live up to its promise. In this research program we plan to study machine learning problems that require human capabilities. These problems often involve large amounts of unlabeled data and very little labeled data, and abstract concepts and knowledge are learned and accumulated across many tasks. Human learning is also active and interactive. Progress on these problems would lead to new theory and algorithms that not only significantly reduce the amount of labeled data needed in supervised learning, but also advance our understanding of machine learning in solving difficult real-world problems. We propose a novel deep learning framework in which autoencoders and classifiers are coupled and optimized simultaneously to make maximal use of both unlabeled and labeled data. The autoencoder networks are trained on a large set of unlabeled data, but only need to recall enough detail to classify a small number of labeled examples. The proposed research unifies and integrates supervised and unsupervised learning, feature learning, representation learning, lifelong learning, and few-shot learning. The research proposal consists of two long-term objectives and five short-term objectives, each with clear and feasible methodologies. These will provide ample opportunities for training PhD and MSc students.
In total, the proposal will train 4 PhD students and 6 MSc students, as well as one postdoc, over the next 5 years of the proposed research. As deep learning in AI is an extremely popular area that attracts both academia and industry, I expect that the HQP trained in this research will be in high demand and will make an impact in their future research careers in academia and industry. We expect to make significant contributions not only to academic research in machine learning and deep learning, but also to various real-world applications. We expect that less than 10% of the training data (or the labeling cost) would be needed to train the deep neural networks without much affecting predictive accuracy or computational cost. The savings would be very significant in any real-world application of deep learning.
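The abstract's idea of coupling an autoencoder and a classifier can be illustrated with a joint objective. The sketch below is not the grant's actual method. It assumes a linear encoder and decoder, a binary logistic classifier on the encoded input, and a hypothetical weight `lam` that balances the two terms. All function and parameter names are invented for illustration. The reconstruction term is averaged over every input (unlabeled and labeled), while the classification term uses only the labeled examples.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def reconstruction_loss(W_enc, W_dec, x):
    """Squared error between x and its linear autoencoder reconstruction."""
    z = matvec(W_enc, x)        # encode
    x_hat = matvec(W_dec, z)    # decode
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def classification_loss(W_enc, w_clf, x, y):
    """Binary cross-entropy of a logistic classifier applied to the code z."""
    z = matvec(W_enc, x)        # same encoder as the autoencoder (the coupling)
    logit = sum(w * zi for w, zi in zip(w_clf, z))
    # Numerically stable form of -[y*log(p) + (1-y)*log(1-p)], p = sigmoid(logit)
    return max(logit, 0.0) - y * logit + math.log1p(math.exp(-abs(logit)))

def joint_loss(W_enc, W_dec, w_clf, unlabeled, labeled, lam=1.0):
    """Mean reconstruction loss over all inputs plus lam times the
    mean classification loss over the labeled inputs only."""
    all_inputs = list(unlabeled) + [x for x, _ in labeled]
    rec = sum(reconstruction_loss(W_enc, W_dec, x) for x in all_inputs) / len(all_inputs)
    clf = sum(classification_loss(W_enc, w_clf, x, y) for x, y in labeled) / len(labeled)
    return rec + lam * clf
```

Because both terms depend on the same encoder weights `W_enc`, minimizing `joint_loss` would push the encoder toward codes that reconstruct the abundant unlabeled data while staying useful for the few labeled examples. A real system would use nonlinear networks and a gradient-based optimizer, neither of which is shown here.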
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Ling, Charles
Reducing Training Data in Deep Learning
- Grant number: RGPIN-2019-06222
- Fiscal year: 2022
- Funding amount: $40,100
- Program type: Discovery Grants Program - Individual
Reducing Training Data in Deep Learning
- Grant number: RGPAS-2019-00084
- Fiscal year: 2020
- Funding amount: $40,100
- Program type: Discovery Grants Program - Accelerator Supplements
Reducing Training Data in Deep Learning
- Grant number: RGPIN-2019-06222
- Fiscal year: 2020
- Funding amount: $40,100
- Program type: Discovery Grants Program - Individual
Reducing Training Data in Deep Learning
- Grant number: RGPIN-2019-06222
- Fiscal year: 2019
- Funding amount: $40,100
- Program type: Discovery Grants Program - Individual
Reducing Training Data in Deep Learning
- Grant number: RGPAS-2019-00084
- Fiscal year: 2019
- Funding amount: $40,100
- Program type: Discovery Grants Program - Accelerator Supplements
Improving Information Retrieval with Machine Learning
- Grant number: 46392-2012
- Fiscal year: 2018
- Funding amount: $40,100
- Program type: Discovery Grants Program - Individual
Improving Information Retrieval with Machine Learning
- Grant number: 46392-2012
- Fiscal year: 2017
- Funding amount: $40,100
- Program type: Discovery Grants Program - Individual
Improving Highstreet's Assets Model with Advanced Machine Learning
- Grant number: 501559-2016
- Fiscal year: 2016
- Funding amount: $40,100
- Program type: Engage Plus Grants Program
Improving Information Retrieval with Machine Learning
- Grant number: 46392-2012
- Fiscal year: 2015
- Funding amount: $40,100
- Program type: Discovery Grants Program - Individual
Improving customer service with predictive models for IBM London Software Lab
- Grant number: 491890-2015
- Fiscal year: 2015
- Funding amount: $40,100
- Program type: Engage Grants Program
Similar international grants
CAREER: Mitigating the Lack of Labeled Training Data in Machine Learning Based on Multi-level Optimization
- Grant number: 2339216
- Fiscal year: 2024
- Funding amount: $40,100
- Program type: Continuing Grant
ZooCELL: Tracing the evolution of sensory cell types in animal diversity: multidisciplinary training in 3D cellular reconstruction, multimodal data ..
- Grant number: EP/Y037049/1
- Fiscal year: 2024
- Funding amount: $40,100
- Program type: Research Grant
Tracing the evolution of sensory cell types in animal diversity: multidisciplinary training in 3D cellular reconstruction, multimodal data analysis
- Grant number: EP/Y037081/1
- Fiscal year: 2024
- Funding amount: $40,100
- Program type: Research Grant
Measurement and analysis of radiotherapy small field dosimetry data to support the development of a simulation training product for clinical Radiotherapy Physicists.
- Grant number: 10089179
- Fiscal year: 2024
- Funding amount: $40,100
- Program type: Collaborative R&D
Generative Visual Pre-training on Unlabelled Big Data
- Grant number: DP240101848
- Fiscal year: 2024
- Funding amount: $40,100
- Program type: Discovery Projects
Collaborative Research: Implementation: Medium: Secure, Resilient Cyber-Physical Energy System Workforce Pathways via Data-Centric, Hardware-in-the-Loop Training
- Grant number: 2320972
- Fiscal year: 2023
- Funding amount: $40,100
- Program type: Standard Grant
Collaborative Research: Implementation: Medium: Secure, Resilient Cyber-Physical Energy System Workforce Pathways via Data-Centric, Hardware-in-the-Loop Training
- Grant number: 2320975
- Fiscal year: 2023
- Funding amount: $40,100
- Program type: Standard Grant
MCA Pilot PUI: Data Intensive Research Training (DIRT) in forecasting soil respiration at core terrestrial NEON sites
- Grant number: 2321958
- Fiscal year: 2023
- Funding amount: $40,100
- Program type: Standard Grant
NRT-HDR: Integrative Training in Data Science-Enabled Sensing of the Environment for Climate Adaptation (DataSENSE)
- Grant number: 2244403
- Fiscal year: 2023
- Funding amount: $40,100
- Program type: Continuing Grant
Network Connector: DEDICATE: Data Science Equity-Driven Inquiry to Create Accessible Project-based Training for Social Impact Education
- Grant number: 2304100
- Fiscal year: 2023
- Funding amount: $40,100
- Program type: Continuing Grant