BIGDATA: Collaborative Research: F: Efficient and Exact Methods for Big Data Reduction

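Basic information

  • Award number:
    1908198
  • Principal investigator:
  • Amount:
    $396,800
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-10-16 to 2022-08-31
  • Project status:
    Completed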

Project abstract

Research in big data involves analyzing growing data sets with huge numbers of samples, very high-dimensional feature vectors, and complex and diverse structures. The ever-growing volume and complexity of these data sets make many traditional techniques inadequate for extracting knowledge from them. An emerging area, known as sparse learning, has achieved great success in learning from big data by identifying a small set of explanatory features and/or samples. Typical examples include selecting the features that are most indicative of users' preferences for recommendation systems, identifying brain regions that are predictive of neurological disorders based on imaging data, and extracting semantic information from raw images for object recognition. However, training sparse learning models can be computationally prohibitive because the sparsity-inducing regularization is non-smooth and can become highly complex when it incorporates complex structures. This project aims to develop algorithms and tools that significantly accelerate the training of sparse learning models for big data applications. The key idea is to efficiently identify redundant features and/or samples, which can be removed from the training phase without losing useful information of interest. Success in these techniques is expected to scale up sparse learning for big data by orders of magnitude in both time and space. The PIs plan to integrate the big data reduction tools developed in this project into their education and outreach activities, including the development of new courses and the integration of project components into existing courses. The PIs will make special efforts to recruit female and underrepresented students to this project.
The major technical innovations of this project include the following components: (1) the PIs will develop efficient feature reduction methods for the generic scenario where the structures of both input and output can be represented by directed acyclic graphs; the proposed formulations include many existing approaches as special cases; (2) the PIs will develop efficient methods to reduce the numbers of features and samples simultaneously under a unified formulation, which can also incorporate various structures; (3) the PIs will develop efficient methods to discard irrelevant data subspaces to accelerate the process of uncovering low-rank structures commonly seen in big data. All the proposed data reduction methods are exact, i.e., the models learned on the reduced data sets are identical to the ones learned on the full data sets. This project relies heavily on optimization theory, especially sensitivity analysis and convex geometry. The outcomes of this project include a unified approach to accelerating sparse learning and a systematic framework for developing efficient and exact data reduction methods. The systematic study and in-depth exploration of redundant data identification is expected to deepen the understanding of sparse learning techniques and dramatically enhance their applications in big data analytics.
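
The abstract does not spell out the project's screening rules, so the following is only a minimal sketch of the general idea: identify features that are provably inactive, drop them, and recover the same model. It swaps in the basic SAFE test of El Ghaoui et al. for the Lasso, a published rule used here purely for illustration and not the PIs' method. All names (`safe_screen`, `lasso_cd`), the toy data, and the choice of regularization level are assumptions.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return dot(u, u) ** 0.5


def safe_screen(columns, y, lam):
    """Indices of features that the basic SAFE test proves are zero in the
    Lasso solution of 0.5*||y - Xw||^2 + lam*||w||_1 (columns = features)."""
    corr = [abs(dot(x, y)) for x in columns]
    lam_max = max(corr)  # smallest lam for which w = 0 is optimal
    if lam >= lam_max:
        return list(range(len(columns)))
    slack = norm(y) * (lam_max - lam) / lam_max
    return [j for j, x in enumerate(columns)
            if corr[j] < lam - norm(x) * slack]


def soft_threshold(z, t):
    """Shrink z toward zero by t."""
    return max(z - t, 0.0) - max(-z - t, 0.0)


def lasso_cd(columns, y, lam, sweeps=500):
    """Plain cyclic coordinate descent for the same Lasso objective."""
    w = [0.0] * len(columns)
    residual = list(y)  # y - Xw, kept up to date
    sq = [dot(x, x) for x in columns]
    for _ in range(sweeps):
        for j, x in enumerate(columns):
            if sq[j] == 0.0:
                continue
            rho = dot(x, residual) + sq[j] * w[j]
            new = soft_threshold(rho, lam) / sq[j]
            delta = new - w[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, x)]
                w[j] = new
    return w


if __name__ == "__main__":
    random.seed(0)
    n, p = 30, 40  # hypothetical toy sizes
    cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [2 * a - 3 * b + 0.1 * random.gauss(0, 1)
         for a, b in zip(cols[0], cols[1])]
    lam = 0.8 * max(abs(dot(x, y)) for x in cols)

    dropped = set(safe_screen(cols, y, lam))
    kept = [j for j in range(p) if j not in dropped]
    w_full = lasso_cd(cols, y, lam)
    w_small = lasso_cd([cols[j] for j in kept], y, lam)

    print("features discarded:", len(dropped), "of", p)
    print("largest coefficient gap on kept features:",
          max(abs(w_full[j] - w) for j, w in zip(kept, w_small)))
```

Because the test is a sufficient condition derived from the dual problem, every discarded feature has a zero coefficient at the optimum, so the reduced problem has the same solution as the full one up to solver tolerance. That is the sense in which this kind of reduction is "exact".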

Project outcomes

Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
GraphFM: Improving Large-Scale GNN Training via Feature Momentum
  • DOI:
    10.48550/arxiv.2206.07161
  • Publication date:
    2022-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Haiyang Yu;Limei Wang;Bokun Wang;Meng Liu;Tianbao Yang;Shuiwang Ji
  • Corresponding author:
    Haiyang Yu;Limei Wang;Bokun Wang;Meng Liu;Tianbao Yang;Shuiwang Ji
Spherical Message Passing for 3D Molecular Graphs
  • DOI:
  • Publication date:
    2021-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yi Liu;Limei Wang;Meng Liu;Xuan Zhang;Bora Oztekin;Shuiwang Ji
  • Corresponding author:
    Yi Liu;Limei Wang;Meng Liu;Xuan Zhang;Bora Oztekin;Shuiwang Ji
Spatial Variational Auto-Encoding via Matrix-Variate Normal Distributions
  • DOI:
    10.1137/1.9781611975673.73
  • Publication date:
    2017-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhengyang Wang;Hao Yuan;Shuiwang Ji
  • Corresponding author:
    Zhengyang Wang;Hao Yuan;Shuiwang Ji
Smoothed dilated convolutions for improved dense prediction
  • DOI:
    10.1145/3219819.3219944
  • Publication date:
    2018-07
  • Journal:
  • Impact factor:
    4.8
  • Authors:
    Zhengyang Wang;Shuiwang Ji
  • Corresponding author:
    Zhengyang Wang;Shuiwang Ji
Non-Local Graph Neural Networks

Other publications by Shuiwang Ji

A Mathematical View of Attention Models in Deep Learning
  • DOI:
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Shuiwang Ji;Yaochen Xie
  • Corresponding author:
    Yaochen Xie
Discriminant Analysis for Dimensionality Reduction: An Overview of Recent Developments
  • DOI:
  • Publication date:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jieping Ye;Shuiwang Ji
  • Corresponding author:
    Shuiwang Ji
An Interpretable Neural Model with Interactive Stepwise Influence
  • DOI:
    10.1007/978-3-030-16142-2_41
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    2.3
  • Authors:
    Yin Zhang;Ninghao Liu;Shuiwang Ji;James Caverlee;Xia Hu
  • Corresponding author:
    Xia Hu
Semi-Supervised Learning for High-Fidelity Fluid Flow Reconstruction
  • DOI:
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cong Fu;Jacob Helwig;Shuiwang Ji
  • Corresponding author:
    Shuiwang Ji
Eliminating Position Bias of Language Models: A Mechanistic Approach
  • DOI:
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ziqi Wang;Hanlin Zhang;Xiner Li;Kuan;Chi Han;Shuiwang Ji;S. Kakade;Hao Peng;Heng Ji
  • Corresponding author:
    Heng Ji

Other grants by Shuiwang Ji

III: Small: 3D Graph Neural Networks: Completeness, Efficiency, and Applications
  • Award number:
    2243850
  • Fiscal year:
    2023
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
Collaborative Research: ABI Innovation: Towards Computational Exploration of Large-Scale Neuro-Morphological Datasets
  • Award number:
    2028361
  • Fiscal year:
    2020
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
III: Small: Collaborative Research: Demystifying Deep Learning on Graphs: From Basic Operations to Applications
  • Award number:
    2006861
  • Fiscal year:
    2020
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
III: Medium: Collaborative Research: Towards Scalable and Interpretable Graph Neural Networks
  • Award number:
    1955189
  • Fiscal year:
    2020
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
III: Small: Collaborative Research: Structured Methods for Multi-Task Learning
  • Award number:
    1908166
  • Fiscal year:
    2018
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
III: Small: Deep Learning for Gene Expression Pattern Image Analysis
  • Award number:
    1908220
  • Fiscal year:
    2018
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
CAREER: Towards the Next Generation of Data-Driven
  • Award number:
    1922969
  • Fiscal year:
    2018
  • Funding amount:
    $396,800
  • Project type:
    Continuing Grant
III: Small: Deep Learning for Gene Expression Pattern Image Analysis
  • Award number:
    1811675
  • Fiscal year:
    2018
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
Collaborative Research: ABI Innovation: Towards Computational Exploration of Large-Scale Neuro-Morphological Datasets
  • Award number:
    1661289
  • Fiscal year:
    2017
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: Collaborative Research: F: Efficient and Exact Methods for Big Data Reduction
  • Award number:
    1633359
  • Fiscal year:
    2016
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant

Similar overseas grants

BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
  • Award number:
    2348159
  • Fiscal year:
    2023
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
  • Award number:
    2308649
  • Fiscal year:
    2022
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: Collaborative Research: F: Holistic Optimization of Data-Driven Applications
  • Award number:
    2027516
  • Fiscal year:
    2020
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: F: Collaborative Research: Practical Analysis of Large-Scale Data with Lyme Disease Case Study
  • Award number:
    1934319
  • Fiscal year:
    2019
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: IA: Collaborative Research: Protecting Yourself from Wildfire Smoke: Big Data-Driven Adaptive Air Quality Prediction Methodologies
  • Award number:
    1838022
  • Fiscal year:
    2019
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: F: Collaborative Research: Foundations of Responsible Data Management
  • Award number:
    1926250
  • Fiscal year:
    2019
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
  • Award number:
    1947584
  • Fiscal year:
    2019
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
  • Award number:
    1837964
  • Fiscal year:
    2019
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
  • Award number:
    1838222
  • Fiscal year:
    2019
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
  • Award number:
    1838248
  • Fiscal year:
    2019
  • Funding amount:
    $396,800
  • Project type:
    Standard Grant