SI2-SSE: Collaborative Research: High Performance Low Rank Approximation for Scalable Data Analytics

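Basic information

  • Award number:
    1642410
  • Principal investigator:
  • Amount:
    $332,300
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Period:
    2016-11-01 to 2020-10-31
  • Status:
    Completed

Project abstract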

Big Data analytics is at the core of discovery in areas as varied as medical informatics, business analytics, national security, and materials sciences. This project aims to model some of the key data analytics problems and to design, verify, and deploy scalable methods for knowledge extraction. The algorithms developed will be able to handle data sets of extreme size and will be deployable on advanced computer hardware. The goal is to realize orders-of-magnitude improvements over existing data analytics technologies by developing algorithms that are robust to incompleteness, noise, ambiguity, and high dimensionality in the data. A particular focus will be parallel and distributed algorithms that can efficiently solve large problems and produce accurate solutions. The proposed research and software development will allow domain experts to tackle Big Data sets that require large parallel systems. The improved performance will enable fast and scalable data analysis across applications, from social network analysis for studying citizens' attitudes toward sustainability-related issues to computational marketing techniques that refine customers' shopping experiences. The proposed work will help bridge the gap between the computational science and data analytics ecosystems, two fields that stand to make great advances from cross-fertilization. The education and outreach plan includes graduate course creation, engagement of under-represented groups through both undergraduate and graduate research experiences, and community building through the organization of workshops and mini-symposia.

With the advent of internet-scale data, the data mining and machine learning community has adopted Nonnegative Matrix Factorization (NMF) for numerous tasks such as topic modeling, background separation from video data, hyperspectral imaging, web-scale clustering, and community detection. The goals of this proposal are to develop efficient parallel algorithms for computing nonnegative matrix and tensor factorizations (NMF and NTF) and their variants using a unified framework, and to produce a software package called Parallel Low-rank Approximation with Nonnegative Constraints (PLANCK) that delivers the high performance, flexibility, and scalability necessary to tackle the ever-growing size of today's data sets. The algorithms will be generalized to NTF problems, extending the class of algorithms that can be efficiently parallelized, and the software framework will allow end users to use and extend these techniques. Rather than developing separate software for each problem domain and mathematical technique, the project will achieve flexibility by characterizing nearly all current NMF and NTF algorithms in the context of a block coordinate descent framework. Within this framework, the shared computational kernels, which usually extend run times, can be separated from the algorithm-specific computations. Finally, the usability and practicality of the proposed software will be maintained by being application driven, by establishing collaborations with early end users, and by incrementally generalizing the framework in terms of both algorithms and problems.
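
The separation the abstract describes, shared kernels on one side and algorithm-specific updates on the other, can be illustrated with a small sketch of NMF as two-block coordinate descent. This is not the project's software: the function names (`matmul`, `transpose`, `mu_update`, `nmf`, `frob_error`) are hypothetical, the update rule shown is the standard multiplicative update, chosen only because it is short, and plain Python lists stand in for the parallel linear algebra a real implementation would use.

```python
import random


def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    Y_cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Y_cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def mu_update(H, WtA, WtW, eps=1e-12):
    """Algorithm-specific step (assumed here: multiplicative update).

    Uses only the shared products W^T A and W^T W, so another update
    rule with the same signature could be swapped in.
    """
    WtWH = matmul(WtW, H)
    return [
        [H[i][j] * WtA[i][j] / (WtWH[i][j] + eps) for j in range(len(H[0]))]
        for i in range(len(H))
    ]


def nmf(A, k, iters=200, update=mu_update, seed=0):
    """Approximate a nonnegative A (m x n) by W (m x k) times H (k x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    At = transpose(A)
    for _ in range(iters):
        # Block 1: update H with W fixed. Shared kernels: W^T A and W^T W.
        Wt = transpose(W)
        H = update(H, matmul(Wt, A), matmul(Wt, W))
        # Block 2: update W with H fixed, via the transposed problem
        # A^T ~ H^T W^T. Shared kernels: H A^T and H H^T.
        Ht = transpose(H)
        W = transpose(update(transpose(W), matmul(H, At), matmul(H, Ht)))
    return W, H


def frob_error(A, W, H):
    """Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    W, H = nmf(A, k=2)
    print("residual:", frob_error(A, W, H))
```

In this sketch the matrix products computed inside the loop are the same whichever update rule is passed as `update`, which is the sense in which the kernels are shared; only the body of `mu_update` is specific to one algorithm.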

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

Other publications by Haesun Park

A Dynamic Data Driven Application System for Vehicle Tracking
  • DOI:
    10.1016/j.procs.2014.05.108
  • Published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    R. Fujimoto; Angshuman Guin; M. Hunter; Haesun Park; G. Kanitkar; R. Kannan; Michael Milholen; Sabra A. Neal; P. Pecher
  • Corresponding author:
    P. Pecher
Unfolding Latent Tree Structures using 4th Order Tensors
GPS-Based Shortest-Path Routing Scheme in Mobile Ad Hoc Network
  • DOI:
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Haesun Park; Soo; So; Joo
  • Corresponding author:
    Joo
Biocompatibility Issues of Implantable Drug Delivery Systems
  • DOI:
    10.1023/a:1016012520276
  • Published:
    1996-01-01
  • Journal:
  • Impact factor:
    4.300
  • Authors:
    Haesun Park; Kinam Park
  • Corresponding author:
    Kinam Park
Efficient Implementation of Jacobi Algorithms and Jacobi Sets on Distributed Memory Architectures

Other grants of Haesun Park

Collaborative Research: OAC Core: Robust, Scalable, and Practical Low Rank Approximation
  • Award number:
    2106738
  • Fiscal year:
    2021
  • Amount:
    $332,300
  • Award type:
    Standard Grant
CAREER: New Representations of Probability Distributions to Improve Machine Learning --- A Unified Kernel Embedding Framework for Distributions
  • Award number:
    1350983
  • Fiscal year:
    2014
  • Amount:
    $332,300
  • Award type:
    Continuing Grant
EAGER: Hierarchical Topic Modeling by Nonnegative Matrix Factorization for Interactive Multi-scale Analysis of Text Data
  • Award number:
    1348152
  • Fiscal year:
    2013
  • Amount:
    $332,300
  • Award type:
    Standard Grant
EAGER: Fast and Accurate Nonnegative Tensor Decompositions: Algorithms and Software
  • Award number:
    0956517
  • Fiscal year:
    2009
  • Amount:
    $332,300
  • Award type:
    Standard Grant
FODAVA-Lead: Dimension Reduction and Data Reduction: Foundations for Visualization
  • Award number:
    0808863
  • Fiscal year:
    2008
  • Amount:
    $332,300
  • Award type:
    Continuing Grant
MSPA-MCS: Collaborative Research: Fast Nonnegative Matrix Factorizations: Theory, Algorithms, and Applications
  • Award number:
    0732318
  • Fiscal year:
    2007
  • Amount:
    $332,300
  • Award type:
    Standard Grant
SGER: Effective Network Anomaly Detection Based on Adaptive Machine Learning
  • Award number:
    0715342
  • Fiscal year:
    2007
  • Amount:
    $332,300
  • Award type:
    Standard Grant
Collaborative Research: Greedy Approximations with Nonsubmodular Potential Functions
  • Award number:
    0728812
  • Fiscal year:
    2007
  • Amount:
    $332,300
  • Award type:
    Standard Grant
CompBio: Collaborative Research: Development of Effective Gene Selection Algorithms for Microarray Data Analysis
  • Award number:
    0621889
  • Fiscal year:
    2006
  • Amount:
    $332,300
  • Award type:
    Continuing Grant
Special Meeting: Workshop on Future Direction in Numerical Algorithms and Optimization
  • Award number:
    0633793
  • Fiscal year:
    2006
  • Amount:
    $332,300
  • Award type:
    Standard Grant

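Similar grants from the National Natural Science Foundation of China (NSFC)

Mechanism by which the Streptococcus pyogenes secreted esterase Sse inhibits LC3-associated phagocytosis to promote invasion
  • Award number:
  • Approval year:
    2022
  • Amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Structural simulation of the interfacial transition layer and elimination of defect states at the Cu2ZnSn(SSe)4/CdS interface in solar cells
  • Award number:
  • Approval year:
    2022
  • Amount:
    CNY 550,000
  • Award type:
    General Program
First-principles study of doping to achieve stable weak n-type character in the surface layer of the Cu2ZnSn(SSe)4 absorber
  • Award number:
    12004100
  • Approval year:
    2020
  • Amount:
    CNY 240,000
  • Award type:
    Young Scientists Fund
Research on an SSE-based evaluation index system for information security assurance in aviation information systems
  • Award number:
    60776808
  • Approval year:
    2007
  • Amount:
    CNY 190,000
  • Award type:
    Joint Fund Project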

Similar overseas grants

Collaborative Research: SI2-SSE: WRENCH: A Simulation Workbench for Scientific Worflow Users, Developers, and Researchers
  • Award number:
    1642369
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
SI2-SSE: Collaborative Research: Integrated Tools for DNA Nanostructure Design and Simulation
  • Award number:
    1740212
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
Collaborative Research: NSCI: SI2-SSE: Time Stepping and Exchange-Correlation Modules for Massively Parallel Real-Time Time-Dependent DFT
  • Award number:
    1740219
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
SI2-SSE: Collaborative Research: Integrated Tools for DNA Nanostructure Design and Simulation
  • Award number:
    1740282
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
Collaborative Research: SI2-SSE: An open source multi-physics platform to advance fundamental understanding of plasma physics and enable impactful application of plasma systems
  • Award number:
    1740300
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
SI2-SSE: Collaborative Research: Software Framework for Strongly Correlated Materials: from DFT to DMFT
  • Award number:
    1740112
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
SI2-SSE: Collaborative Research: A Sustainable Future for the Glue Multi-Dimensional Linked Data Visualization Package
  • Award number:
    1740229
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
SI2-SSE: Collaborative Research: Software Framework for Strongly Correlated Materials: from DFT to DMFT
  • Award number:
    1740111
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
Collaborative Proposal: SI2-SSE: An open source multi-physics platform to advance fundamental understanding of plasma physics and enable impactful application of plasma systems
  • Award number:
    1740310
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant
Collaborative Research: SI2-SSE: WRENCH: A Simulation Workbench for Scientific Workflow Users, Developers, and Researchers
  • Award number:
    1642335
  • Fiscal year:
    2017
  • Amount:
    $332,300
  • Award type:
    Standard Grant