III: Medium: High-Performance Factorization Tools for Constrained and Hidden Tensor Models

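Basic information

Award number:
1704074
Principal investigator:
George Karypis
Amount:
$1.2 million
Awardee institution:
University of Minnesota-Twin Cities
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2017
Funding country:
United States
Project period:
2017-09-01 to 2023-08-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1704074&HistoricalAwards=false
Keywords:
III Medium Performance Factorization Tools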

Project abstract

Tensors generalize matrices to higher dimensions (called modes) and are designed to model multi-way data. Tensor factorization algorithms analyze such multi-way data to uncover relations between the different modes that can be used to both gain insights and to predict unknown aspects of the underlying system/process. For example, medical diagnosis and treatment records can be modeled via a four-mode tensor whose modes correspond to patients, physicians, diagnosis, and treatments and its factorization can provide insights on the co-occurrence of medical conditions, treatment approaches, any treatment differences based on the physician, and identify potential instances of medical fraud. This project's research is designed to address current limitations of tensor analysis by developing new theory and algorithms and high-performance scalable parallel formulations of the various computational kernels used by these algorithms, and a flexible open source software toolkit that can be used to perform constrained and hidden tensor factorization of very large and sparse multi-way datasets. The success of this project will allow researchers to leverage the power of multi-way "Big Data" analysis to solve various problems in diverse application domains such as healthcare, medical imaging, cybersecurity, social and behavioral sciences, and e-commerce. At the same time, the project will provide data science training to the students involved by combining cutting-edge data and signal analytics, data mining, and high-performance computing.

Constrained matrix and tensor factorization techniques are widely used for dimensionality reduction, clustering, and estimation in machine learning, signal processing, and many other walks of science and engineering. Unconstrained matrix and tensor factorization algorithms are relatively mature, but constrained counterparts are lagging in terms of speed, scalability, and flexibility. In many applications (e.g., medical imaging and recommender systems), instead of observing the actual entries of a tensor, we observe a limited number of linear combinations (e.g., partial sums) of these entries and need to identify the tensor's latent factors from these measurements. Being able to directly identify the latent factors from linear measurements, which we refer to as hidden tensor factorization, has important advantages in terms of complexity, memory footprint, and the ability to handle very large data sets. Developing open source high-performance parallel tools for constrained and hidden tensor factorization in both shared- and distributed-memory systems will significantly enhance the ability to analyze very large multi-way data. The research will evolve along two synergistic thrusts. First, it will develop new theory and algorithms for constrained and hidden tensor factorization by (i) building fast first-order (FFO) and fast stochastic first-order (FSFO) constrained tensor decomposition algorithms that strike favorable trade-offs between simplicity, scalability, and speed of convergence, and (ii) tackling important identifiability and algorithmic issues related to hidden tensor factorization. Second, it will undertake a multi-pronged effort towards developing high-performance parallel formulations for the computational kernels used in constrained and unconstrained tensor and hidden tensor factorization and develop a high-performance tensor factorization software toolbox. The release of the high-performance tensor factorization toolbox will enable researchers and practitioners to scale up not only the size of data but also the variety of constraints and types of data they can analyze. The research will involve students that will be trained in data science, combining cutting-edge signal and data analytics, data mining, and high-performance computing.
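
To make the abstract's two central ideas concrete, the following is a minimal NumPy sketch, not the project's toolkit or its algorithms. It assumes a small dense three-mode tensor and a CP (sum of rank-one terms) model, and it uses classic multiplicative updates as a stand-in for a nonnegativity-constrained factorization; the FFO and FSFO methods named in the abstract are not reproduced here. All function names (`khatri_rao`, `unfold`, `nonneg_cp`, `aggregate_mode0`) and the aggregation matrix `P` are illustrative assumptions. The last lines show the linear-measurement setting: summing slices along one mode yields a tensor that follows the same CP model with an aggregated factor.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row j*K + k equals B[j, :] * C[k, :]."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    """Mode-n unfolding: the chosen mode indexes rows, the rest are flattened."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def reconstruct(A, B, C):
    """Rebuild a three-mode tensor from its CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def nonneg_cp(X, rank, iters=200, seed=0, eps=1e-12):
    """Nonnegative CP of a three-mode tensor via multiplicative updates
    (an illustrative constrained method, assumed here for simplicity)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) for n in X.shape]
    for _ in range(iters):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            # MTTKRP: the dominant computational kernel in CP algorithms.
            mttkrp = unfold(X, mode) @ khatri_rao(first, second)
            # Gram matrix of the Khatri-Rao product, computed cheaply.
            gram = (first.T @ first) * (second.T @ second)
            # The ratio of nonnegative terms keeps every entry nonnegative.
            factors[mode] = factors[mode] * mttkrp / (factors[mode] @ gram + eps)
    return factors

def relative_error(X, factors):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - reconstruct(*factors)) / np.linalg.norm(X)

def aggregate_mode0(X, P):
    """Linear measurements: each output slice is a weighted sum of mode-0 slices."""
    return np.einsum("gi,ijk->gjk", P, X)

# Synthetic nonnegative rank-3 tensor (made-up data for illustration only).
rng = np.random.default_rng(42)
A, B, C = (rng.random((n, 3)) for n in (8, 7, 6))
X = reconstruct(A, B, C)

rough = nonneg_cp(X, rank=3, iters=1)     # one sweep from the same start
fitted = nonneg_cp(X, rank=3, iters=200)  # many sweeps
print(relative_error(X, rough), relative_error(X, fitted))

# Partial sums: P adds up consecutive pairs of mode-0 slices (8 -> 4).
P = np.kron(np.eye(4), np.ones((1, 2)))
Y = aggregate_mode0(X, P)
# The aggregated tensor has CP factors (P @ A, B, C).
print(np.allclose(Y, reconstruct(P @ A, B, C)))
```

The sketch materializes the Khatri-Rao product and the unfolded tensor in full, which is only reasonable at toy sizes; avoiding that cost on very large sparse tensors is exactly the kind of kernel-level work the abstract describes.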

Project outcomes

Journal papers (34)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Multi-Set Low-Rank Factorizations With Shared and Unshared Components

DOI:
10.1109/tsp.2020.3020408
Published:
2020
Venue:
IEEE Transactions on Signal Processing
Impact factor:
5.4
Authors:
Sorensen, Mikael;Sidiropoulos, Nicholas D.
Corresponding author:
Sidiropoulos, Nicholas D.

Statistical Learning Using Hierarchical Modeling of Probability Tensors

DOI:
10.1109/dsw.2019.8755580
Published:
2019
Venue:
2019 IEEE Data Science Workshop
Impact factor:
0
Authors:
Amiridi, Magda;Kargas, Nikos;Sidiropoulos, Nicholas D.
Corresponding author:
Sidiropoulos, Nicholas D.

Hyperspectral Super-Resolution Via Coupled Tensor Factorization: Identifiability and Algorithms

DOI:
10.1109/icassp.2018.8462525
Published:
2018-04
Venue:
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
Charilaos I. Kanatsoulis;Xiao Fu;N. Sidiropoulos;Wing-Kin Ma
Corresponding author:
Charilaos I. Kanatsoulis;Xiao Fu;N. Sidiropoulos;Wing-Kin Ma

Prema: Principled Tensor Data Recovery From Multiple Aggregated Views
