CAREER: Autonomous Tensor Analysis: From Raw Multi-Aspect Data to Actionable Insights

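Basic Information

Award number:
2046086
Principal investigator:
Evangelos Papalexakis
Amount:
$600,000
Awardee institution:
University of California-Riverside
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2021
Funding country:
United States
Period:
2021-06-01 to 2026-05-31
Project status:
Not yet concluded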

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2046086&HistoricalAwards=false
Keywords:
CAREER Autonomous Tensor Analysis Raw

Project Abstract

Real-world entities are often described by multiple aspects in data. For instance, a news article online can be expressed by its textual content, the images it may contain, its author, its publication date, and the number of times it has been shared on social media. Integration of those aspects has been shown to be beneficial in a number of data science tasks, such as extracting the topic of an article or ascertaining its trustworthiness. Tensor decomposition, a class of multi-aspect data analytic methods, has been empirically shown to be particularly effective in such integration in a number of diverse applications, including chemometrics, signal processing, social network analysis, and brain data analysis. Despite their effectiveness, current tensor methods have a crucial limitation that hinders their broad applicability: conducting tensor analysis is largely a manual endeavor that entails laborious trial-and-error tuning and requires both high familiarity with tensor methods and domain expertise in the target application. As a result, the number of practitioners who can successfully apply tensor analysis is limited, and tensor analysis has not enjoyed broad adoption despite being such a powerful tool. The goal of this project is to democratize the entire, currently laborious and highly inaccessible, process of unsupervised exploratory tensor analysis, towards producing actionable insights from raw multi-aspect data. Towards broad convergence of research, practice, and education, the project will integrate the research outcomes into the investigator's educational and outreach activities. Those activities include the introduction of research outcomes into the development of undergraduate and graduate data science curricula, organization of workshops and tutorials at major research venues for disseminating the research outcomes to the scientific community, mentoring undergraduate students from underrepresented groups, and organizing summer workshops for teachers and capstone projects for students in collaboration with the local school district towards broadening the presence of data science in high school education.
The research activities are organized in two major tasks: (i) Algorithms for unsupervised autonomous tensor analysis and (ii) Real-world applications, in collaboration with domain experts. Towards democratizing tensor decomposition, in the first task, the project will be the first to formulate in a principled manner a number of very challenging problems which have traditionally been tackled manually and which are essential in unsupervised tensor analysis. The project will develop methods of self-supervised learning in order to learn tensor datasets with exploitable and meaningful structure from raw data, solving a fundamental problem in data preparation for tensor analysis. The project has parallels with the emerging fields of meta-learning and auto-ML, which have recently gained traction in the data science and machine learning communities. However, the focus of those fields has overwhelmingly been to automate the fine-tuning of supervised models. The lack of supervision in this endeavor poses major challenges but also presents a huge opportunity: success of the project has the potential to create a new field of unsupervised exploratory meta-learning where the goal is to extract actionable and interpretable insights from raw data in an unsupervised manner.
In order to maximize the potential benefit to the academic community, industry, and society at large, and to stress-test the proposed methods on a wide array of diverse data (e.g., text, images, graphs, time-series, and scientific data), the second task of the project will focus on two major applications: (i) Misinformation detection on the Web: the project will provide the public with tools to better judge the trustworthiness of news online, towards a more informed and active citizenry. (ii) Gravitational wave detection: the study of gravitational waves can help unravel current mysteries of the universe and analyze cosmic objects that do not emit light. The project aims at improving the detection and analysis of gravitational waves, with the potential to empower scientific discovery towards further understanding the universe.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (15)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Vec2Node: Self-Training with Tensor Augmentation for Text Classification with Few Labels

DOI:
10.1007/978-3-031-26390-3_33
Publication date:
2022
Journal:
Nucleic Acids Research
Impact factor:
14.9
Authors:
S. Abdali; Subhabrata Mukherjee; E. Papalexakis
Corresponding authors:
S. Abdali; Subhabrata Mukherjee; E. Papalexakis

SV-Learn: Learning Matrix Singular Values with Neural Networks

DOI:
10.1109/icdmw58026.2022.00039
Publication date:
2022
Journal:
2022 IEEE International Conference on Data Mining Workshops (ICDMW)
Impact factor:
0
Authors:
Xu, Derek; Shiao, William; Chen, Jia; Papalexakis, Evangelos E.
Corresponding author:
Papalexakis, Evangelos E.

NED: Niche Detection in User Content Consumption Data

DOI:
10.1145/3459637.3482455
Publication date:
2021-10
Journal:
Proceedings of the 30th ACM International Conference on Information & Knowledge Management
Impact factor:
0
Authors:
Ekta Gujral; Leonardo Neves; E. Papalexakis; Neil Shah
Corresponding authors:
Ekta Gujral; Leonardo Neves; E. Papalexakis; Neil Shah

Link Prediction with Non-Contrastive Learning

DOI:
10.48550/arxiv.2211.14394
Publication date:
2022-11
Journal:
ArXiv
Impact factor:
0
Authors:
William Shiao; Zhichun Guo; Tong Zhao; E. Papalexakis; Yozen Liu; Neil Shah
Corresponding authors:
William Shiao; Zhichun Guo; Tong Zhao; E. Papalexakis; Yozen Liu; Neil Shah

Identifying Witnesses to Noise Transients in Ground-based Gravitational-wave Observations using Auxiliary Channels with Matrix and Tensor Factorization Techniques

DOI:
Publication date:
2022
Journal:
NeurIPS 2022 AI for Science Workshop
Impact factor:
0
Authors:
Gurav, Rutuja; Papalexakis, E.E.; Barish, B.C.; Richardson, Jonathan; Vajente, Gabriele
Corresponding author:
Vajente, Gabriele