CAREER: Leveraging Randomization and Structure in Computational Linear Algebra for Data Science

Basic Information

Project Abstract

Data science plays a central role in addressing societal challenges, such as healthcare, climate change, and urban planning. At the core of nearly all developments in algorithms for data science is computational linear algebra, an area that concerns the study of algorithms for solving ubiquitous problems involving matrices and other linear-algebraic objects that are used to represent data. With ever-increasing data sizes, randomization has become a key technique for developing efficient algorithms in computational linear algebra. Yet, there is a significant gap between the theory and practice of these algorithms, which has slowed their practical adoption in data science applications. This project identifies key challenges and puts forward new directions towards providing the algorithmic foundations necessary to ensure that a broad scope of randomized linear algebra algorithms are successfully deployed across computational data science over the next decade. This project leverages fundamental interdisciplinary ideas at the intersection of theoretical computer science, machine learning, statistics, and nonlinear optimization. In addition to developing the theoretical foundations, one of the key aims driving the project is to facilitate ongoing implementation efforts aimed at incorporating randomization into LAPACK, the default computational linear algebra software package in machine learning, engineering, statistics, and scientific computing for the past thirty years. At the core of the project is an integrated education plan focused on helping students to gain an interdisciplinary skillset at the intersection of algorithmic foundations and data science. The project also involves outreach to students from three under-resourced high schools in Michigan through a collaboration with the university's Engineering Pathways program.

The project's objectives are to close the theory-practice gap in using randomization to design improved algorithms for ubiquitous matrix problems such as matrix multiplication, solving linear systems, and low-rank approximation. The project identifies three major thrusts, namely (1) reformulating optimal matrix sketching via black-box sampling methods; (2) randomized iterative refinement algorithms via stochastic optimization; (3) a study of robustness of randomized numerical linear algebra algorithms to preserve certain structural elements of data. The matrix sketch, i.e., a small randomized approximation of the input data, is a key foundational component of these algorithms. The project aims to develop new algorithmic and theoretical approaches towards ensuring the control and reliability of the output produced by matrix sketching and sub-sampling, which is especially challenging when dealing with randomization and will be critical for successful software integration. Building on these tools, the project pursues new approaches for designing high-precision algorithms for solving linear systems and quadratic problems, by exploring techniques that lie in the unexplored regime between deterministic iterative solvers and stochastic optimization. Finally, the project aims to contribute to a unified understanding of randomized matrix approximation algorithms that preserve the structure of the data, which is essential for feature selection, experimental design, interpretability, and more.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
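
For readers unfamiliar with the term, the idea of a matrix sketch mentioned in the abstract can be illustrated with a textbook technique: Gaussian sketch-and-solve for least squares. This is only a minimal sketch under stated assumptions, not the project's own method. The function name `sketch_and_solve`, the problem sizes, and the choice of a dense Gaussian sketching matrix are all hypothetical and chosen for illustration.

```python
import numpy as np

def sketch_and_solve(A, b, sketch_rows, seed=0):
    """Approximate argmin_x ||A x - b||_2 by solving a smaller sketched problem.

    Illustrative only: a dense Gaussian sketch is assumed here for simplicity.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Gaussian sketching matrix S (sketch_rows x n), scaled so E[S^T S] = I.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    # Solve the much smaller problem min_x ||S A x - S b||_2.
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_sketch

# Synthetic tall regression problem (sizes are arbitrary assumptions).
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 10))
x_true = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(2000)

# Exact least-squares solution for comparison.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
# Sketched solution using 200 rows instead of 2000.
x_approx = sketch_and_solve(A, b, sketch_rows=200)

# Relative increase in residual norm caused by sketching (never negative,
# because x_exact minimizes the residual).
res_exact = np.linalg.norm(A @ x_exact - b)
res_approx = np.linalg.norm(A @ x_approx - b)
rel_residual_gap = res_approx / res_exact - 1.0
print(f"relative residual gap: {rel_residual_gap:.4f}")
```

The sketched problem has 200 rows rather than 2000, and the printed gap shows how much accuracy was traded for that reduction. Controlling this trade-off reliably is the kind of question the abstract refers to when it discusses the reliability of sketching output.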

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Michal Derezinski

Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems
  • DOI:
    10.48550/arxiv.2308.15720
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Younghyun Cho;J. Demmel;Michal Derezinski;Haoyun Li;Hengrui Luo;Michael W. Mahoney;Riley Murray
  • Corresponding author:
    Riley Murray
Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs
Fast determinantal point processes via distortion-free intermediate sampling
  • DOI:
  • Published:
    2018-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    Michal Derezinski
  • Corresponding author:
    Michal Derezinski
Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches
  • DOI:
    10.48550/arxiv.2206.02702
  • Published:
    2022-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Michal Derezinski
  • Corresponding author:
    Michal Derezinski
Determinantal Point Processes in Randomized Numerical Linear Algebra
  • DOI:
    10.1090/noti2202
  • Published:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Michal Derezinski;Michael W. Mahoney
  • Corresponding author:
    Michael W. Mahoney

Similar overseas grants

CAREER: Leveraging Plastic Deformation Mechanisms Interactions in Metallic Materials to Access Extraordinary Fatigue Strength.
  • Grant number:
    2338346
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Continuing Grant
CSR: Small: Leveraging Physical Side-Channels for Good
  • Grant number:
    2312089
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Standard Grant
REU Site: CyberAI: Cybersecurity Solutions Leveraging Artificial Intelligence for Smart Systems
  • Grant number:
    2349104
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Standard Grant
HSI Implementation and Evaluation Project: Leveraging Social Psychology Interventions to Promote First Year STEM Persistence
  • Grant number:
    2345273
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Standard Grant
Nonlocal Elastic Metamaterials: Leveraging Intentional Nonlocality to Design Programmable Structures
  • Grant number:
    2330957
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Standard Grant
Postdoctoral Fellowship: OPP-PRF: Leveraging Community Structure Data and Machine Learning Techniques to Improve Microbial Functional Diversity in an Arctic Ocean Ecosystem Model
  • Grant number:
    2317681
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Standard Grant
Leveraging the synergy between experiment and computation to understand the origins of chalcogen bonding
  • Grant number:
    EP/Y00244X/1
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Research Grant
Building recovery and resilience in severe mental illness: Leveraging the role of social determinants in illness trajectories and interventions
  • Grant number:
    MR/Z503514/1
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Research Grant
CAREER: Leveraging Data Science & Policy to Promote Sustainable Development Via Resource Recovery
  • Grant number:
    2339025
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Continuing Grant
CAREER: Constraining the high-latitude ocean carbon cycle: Leveraging the Ocean Observatories Initiative (OOI) Global Arrays as marine biogeochemical time series
  • Grant number:
    2338450
  • Fiscal year:
    2024
  • Funding amount:
    $649,400
  • Project type:
    Continuing Grant