CAREER: Statistical Learning, Inference and Approximation with Reproducing Kernels
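Basic Information
- Award No.: 1945396
- Principal Investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-06-01 to 2025-05-31
- Status: Not yet concluded
- Source:
- Keywords:
Project Abstract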
Many modern scientific fields, such as astrophysics, bioinformatics, finance, forensics, social science, and others, generate massive amounts of data that are both high-dimensional and non-standard. For example, the data may be structured as graphs, functions, strings, or sets, and so do not lie in a Euclidean space. Analyzing such data sets and addressing the statistical applications that arise in these fields requires efficient learning and inference procedures that can handle high-dimensional, non-standard data. The functional-analytic paradigm built on reproducing kernels, also known as the kernel method, provides a unified framework for handling such data, and the machine learning community has applied it to a variety of non-parametric statistical problems with great empirical success. However, its theoretical understanding in terms of statistical optimality has been limited, and computationally it scales poorly to large data sets. The key focus of this project is to explore foundational research questions about the kernel method in order to arrive at a statistically optimal and computationally efficient paradigm for high-dimensional, non-standard data. This research will significantly impact scientific development in all areas of science and engineering that intersect with statistics, and it will be integrated with the PI's educational activities of mentoring students, developing new courses, and forging new collaborations. Methods and code developed under this project will be made publicly available for ready use. The core idea behind the kernel method is to map the observed data (which may be high-dimensional and non-standard) into an exotic function space, called the reproducing kernel Hilbert space (RKHS), and to apply the standard methods developed for Euclidean data to the mapped data.
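A small illustration of this idea, not taken from the project, follows. For the degree-2 polynomial kernel on R², the feature map φ(x) = (x₁², √2·x₁x₂, x₂²) is short enough to write out explicitly. Evaluating the kernel therefore gives the same number as mapping both points and taking an ordinary inner product. The Gaussian kernel has an infinite-dimensional RKHS, so for it kernel methods are written purely in terms of the Gram matrix of pairwise kernel values. The kernel choices and all function names below are hypothetical.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = <x, y>^2 on R^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def poly2_features(x):
    """Explicit feature map phi: R^2 -> R^3 satisfying <phi(x), phi(y)> = k(x, y)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def inner(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); its RKHS is infinite-dimensional."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def gram_matrix(points, kernel):
    """n x n matrix of pairwise kernel values, the only view of the data a kernel method needs."""
    return [[kernel(p, q) for q in points] for p in points]

x, y = (1.0, 2.0), (3.0, -1.0)
# Kernel evaluation and the explicit feature-space inner product agree up to rounding.
print(poly2_kernel(x, y), inner(poly2_features(x), poly2_features(y)))

# No finite feature map exists for the Gaussian kernel, but its Gram matrix is cheap to form.
G = gram_matrix([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)], gaussian_kernel)
```

A kernel method such as kernel ridge regression or kernel PCA then works with G alone. It never touches the mapped points themselves.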
Ironically, the RKHS usually has a higher dimension (often an infinite one) than the observed data, and it is characterized by a kernel function called the reproducing kernel. The main advantage of the kernel method is that it captures nonlinear relationships in the data simply by exploring linear relationships between the mapped elements in the RKHS, which are computed through the kernel function. Despite its superior empirical performance, the statistical theory of learning algorithms based on the kernel method is not well understood except in a few cases, such as classification, non-parametric least-squares regression, principal component analysis, and goodness-of-fit testing. In this project, the PI will explore various foundational research questions about the kernel method and its associated learning algorithms to address this gap. The project consists of four related research themes that together seek to deepen the mathematical understanding of the kernel method, so as to exploit its full power in constructing inference procedures that can efficiently handle non-standard data. The project will also shed light on the advantages and limitations of the kernel method relative to other non-parametric methods in the literature. The aims are to (i) develop statistical optimality results for kernel-based hypothesis tests and non-linear canonical correlation analysis; (ii) develop computational versus statistical trade-off analyses for various kernel learning procedures that use approximation schemes, such as the Nyström method, random features, and their variations, to speed up these procedures; (iii) develop new methodologies with concrete mathematical guarantees that use the kernel method for learning and inference on functions and probability distributions, with applications in functional data analysis; and (iv) generalize the kernel method using multi-scale kernels to obtain wavelet-like representations, and investigate its statistical and computational behavior in various learning procedures. Overall, the project will develop a comprehensive mathematical theory for computationally efficient kernel-based learning algorithms with applications in statistical learning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
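Aim (ii) of the abstract mentions random features as a way to speed up kernel procedures. A well-known instance is the random Fourier features construction of Rahimi and Recht for the Gaussian kernel, sketched below. It is not the project's own method, and the dimension, bandwidth `sigma`, number of features, and seed are hypothetical choices. Each point is mapped to a finite vector z(x) whose inner products approximate the exact kernel, with Monte Carlo error shrinking roughly like 1/√D in the number of features D.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def sample_rff(dim, num_features, sigma=1.0, seed=0):
    """Draw frequencies w ~ N(0, sigma^-2 I) and phases b ~ Uniform[0, 2*pi]."""
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2 * math.pi) for _ in range(num_features)]
    return ws, bs

def rff_map(x, ws, bs):
    """Random feature map z(x) = sqrt(2/D) cos(w.x + b); E[<z(x), z(y)>] = k(x, y)."""
    scale = math.sqrt(2.0 / len(ws))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(ws, bs)]

x, y = [0.3, -1.2, 0.5], [1.0, 0.4, -0.2]
ws, bs = sample_rff(dim=3, num_features=5000, sigma=1.0)

# Inner product of the D-dimensional random features versus the exact kernel value.
approx = sum(a * b for a, b in zip(rff_map(x, ws, bs), rff_map(y, ws, bs)))
exact = gaussian_kernel(x, y)
print(f"exact={exact:.4f}  approx={approx:.4f}")
```

A linear method trained on the vectors z(x) scales linearly in the number of samples for a fixed D, instead of working with an n × n Gram matrix. How small D can be made without sacrificing statistical accuracy is the kind of computational versus statistical trade-off that aim (ii) describes.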
Project Outputs
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Convergence analysis of kernel conjugate gradient for functional linear regression
- DOI:10.30970/ana.2023.1.33
- Published: 2023-10
- Journal:
- Impact factor: 0
- Authors: Naveen Gupta; S. Sivananthan; Bharath K. Sriperumbudur
- Corresponding authors: Naveen Gupta; S. Sivananthan; Bharath K. Sriperumbudur
On Distance and Kernel Measures of Conditional Dependence
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: T. Sheng; Bharath K. Sriperumbudur
- Corresponding authors: T. Sheng; Bharath K. Sriperumbudur
Robust Persistence Diagrams using Reproducing Kernels
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Vishwanath, Siddharth; Fukumizu, Kenji; Kuriki, Satoshi and
- Corresponding author: Kuriki, Satoshi and
Approximate kernel PCA: Computational versus statistical trade-off
- DOI:10.1214/22-aos2204
- Published: 2022
- Journal: The Annals of Statistics
- Impact factor: 0
- Authors: Sriperumbudur, Bharath K.; Sterge, Nicholas
- Corresponding author: Sterge, Nicholas
Statistical Optimality and Computational Efficiency of Nyström Kernel PCA
- DOI:
- Published: 2021-05
- Journal:
- Impact factor: 0
- Authors: Nicholas Sterge; Bharath K. Sriperumbudur
- Corresponding authors: Nicholas Sterge; Bharath K. Sriperumbudur
Other publications by Bharath Sriperumbudur
DC programming in discrete convex analysis
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Bharath Sriperumbudur; Kenji Fukumizu; Arthur Gretton; Aapo Hyvarinen; Revant Kumar; K. Murota
- Corresponding author: K. Murota
Other grants held by Bharath Sriperumbudur
Reproducing Kernel Hilbert Space Embedding of Measures: Theory and Applications to Statistical Learning
- Award No.: 1713011
- Fiscal year: 2017
- Amount: $400,000
- Project type: Continuing Grant
Similar Overseas Grants
CAREER: New Frameworks for Ethical Statistical Learning: Algorithmic Fairness and Privacy
- Award No.: 2340241
- Fiscal year: 2024
- Amount: $400,000
- Project type: Continuing Grant
CAREER: Statistical Learning with Recursive Partitioning: Algorithms, Accuracy, and Applications
- Award No.: 2239448
- Fiscal year: 2023
- Amount: $400,000
- Project type: Continuing Grant
CAREER: Domain-aware Statistical Learning
- Award No.: 2143695
- Fiscal year: 2022
- Amount: $400,000
- Project type: Standard Grant
CAREER: Statistical Learning from a Modern Perspective: Over-parameterization, Regularization, and Generalization
- Award No.: 2143215
- Fiscal year: 2022
- Amount: $400,000
- Project type: Continuing Grant
CAREER: Understanding metal/support interactions in catalysis with statistical learning
- Award No.: 2143941
- Fiscal year: 2022
- Amount: $400,000
- Project type: Continuing Grant
CAREER: Federated Learning: Statistical Optimality and Provable Security
- Award No.: 2144593
- Fiscal year: 2022
- Amount: $400,000
- Project type: Continuing Grant
CAREER: Designing Meaningful Learning Experiences for Statistical Literacy in Secondary Mathematics
- Award No.: 2143816
- Fiscal year: 2022
- Amount: $400,000
- Project type: Continuing Grant
CAREER: Fast and Accurate Statistical Learning and Inference from Large-Scale Data: Theory, Methods, and Algorithms
- Award No.: 2046874
- Fiscal year: 2021
- Amount: $400,000
- Project type: Continuing Grant
CAREER: New Statistical Paradigms Reconciling Empirical Surprises in Modern Machine Learning
- Award No.: 2042473
- Fiscal year: 2021
- Amount: $400,000
- Project type: Continuing Grant
CAREER: Nonconvex Optimization for Statistical Estimation and Learning: Conditioning, Dynamics, and Nonsmoothness
- Award No.: 2047637
- Fiscal year: 2021
- Amount: $400,000
- Project type: Continuing Grant