CAREER: Statistical Learning, Inference and Approximation with Reproducing Kernels

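Basic Information

Award number: 1945396
Principal investigator: Bharath Sriperumbudur
Amount: $400,000
Host institution: Pennsylvania State Univ University Park
Host institution country: United States
Award type: Continuing Grant
Fiscal year: 2020
Funding country: United States
Project period: 2020-06-01 to 2025-05-31
Project status: Ongoing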

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1945396&HistoricalAwards=false
Keywords: CAREER Statistical Learning Inference Approximation

Project Abstract

Many modern scientific fields, such as astrophysics, bio-informatics, finance, forensics, and social science, generate massive amounts of data that are both high-dimensional and non-standard. For example, the data may have structure such as graphs, functions, strings, and sets, but is not Euclidean. To analyze these data sets and address the statistical applications arising in these fields, efficient learning and inference procedures that can handle high-dimensional and non-standard data are needed. The functional analytic paradigm involving reproducing kernels, also known as the kernel method, provides a unified framework to handle such data and has been applied by the machine learning community to a variety of non-parametric statistical problems with great empirical success. However, its theoretical understanding in terms of statistical optimality has been limited, and computationally it scales poorly to large data. The key focus of this project is to explore foundational research questions associated with the kernel method in order to achieve a statistically optimal and computationally efficient paradigm that can handle high-dimensional non-standard data. This research will significantly impact scientific development in all areas of science and engineering that intersect with statistics, and will be integrated with the PI's educational activities of mentoring students, developing new courses and forging new collaborations. Methods and code developed under this project will be made publicly available for ready use.

The core idea behind the kernel method is to map the observed data (which could be high-dimensional and non-standard) to an exotic function space, called the reproducing kernel Hilbert space (RKHS), and to apply the standard methods developed for Euclidean data to the mapped data. Ironically, the RKHS is usually higher dimensional (even infinite-dimensional) than the observed data, and is characterized by a kernel function called the reproducing kernel. The main advantage of the kernel method is its ability to explore nonlinear relationships in data by simply exploring linear relationships between the mapped elements in the RKHS through the kernel function. Despite its superior empirical performance, the statistical theory of learning algorithms based on the kernel method is not well understood except in a few cases such as classification, non-parametric least squares regression, principal component analysis and goodness-of-fit testing. In this project, the PI will explore foundational research questions associated with the kernel method and associated learning algorithms to address this gap. The project consists of four related research themes that overall seek to deepen the mathematical understanding of the kernel method so as to exploit its full power in constructing inference procedures that can efficiently handle non-standard data. The project will also shed light on the advantages and limitations of the kernel method relative to other non-parametric methods in the literature. The aims are to (i) develop statistical optimality results for kernel-based hypothesis tests and non-linear canonical correlation analysis, (ii) develop a computational vs. statistical trade-off analysis for various kernel learning procedures that are sped up by approximation schemes such as the Nyström method, random features and their variations, (iii) develop new methodologies with concrete mathematical guarantees using the kernel method for learning and inference on functions and probability distributions, with applications in functional data analysis, and (iv) generalize the kernel method using multi-scale kernels to obtain wavelet-like representations and investigate its statistical and computational behavior in various learning procedures. Overall, the project will develop a comprehensive mathematical theory for computationally efficient kernel-based learning algorithms with applications in statistical learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
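
As a rough illustration of the random-features idea named in aim (ii), the sketch below approximates a Gaussian kernel by an inner product of finite-dimensional random cosine features. It is not code from the project. The choice of the Gaussian kernel, the bandwidth `sigma`, the number of features, and all function names are assumptions made only for this example.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_random_features(dim, num_features, sigma=1.0, seed=0):
    """Draw random frequencies w ~ N(0, I / sigma^2) and phases b ~ U[0, 2*pi]."""
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def feature_map(x, freqs, phases):
    """Map x to z(x) with entries sqrt(2 / D) * cos(w . x + b)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


def approx_kernel(x, y, freqs, phases):
    """Approximate k(x, y) by the ordinary dot product z(x) . z(y)."""
    zx = feature_map(x, freqs, phases)
    zy = feature_map(y, freqs, phases)
    return sum(a * b for a, b in zip(zx, zy))


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration only.
    x, y = [0.5, -1.0], [1.0, 0.0]
    freqs, phases = sample_random_features(dim=2, num_features=5000)
    print("exact kernel value:   ", gaussian_kernel(x, y))
    print("random-feature value: ", approx_kernel(x, y, freqs, phases))
```

The feature map turns a nonlinear kernel evaluation into a linear inner product in a finite-dimensional space. Using more features generally reduces the approximation error at a higher computational cost, which is the kind of computational vs. statistical trade-off the abstract refers to.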
