CAREER: Statistical Methods for Dimensionality Reduction in Machine Learning

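Basic information

  • Award number:
    0238323
  • Principal investigator:
  • Amount:
    $400,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing grant
  • Fiscal year:
    2003
  • Funding country:
    United States
  • Period:
    2003-07-01 to 2006-11-30
  • Project status:
    Completed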

Project abstract

This research addresses the problem of dimensionality reduction, discovering low dimensional structure hidden in high dimensional data. It arises in many fields of information processing, and poses a particular challenge to researchers attempting to build machines that emulate feats of human perception, such as recognizing faces and understanding speech. It also plays an increasingly prominent role in many applications of statistical and scientific computing. With the advent of widespread information technologies, it has become possible to collect and manipulate ever-increasing amounts of experimental data. Thus, scientists interested in the exploratory analysis and visualization of large multivariate data sets face similar challenges in information processing as our perceptual systems.

This research focuses on two recently proposed algorithms for dimensionality reduction. The two algorithms address the "curse of dimensionality" as it arises in two different settings of machine learning: (1) unsupervised learning, where the dimensionality reduction is performed without any feedback from the learning environment, and (2) supervised learning, where the dimensionality reduction is performed with the benefit of labeled examples.

The first algorithm to be studied is Locally Linear Embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to lie on a nonlinear manifold, is mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE (though capable of generating highly nonlinear embeddings) are simple to implement, and they do not involve local minima. LLE has applications to exploratory data analysis, scientific visualization, and computer vision.

The second algorithm is Multiplicative Margin Maximization (M3), a supervised learning algorithm for nonnegative quadratic programming in support vector machines (SVMs). Support vector machines currently provide state-of-the-art solutions to many problems in machine learning, particularly those involving data sets of high dimensionality. Solving the quadratic programming problem in SVMs, however, remains a significant bottleneck in their implementation. The M3 algorithm is designed to alleviate this bottleneck. Its update rules have a simple closed form, and they converge monotonically to the solution of the maximum margin hyperplane. Moreover, they do not involve any heuristics such as choosing a learning rate or deciding which variables to update at each iteration. They optimize the traditionally proposed objective function for SVMs and can be applied to problems in classification, regression, and novelty detection.

The algorithms to be studied in this research are easy to implement, but the problems they solve are quite complex. Compared to previous approaches, they are distinguished not only by their novel simplicity and well-behaved optimizations, but also by the unexpected connections they make to other areas in mathematics, computer science, and statistics. The work will not only develop the theoretical foundations of these algorithms, but also attempt to scale them up to increasingly large problems in machine learning.

This CAREER award recognizes and supports the early career-development activities of a teacher-scholar who is likely to become an academic leader of the twenty-first century. The research is expected to have a broad impact across many areas of science and engineering, by overcoming the challenges posed by data sets of extremely high dimensionality. Software toolkits will be published, so that researchers everywhere will have access to state-of-the-art methods for dimensionality reduction. The educational innovations will include new undergraduate and graduate courses in artificial intelligence, machine learning, statistical computing, and sensory processing.
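
As an illustration of the LLE steps the abstract describes (neighborhood search, locally linear reconstruction weights, and an eigenvalue problem), the following is a minimal NumPy sketch. It is not the project's software: the function name `lle`, its parameters, and the regularization constant are assumptions made for this example, and it uses a dense eigensolver for brevity where the abstract refers to a sparse eigenvalue problem.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Illustrative LLE sketch. X has shape (n_samples, n_features)."""
    n = X.shape[0]

    # Step 1: brute-force k nearest neighbors by squared Euclidean distance.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 2: weights that best reconstruct each point from its neighbors,
    # constrained to sum to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]   # neighbors centered on point i
        C = Z @ Z.T                  # local Gram matrix, shape (k, k)
        # Regularize, since C is singular when k exceeds the input dimension
        # (the constant `reg` is an assumption of this sketch).
        C += reg * np.trace(C) * np.eye(n_neighbors)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()

    # Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W).
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order

    # Skip the first eigenvector (eigenvalue near zero, constant when the
    # neighborhood graph is connected) and keep the next n_components.
    return eigvecs[:, 1:n_components + 1]
```

Because each step is a linear solve or an eigendecomposition, the sketch involves no iterative descent, which is consistent with the abstract's remark that the optimizations in LLE do not involve local minima.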
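
The abstract does not state the M3 update rules. As a hedged illustration of what a closed-form multiplicative update for nonnegative quadratic programming can look like, the sketch below minimizes 0.5 * v^T A v + b^T v subject to v >= 0 by splitting A into its positive and negative parts. The update formula follows one form commonly given in the literature on multiplicative updates; whether it matches the project's exact rules is an assumption, and the names `nqp_multiplicative`, `n_iter`, and `eps` are hypothetical.

```python
import numpy as np

def nqp_multiplicative(A, b, n_iter=500, eps=1e-12):
    """Sketch: minimize 0.5 * v^T A v + b^T v subject to v >= 0.

    Assumes A is symmetric with a positive diagonal. There is no learning
    rate and no choice of which variables to update: every coordinate is
    rescaled at every iteration.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    A_pos = np.maximum(A, 0.0)    # positive part of A
    A_neg = np.maximum(-A, 0.0)   # magnitude of the negative part of A
    v = np.ones(len(b))           # any strictly positive start works here
    for _ in range(n_iter):
        a = A_pos @ v
        c = A_neg @ v
        # Closed-form multiplicative factor; eps only guards against 0/0.
        v = v * (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a + eps)
    return v
```

For a hard-margin SVM without a bias term, the dual problem has this form with A[i, j] = y[i] * y[j] * K(x[i], x[j]) and b = -1 for every coordinate, so the same routine could be applied to it under those assumptions.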

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Lawrence Saul

Collaborative Research: EAGER: Deep Architectures for Speech and Audio Processing
  • Award number:
    0957560
  • Fiscal year:
    2010
  • Award amount:
    $400,000
  • Award type:
    Standard Grant

HCC-Small: Assistive Listening Devices and Voice Processing Platforms for the Deaf and Hard of Hearing
  • Award number:
    0812576
  • Fiscal year:
    2008
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: Statistical Methods for Dimensionality Reduction in Machine Learning
  • Award number:
    0650074
  • Fiscal year:
    2006
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

Similar overseas grants

CAREER: Next-Generation Methods for Statistical Integration of High-Dimensional Disparate Data Sources
  • Award number:
    2422478
  • Fiscal year:
    2024
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: Statistical Inference in Observational Studies -- Theory, Methods, and Beyond
  • Award number:
    2338760
  • Fiscal year:
    2024
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: Practical algorithms and high dimensional statistical methods for multimodal haplotype modelling
  • Award number:
    2239870
  • Fiscal year:
    2023
  • Award amount:
    $400,000
  • Award type:
    Standard Grant

CAREER: Statistical Models and Parallel-computing Methods for Analyzing Sparse and Large Single-cell Chromatin Interaction Datasets
  • Award number:
    2239350
  • Fiscal year:
    2023
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: Fast and Accurate Statistical Learning and Inference from Large-Scale Data: Theory, Methods, and Algorithms
  • Award number:
    2046874
  • Fiscal year:
    2021
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: Next-Generation Methods for Statistical Integration of High-Dimensional Disparate Data Sources
  • Award number:
    2044823
  • Fiscal year:
    2021
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: Foundational statistical theory and methods for analyzing populations of attributed connectomes
  • Award number:
    1942963
  • Fiscal year:
    2020
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: Statistical methods and algorithms for the analysis of combinatorial mass spectrometry data
  • Award number:
    1845465
  • Fiscal year:
    2019
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: Computational and statistical methods for allele-specific chromatin structure analysis
  • Award number:
    1751317
  • Fiscal year:
    2018
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant

CAREER: New Statistical Methods for Classification and Analysis of High Dimensional and Functional Data
  • Award number:
    1812354
  • Fiscal year:
    2017
  • Award amount:
    $400,000
  • Award type:
    Continuing Grant