High-Dimensional Challenges in Statistical Machine Learning: Theory, Models and Algorithms

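Basic information

  • Award number:
    0605165
  • Principal investigator:
  • Amount:
    $450,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2006
  • Funding country:
    United States
  • Project period:
    2006-08-15 to 2010-07-31
  • Project status:
    Completed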

Project abstract

TECHNICAL SUMMARY: This research proposal consists of four closely related research thrusts, all centered on the common goal of an integrated treatment of statistical and computational issues in dealing with high-dimensional data sets arising in information technology (IT). The first two research thrusts focus on fundamental issues that arise in the design of penalty-based and other algorithmic methods for regularization. Key open problems to be addressed include the link between regularization methods and sparsity, consistency and other theoretical issues, as well as structured regularization methods for model selection. Sparse models are desirable both for scientific reasons, including interpretability, and for computational reasons, such as the efficiency of performing classification or regression. The third research thrust focuses on problems of statistical inference in decentralized settings, which are of increasing importance for a broad variety of IT applications such as wireless sensor networks, computer server "farms", and traffic monitoring systems. Designing suitable data compression schemes is the key challenge. On one hand, these schemes should respect the decentralization requirements imposed by the system (e.g., due to limited power or bandwidth for communicating data); on the other hand, they should also be (near-)optimal with respect to a statistical criterion of merit (e.g., Bayes error for a classification task; MSE for a regression or smoothing problem). The fourth project addresses statistical issues centered on the use of Markov random fields, which are widely used for modeling large collections of interacting random variables, and associated variational methods for approximating moments and likelihoods in such models.

BROAD SUMMARY: The field of statistical machine learning is motivated by a broad range of problems in the information sciences, among them remote sensing, data mining and compression, and statistical signal processing. Its applications range from homeland security (e.g., detecting anomalous patterns in large data sets) to environmental monitoring and assessment (e.g., estimating changes in Arctic ice). A challenging aspect of such applications is that data sets tend to be complex, massive (frequently measured in terabytes), and rich in terms of possible features (hundreds of thousands to millions). These characteristics present fundamental challenges in the design and application of statistical models and algorithms for testing hypotheses and performing estimation. Whereas classical statistical methods are designed separately from computational considerations, dealing effectively with extremely high-dimensional data sets requires that computational issues be addressed in a more integrated manner during the design and testing of statistical models. Issues of over-fitting and regularization, while always statistically relevant, become of paramount importance.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Bin Yu

Images of China: An Empirical Study of Western Tourist Material
  • DOI:
  • Year published:
    2012
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ying Sun;Bin Yu
  • Corresponding author:
    Bin Yu
Machine perfusion combined with antibiotics prevents donor‐derived infections caused by multidrug‐resistant bacteria
  • DOI:
    10.1111/ajt.17032
  • Year published:
    2022
  • Journal:
  • Impact factor:
    8.8
  • Authors:
    Han Liang;Peng Zhang;Bin Yu;Zhongzhong Liu;Li Pan;Xueyu He;Xiaoli Fan;Yanfeng Wang
  • Corresponding author:
    Yanfeng Wang
Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images
  • DOI:
  • Year published:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
    Pradeep Ravikumar;Vincent Q. Vu;Bin Yu;Thomas Naselaris;Kendrick Norris Kay;J. Gallant
  • Corresponding author:
    J. Gallant
Scaling vortex breakdown mechanism based on viscous effect in shock cylindrical bubble interaction
  • DOI:
    10.1063/1.5051463
  • Year published:
    2018-12
  • Journal:
  • Impact factor:
    4.6
  • Authors:
    Zi'ang Wang;Bin Yu;Hao Chen;Bin Zhang;Hong Liu
  • Corresponding author:
    Hong Liu
Readiness of as-built horizontal curved roads for LiDAR-based automated vehicles: A virtual simulation analysis
  • DOI:
    10.1016/j.aap.2022.106762
  • Year published:
    2022
  • Journal:
  • Impact factor:
    5.9
  • Authors:
    Shuyi Wang;Yang Ma;Jinzhou Liu;Bin Yu;Feng Zhu
  • Corresponding author:
    Feng Zhu

Other grants by Bin Yu

Advancing Theory and Methodology for Tree-Based Algorithms in High Dimensions
  • Award number:
    2209975
  • Fiscal year:
    2022
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Understanding Complexity and the Bias-Variance Tradeoff in High Dimensions: Theory and Data Evidence
  • Award number:
    2015341
  • Fiscal year:
    2020
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Parallel Ensemble Learning and Feature Interaction Discovery: High Volume Dynamic Data
  • Award number:
    1953191
  • Fiscal year:
    2020
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Understand the functional mechanism of the DSP1 complex in the 3' end maturation of plant small nuclear RNAs
  • Award number:
    1818082
  • Fiscal year:
    2018
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
BIGDATA: F: Scalable and Interpretable Machine Learning: Bridging Mechanistic and Data-Driven Modeling in the Biological Sciences
  • Award number:
    1741340
  • Fiscal year:
    2017
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Canonical Linear Methods and Hierarchical Non-Linear Methods in High-Dimensional Statistics
  • Award number:
    1613002
  • Fiscal year:
    2016
  • Funding amount:
    $450,000
  • Award type:
    Continuing Grant
Smart Nanofabrication via Rational Assembly of Two-Dimensional Heterosystems
  • Award number:
    1434689
  • Fiscal year:
    2014
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Collaborative Research: Leverage Subsampling for Regression and Dimension Reduction
  • Award number:
    1228246
  • Fiscal year:
    2012
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Direct Self-Assembly of Large Area, High Crystallinity 2D Graphene on Insulator: An Integratable Carbon Platform
  • Award number:
    1162312
  • Fiscal year:
    2012
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Understanding DAWDLE Function in miRNA and siRNA Biogenesis
  • Award number:
    1121193
  • Fiscal year:
    2011
  • Funding amount:
    $450,000
  • Award type:
    Continuing Grant

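Similar NSFC (National Natural Science Foundation of China) grants

Supply Chain Collaboration in addressing Grand Challenges: Socio-Technical Perspective
  • Award number:
  • Year approved:
    2024
  • Funding amount (10,000 CNY):
  • Award type:
    Research Fund for International Young Scientists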

Similar overseas grants

CAREER: New Challenges in Statistical Genetics: Mendelian Randomization, Integrated Omics and General Methodology
  • Award number:
    2238656
  • Fiscal year:
    2023
  • Funding amount:
    $450,000
  • Award type:
    Continuing Grant
Resolving single-cell analysis challenges via data-driven decision frameworks and novel statistical methods
  • Award number:
    10707308
  • Fiscal year:
    2022
  • Funding amount:
    $450,000
  • Award type:
DMS-EPSRC Collaborative Research: Advancing Statistical Foundations and Frontiers from and for Emerging Astronomical Data Challenges
  • Award number:
    EP/W015080/1
  • Fiscal year:
    2022
  • Funding amount:
    $450,000
  • Award type:
    Research Grant
Statistical Challenges and Methods in the Analysis of High Dimensional and Complex Structured Data
  • Award number:
    RGPIN-2018-05475
  • Fiscal year:
    2022
  • Funding amount:
    $450,000
  • Award type:
    Discovery Grants Program - Individual
Statistical solutions for the open challenges of integrated population models.
  • Award number:
    2753510
  • Fiscal year:
    2022
  • Funding amount:
    $450,000
  • Award type:
    Studentship
Development of Innovative Statistical Tools to Address Data-Analytic Challenges in Physics and Astronomy
  • Award number:
    RGPIN-2021-03985
  • Fiscal year:
    2022
  • Funding amount:
    $450,000
  • Award type:
    Discovery Grants Program - Individual
New Challenges in Statistical Inference with Regularized Optimal Transport
  • Award number:
    2210368
  • Fiscal year:
    2022
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Development of Innovative Statistical Tools to Address Data-Analytic Challenges in Physics and Astronomy
  • Award number:
    RGPIN-2021-03985
  • Fiscal year:
    2021
  • Funding amount:
    $450,000
  • Award type:
    Discovery Grants Program - Individual
DMS-EPSRC Collaborative Research: Advancing Statistical Foundations and Frontiers for and from Emerging Astronomical Data Challenges
  • Award number:
    2113605
  • Fiscal year:
    2021
  • Funding amount:
    $450,000
  • Award type:
    Standard Grant
Statistical Challenges and Methods in the Analysis of High Dimensional and Complex Structured Data
  • Award number:
    RGPIN-2018-05475
  • Fiscal year:
    2021
  • Funding amount:
    $450,000
  • Award type:
    Discovery Grants Program - Individual