Sparsity, thresholding and regularization in data science

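Basic information

  • Grant number:
    RGPIN-2022-04531
  • Principal investigator:
  • Amount:
    $13,800
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2022
  • Funding country:
    Canada
  • Project period:
    2022-01-01 to 2023-12-31
  • Project status:
    Completed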

Project abstract

Data science has brought a breakthrough in the way decisions are made in real-life problems such as fraud detection, healthcare, targeted advertising, website recommendations, and speech recognition, among others. Practical implementations of classic and new statistical learning techniques are the common denominator of these advances. However, most such applications are still poorly understood by their creators, and solutions are mainly arrived at by trial and error. One of the most useful techniques in machine learning is regularization. It helps to cope with overfitting and also imposes structure on the solution of the optimization problem. Thresholding estimators are a particular class of regularization estimators that impose sparse structure on the solution. Sparsity assumes that only a few covariates are needed in the model to explain a given response. For instance, only a few genes may be relevant to explaining a given disease. Moreover, sparsity can give interpretability or physical meaning to the result. The objective of this proposal is to develop theory and innovative methodologies for solving and understanding machine and statistical learning models by using sparse regularization and thresholding estimators.

The proposal consists of the following three lines of research. First, high-dimensional data routinely arise in econometrics, machine learning, neuroscience, and social science. I will extend my previous work on thresholding estimators to other methodologies for high-dimensional data, with particular interest in applications involving categorical data in the social sciences. Second, I am interested in predicting the risk of indirectly transmitted diseases. This can be seen as a high-dimensional tomographic inverse problem. The objective is to perform an epidemiologic tomography of a region, reconstructing the areas of high and low disease risk from non-invasive measurements such as GPS animal movements by imposing a sparse total variation spatial structure. The resulting methodology will be implemented in a full data science framework. Finally, I propose to use thresholding estimators to impose sparse structures in deep learning methodologies. Current deep autoencoders tend to force the architecture of a neural network. I propose to impose thresholding regularizers to jointly estimate the network architecture. I also propose a new dropout framework based on L1 regularization: instead of randomly dropping units, I propose to perform a random selection of the regularization parameter.

These sparsity-based methodologies might lead to better interpretation of the methods and might facilitate the derivation of mathematical properties. The success of this research program will contribute greatly to the understanding of high-dimensional data, machine learning, and big data, and will promote applications of interpretable sparse regularization in many fields in the natural sciences, social sciences, and engineering.
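
To illustrate how a thresholding estimator yields a sparse solution, here is a minimal generic sketch. It is not the method of this proposal. Soft thresholding is the standard closed-form minimizer of 0.5*(z - b)^2 + lam*|b| over b, which links it to L1 regularization; hard thresholding simply zeroes small values. The function names, the sample estimates, and the threshold `lam` are all hypothetical choices made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values with |z| <= lam become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def hard_threshold(z, lam):
    """Keep z unchanged if |z| > lam, otherwise set it to 0."""
    return z if abs(z) > lam else 0.0


# Hypothetical noisy coefficient estimates: two strong signals, four near zero.
estimates = [3.0, -0.25, 0.5, -2.0, 0.125, 0.0]
lam = 1.0  # illustrative threshold, not a value taken from the proposal

soft = [soft_threshold(z, lam) for z in estimates]
hard = [hard_threshold(z, lam) for z in estimates]

print(soft)  # [2.0, 0.0, 0.0, -1.0, 0.0, 0.0]
print(hard)  # [3.0, 0.0, 0.0, -2.0, 0.0, 0.0]
```

Both rules keep only the two large coefficients, which is the sparse structure the abstract refers to. Soft thresholding additionally shrinks the surviving coefficients by `lam`, while hard thresholding leaves them unchanged.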

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by DiazRodriguez, Jairo

Sparsity, thresholding and regularization in data science
  • Grant number:
    DGECR-2022-00453
  • Fiscal year:
    2022
  • Funding amount:
    $13,800
  • Project category:
    Discovery Launch Supplement

Similar overseas grants

Sparsity, thresholding and regularization in data science
  • Grant number:
    DGECR-2022-00453
  • Fiscal year:
    2022
  • Funding amount:
    $13,800
  • Project category:
    Discovery Launch Supplement
Model selection criterion for bridge estimator in sparse learning
  • Grant number:
    21K12048
  • Fiscal year:
    2021
  • Funding amount:
    $13,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
On model selection criteria under shrinkage estimation in greedy learning
  • Grant number:
    18K11433
  • Fiscal year:
    2018
  • Funding amount:
    $13,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Study on Cyber-attack Detection based on Automatic Extraction of Multi-dimensional Behavior Modes
  • Grant number:
    18K11295
  • Fiscal year:
    2018
  • Funding amount:
    $13,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Adaptive Thresholding for Hierarchical Clustering of Variables, with Connections to Scan Statistics
  • Grant number:
    1613202
  • Fiscal year:
    2016
  • Funding amount:
    $13,800
  • Project category:
    Continuing Grant
Adaptive thresholding for subchondral bone in high-resolution peripheral computed tomography
  • Grant number:
    496568-2016
  • Fiscal year:
    2016
  • Funding amount:
    $13,800
  • Project category:
    University Undergraduate Student Research Awards
BENIGN-MALIGNANT LESION DIFFERENTIATION USING FUNCTIONAL ADC-THRESHOLDING
  • Grant number:
    8362919
  • Fiscal year:
    2011
  • Funding amount:
    $13,800
  • Project category:
Research on model selection of multi-layer perceptron
  • Grant number:
    21500215
  • Fiscal year:
    2009
  • Funding amount:
    $13,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Fully Nonparametric Models for Random Effects, Order Thresholding, Boostrap Testing, and Applications
  • Grant number:
    0805598
  • Fiscal year:
    2008
  • Funding amount:
    $13,800
  • Project category:
    Standard Grant
Practical Algorithms for Common Subgraph Problems with Thresholding
  • Grant number:
    317203-2006
  • Fiscal year:
    2006
  • Funding amount:
    $13,800
  • Project category:
    Postgraduate Scholarships - Master's