Collaborative Research: Leverage Subsampling for Regression and Dimension Reduction


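Basic information

  • Award number:
    1228246
  • Principal investigator:
  • Amount:
    $225,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2012
  • Funding country:
    United States
  • Duration:
    2012-09-01 to 2016-08-31
  • Status:
    Completed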

Project abstract

As a result of rapid advances in information technology, massive datasets are being generated in all fields of science, engineering, social science, business, and government. Useful information is often extracted from these data through statistical model fitting, e.g., through regression models. These models are useful for describing relationships between predictor variables and a response variable. Given a set of n data elements and p predictors, p and/or n can be large in many modern massive-data applications. In these cases, conventional algorithms often face severe computational challenges. Subsampling of rows and/or columns of a data matrix has traditionally been employed as a heuristic to reduce the size of large data sets, thus enabling computations to run more quickly. Recently, however, an innovative sampling methodology that uses the empirical statistical leverage scores of the data matrix as a nonuniform importance sampling distribution has been proposed. This has been applied to the ordinary least squares (OLS) problem and other related problems, and this leverage-based nonuniform sampling procedure gives a very good approximation to the OLS solution based on the full data (when p is small and n is large) more rapidly than traditional methods, both in worst-case theory and in high-quality numerical implementations. As yet, however, the statistical properties of these algorithms are unexplored. Understanding these properties is of interest for both fundamental and very practical reasons, and the investigators' work addresses these problems. The investigators consider both statistical theory and the evaluation of that theory with high-quality numerical implementations on large real-world data. This research proposal consists of two related research thrusts, both of which center around the common goal of an integrated treatment of statistical and computational issues.
The first research thrust focuses on studying the statistical properties of the subsampling estimation using the statistical leverage scores in linear regression. The second research thrust generalizes the theory and methods to nonlinear regression and dimension reduction models. The proposed theory and methods serve as an inspiration for new ideas to push statistical methodology development forward. The research provides new insight into the existing algorithms, produces innovative methodologies for analyzing large-scale data, inspires new lines of quantitative investigations in interdisciplinary research and offers a unique educational experience.
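
As a rough illustration of the leverage-based sampling idea described in the abstract, the following Python sketch draws rows with probability proportional to their statistical leverage scores, rescales them, and solves the smaller least-squares problem. It is a minimal sketch under stated assumptions, not the investigators' implementation: the function names `leverage_scores` and `leverage_subsample_ols`, the subsample size `r`, and the synthetic data are all hypothetical, the design matrix is assumed to have full column rank, and the scores are computed exactly through a thin QR factorization (which costs about as much as a full OLS solve, so a practical version would presumably approximate them).

```python
import numpy as np


def leverage_scores(X):
    """Exact leverage scores: the diagonal of the hat matrix X (X^T X)^{-1} X^T.

    Assumes X has full column rank. With a thin QR factorization X = QR,
    the score of row i is the squared Euclidean norm of row i of Q.
    """
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.sum(Q ** 2, axis=1)


def leverage_subsample_ols(X, y, r, rng):
    """Approximate the OLS coefficients from r rows sampled by leverage.

    Rows are drawn with replacement with probability proportional to their
    leverage scores, then rescaled by 1 / sqrt(r * prob) so that the
    subsampled normal equations match the full ones in expectation.
    """
    n, _ = X.shape
    h = leverage_scores(X)
    probs = h / h.sum()  # nonuniform importance sampling distribution
    idx = rng.choice(n, size=r, replace=True, p=probs)
    w = 1.0 / np.sqrt(r * probs[idx])  # per-row rescaling weights
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta


# Hypothetical demo on synthetic data (n large, p small).
rng = np.random.default_rng(0)
n, p = 20000, 5
X = rng.standard_normal((n, p))
y = X @ np.arange(1.0, p + 1.0) + 0.1 * rng.standard_normal(n)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
beta_sub = leverage_subsample_ols(X, y, r=1000, rng=rng)
print("full-data OLS :", beta_full)
print("subsampled OLS:", beta_sub)
```

The rescaling step is the design choice worth noting: without it, the sampled rows would solve a differently weighted problem instead of approximating the full-data OLS fit.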

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Bin Yu

Does ceruloplasmin differential express in the brain of Ts65Dn: a mouse mode of Down syndrome?
  • DOI:
  • Year published:
    2014
  • Journal:
  • Impact factor:
    3.3
  • Authors:
    Bin Yu;Jing Kong;Bao;Ziqi Zhu;Bin Zhang;Qiu;S. Shao
  • Corresponding author:
    S. Shao
A PILOT STUDY IN AN APPLICATION OF TEXT MINING TO LEARNING SYSTEM EVALUATION by NITSAWAN KATERATTANAKUL
  • DOI:
  • Year published:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bin Yu
  • Corresponding author:
    Bin Yu
Lamellar gel containing emulsions as an effective carrier for stabilization and transdermal delivery of retinyl propionate
Verifiable Visual Cryptography Based on Iterative Algorithm: Verifiable Visual Cryptography Based on Iterative Algorithm
  • DOI:
    10.3724/sp.j.1146.2010.00270
  • Year published:
    2011
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bin Yu;Jin;Liguo Fang
  • Corresponding author:
    Liguo Fang
Loc680254 regulates Schwann cell proliferation through Psrc1 and Ska1 as a microRNA sponge following sciatic nerve injury
  • DOI:
    10.1002/glia.24045
  • Year published:
    2021-06
  • Journal:
  • Impact factor:
    6.2
  • Authors:
    Chun Yao;Qihui Wang;Yaxian Wang;Jiancheng Wu;Xuemin Cao;Yan Lu;Yanping Chen;Wei Feng;Xiaosong Gu;Xin‐Peng Dun;Bin Yu
  • Corresponding author:
    Bin Yu

Other grants by Bin Yu

Advancing Theory and Methodology for Tree-Based Algorithms in High Dimensions
  • Award number:
    2209975
  • Fiscal year:
    2022
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Understanding Complexity and the Bias-Variance Tradeoff in High Dimensions: Theory and Data Evidence
  • Award number:
    2015341
  • Fiscal year:
    2020
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Parallel Ensemble Learning and Feature Interaction Discovery: High Volume Dynamic Data
  • Award number:
    1953191
  • Fiscal year:
    2020
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Understand the functional mechanism of the DSP1 complex in the 3' end maturation of plant small nuclear RNAs
  • Award number:
    1818082
  • Fiscal year:
    2018
  • Amount:
    $225,000
  • Award type:
    Standard Grant
BIGDATA: F: Scalable and Interpretable Machine Learning: Bridging Mechanistic and Data-Driven Modeling in the Biological Sciences
  • Award number:
    1741340
  • Fiscal year:
    2017
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Canonical Linear Methods and Hierarchical Non-Linear Methods in High-Dimensional Statistics
  • Award number:
    1613002
  • Fiscal year:
    2016
  • Amount:
    $225,000
  • Award type:
    Continuing Grant
Smart Nanofabrication via Rational Assembly of Two-Dimensional Heterosystems
  • Award number:
    1434689
  • Fiscal year:
    2014
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Direct Self-Assembly of Large Area, High Crystallinity 2D Graphene on Insulator: An Integratable Carbon Platform
  • Award number:
    1162312
  • Fiscal year:
    2012
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Understanding DAWDLE Function in miRNA and siRNA Biogenesis
  • Award number:
    1121193
  • Fiscal year:
    2011
  • Amount:
    $225,000
  • Award type:
    Continuing Grant
Ultra-Low-Power Complementary Logic with On-Chip Directly Assembled, Highly Adaptive 2-D Graphitic Platform
  • Award number:
    1002228
  • Fiscal year:
    2010
  • Amount:
    $225,000
  • Award type:
    Standard Grant

Similar NSFC grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Amount:
    0 CNY
  • Award type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Amount:
    240,000 CNY
  • Award type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Amount:
    240,000 CNY
  • Award type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Amount:
    240,000 CNY
  • Award type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Amount:
    450,000 CNY
  • Award type:
    General Program

Similar overseas grants

Research on tools to support remote collaborative learning of maker activities that leverage both software and hardware
  • Award number:
    22KJ1010
  • Fiscal year:
    2023
  • Amount:
    $225,000
  • Award type:
    Grant-in-Aid for JSPS Fellows
The Theory of Satoumi and Empirical Research Focusing on Leverage for Sustainability Transformation in Socio-ecological Systems
  • Award number:
    23H03609
  • Fiscal year:
    2023
  • Amount:
    $225,000
  • Award type:
    Grant-in-Aid for Scientific Research (B)
Research on digital drawing to leverage consumers' design ideas in the fashion market
  • Award number:
    21K12545
  • Fiscal year:
    2021
  • Amount:
    $225,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
Research and development of Open Citation in Japanese scholarly publications and its leverage in information infrastructure for humanities
  • Award number:
    20K20132
  • Fiscal year:
    2020
  • Amount:
    $225,000
  • Award type:
    Grant-in-Aid for Early-Career Scientists
Excellence in Research: An HBCU Partnership to Leverage Resources and Increase Scientific Productivity in Observational Time Domain Astrophysics
  • Award number:
    1901296
  • Fiscal year:
    2019
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Research and Curriculum Development to Leverage University Student Conceptual Resources for Understanding Physics
  • Award number:
    1914572
  • Fiscal year:
    2019
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Research and Curriculum Development to Leverage University Student Conceptual Resources for Understanding Physics
  • Award number:
    1914603
  • Fiscal year:
    2019
  • Amount:
    $225,000
  • Award type:
    Standard Grant
International Network-to-Network (N2N) Stakeholder Collaboration Workshop: Solutions to Accelerate Research, Leverage Resources, and Maximize Synergies
  • Award number:
    1842111
  • Fiscal year:
    2018
  • Amount:
    $225,000
  • Award type:
    Standard Grant
RCN: ENSEMBLE (Enabling Neuroscience in Species Models that Broadly Leverage Evolution): A research coordination network advancing strategic development, community building and inn
  • Award number:
    1638400
  • Fiscal year:
    2018
  • Amount:
    $225,000
  • Award type:
    Standard Grant
Using semantics to leverage health and research Big Data
  • Award number:
    MR/S003703/1
  • Fiscal year:
    2018
  • Amount:
    $225,000
  • Award type:
    Fellowship