BIGDATA: Collaborative Research: F: IA: Statistical Learning for Big Data with Random Projections
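Basic information
- Award number: 1546087
- Principal investigator:
- Amount: $161,200
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2015
- Funding country: United States
- Period: 2015-09-01 to 2019-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract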
Contemporary data-driven problems in science and engineering require statistical methods that remain computationally feasible without compromising statistical accuracy. Data quality, particularly heterogeneity in data measurements, is a critical factor affecting statistical accuracy in the analysis of large datasets. This project will explore and demonstrate the feasibility and impact of simultaneously improving computational and statistical performance for Big Data problems with massive datasets. The research will advance the state of knowledge in predictive statistical learning with Big Data and is expected to be valuable in applications such as financial risk management, commercial operations employing recommender systems, biology, and image analysis.

A key idea motivating this project is that certain refined ensemble methods, combined with random projections, can enable fast analysis of massive data while also enhancing statistical performance. Specifically, the aims of the project are:

(1) Develop new classification methods based on random projections and the random forest. By defining appropriate projections, the proposed method is shown to improve statistical accuracy for massive datasets containing a large number of irrelevant, noisy measurements. The theoretical properties of this method will be analyzed, and an adaptive version of the algorithm will be developed to optimize the gains in computational and statistical efficiency.
(2) Propose boosting algorithms with random projections. The statistical properties, practical performance, and implementation of the proposed random-projection boosting algorithms will be investigated.
(3) Develop classification methods that account for heterogeneity. A classification method that uses the weighted bootstrap and ensemble learning to handle heterogeneity or covariate shift in the measurements of large datasets will be developed. The random projection method will then be applied to improve this method for high-dimensional datasets.
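
The general idea of combining random projections with an ensemble can be illustrated with a minimal sketch. The abstract does not specify the project's algorithms, so everything below is an assumption made for illustration: each ensemble member draws its own Gaussian random projection, a simple nearest-centroid classifier (swapped in here for the random forest or boosting learners named in the aims) is fit in the projected space, and predictions are combined by majority vote. All function names, the toy data generator, and the parameter values are hypothetical.

```python
import random

def random_projection(d, k, rng):
    """Return a k x d matrix with independent N(0, 1/k) entries."""
    scale = 1.0 / k ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map a d-dimensional point x to k dimensions by computing R x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def fit_centroids(Z, y):
    """Nearest-centroid base learner: mean projected point per class."""
    sums, counts = {}, {}
    for z, label in zip(Z, y):
        acc = sums.setdefault(label, [0.0] * len(z))
        for j, v in enumerate(z):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_centroid(centroids, z):
    """Return the class whose centroid is closest to z."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(centroids[c], z))
    return min(centroids, key=sqdist)

def fit_ensemble(X, y, k=10, n_members=25, seed=0):
    """Fit n_members base learners, each on its own random projection."""
    rng = random.Random(seed)
    d = len(X[0])
    members = []
    for _ in range(n_members):
        R = random_projection(d, k, rng)
        Z = [project(R, x) for x in X]
        members.append((R, fit_centroids(Z, y)))
    return members

def predict_ensemble(members, x):
    """Majority vote over the members' predictions for one point."""
    votes = {}
    for R, centroids in members:
        label = predict_centroid(centroids, project(R, x))
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def make_data(n, d, rng):
    """Toy data: only the first 2 of d features carry class signal."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 3.0 if label == 1 else -3.0
        X.append([rng.gauss(shift if j < 2 else 0.0, 1.0) for j in range(d)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    rng = random.Random(1)
    X_train, y_train = make_data(200, 50, rng)
    X_test, y_test = make_data(100, 50, rng)
    members = fit_ensemble(X_train, y_train)
    hits = sum(predict_ensemble(members, x) == t for x, t in zip(X_test, y_test))
    print("test accuracy:", hits / len(y_test))
```

Each member works in only `k` dimensions instead of `d`, which is where the computational saving would come from, while the vote across independent projections is one simple way to recover accuracy lost by any single projection. The adaptive choice of projections mentioned in aim (1) is not attempted here.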
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Cheng Yong Tang
Optimal covariance matrix estimation for high-dimensional noise in high-frequency data
- DOI: 10.1016/j.jeconom.2022.06.010
- Published: 2018-12
- Journal:
- Impact factor: 6.3
- Authors: Jinyuan Chang; Qiao Hu; Cheng Liu; Cheng Yong Tang
- Corresponding author: Cheng Yong Tang
Disentangling and assessing uncertainties in multiperiod corporate default risk predictions
- DOI: 10.1214/18-aoas1170
- Published: 2018-04
- Journal:
- Impact factor: 1.8
- Authors: Miao Yuan; Cheng Yong Tang; Yili Hong; Jian Yang
- Corresponding author: Jian Yang
Discrete Longitudinal Data Modeling with a Mean-Correlation Regression Approach
- DOI: 10.5705/ss.202016.0435
- Published: 2019
- Journal:
- Impact factor: 1.4
- Authors: Cheng Yong Tang; Weiping Zhang; Chenlei Leng
- Corresponding author: Chenlei Leng
Properties of Census Dual System Population Size Estimators
- DOI: 10.1111/j.1751-5823.2011.00150.x
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Song Xi Chen; Cheng Yong Tang
- Corresponding author: Cheng Yong Tang
A new p-value based multiple testing procedure for generalized linear models
- DOI: 10.1007/s11222-025-10600-2
- Published: 2025-03-16
- Journal:
- Impact factor: 1.600
- Authors: Joseph Rilling; Cheng Yong Tang
- Corresponding author: Cheng Yong Tang
Other grants by Cheng Yong Tang
Exploring Joint Modeling Approaches for Longitudinal Data: Parsimonious Correlation Modeling and Discrete Observations
- Award number: 1533956
- Fiscal year: 2015
- Amount: $161,200
- Project type: Standard Grant
Similar overseas grants
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
- Award number: 2348159
- Fiscal year: 2023
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
- Award number: 2308649
- Fiscal year: 2022
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: Collaborative Research: F: Holistic Optimization of Data-Driven Applications
- Award number: 2027516
- Fiscal year: 2020
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Practical Analysis of Large-Scale Data with Lyme Disease Case Study
- Award number: 1934319
- Fiscal year: 2019
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Protecting Yourself from Wildfire Smoke: Big Data-Driven Adaptive Air Quality Prediction Methodologies
- Award number: 1838022
- Fiscal year: 2019
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Foundations of Responsible Data Management
- Award number: 1926250
- Fiscal year: 2019
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Intelligent Solutions for Navigating Big Data from the Arctic and Antarctic
- Award number: 1947584
- Fiscal year: 2019
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
- Award number: 1837964
- Fiscal year: 2019
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
- Award number: 1838222
- Fiscal year: 2019
- Amount: $161,200
- Project type: Standard Grant
BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems
- Award number: 1838248
- Fiscal year: 2019
- Amount: $161,200
- Project type: Standard Grant