On the Feasibility of Distributed Statistical Learning for Big Data
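Basic Information
- Grant number: RGPIN-2016-05024
- Principal investigator:
- Amount: $13,100
- Host institution:
- Host institution country: Canada
- Category: Discovery Grants Program - Individual
- Fiscal year: 2019
- Funding country: Canada
- Period: 2019-01-01 to 2020-12-31
- Status: Completed
- Source:
- Keywords: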
Abstract
Collecting data of unprecedented size and complexity is now feasible in many scientific fields. Big data have strategic value only when they are used effectively to extract useful information. Developing efficient tools to manage and process big data has recently become a hotspot in statistics and related disciplines.
Because of their huge volume, big data can rarely be stored and processed on a single machine. For computational convenience, a divide-and-conquer scheme is therefore often used. The full dataset is split into several manageable segments, each segment is processed separately, and the final output is aggregated from the segmental sub-outputs. Despite its practical popularity, this distributed framework lacks a solid theoretical foundation, and its performance can vary across scenarios. It is therefore necessary to refine the method and provide the associated theoretical support. In this proposal, I plan to systematically investigate the distributed method for several major statistical learning tasks. Specifically, I have the following three objectives.
1) I will specify the conditions under which simple distributed methods for regression, classification, and ranking are consistent, and I will establish their consistency. This objective will provide a basic theoretical understanding of the distributed framework, so that better guidance can be given for its application.
2) Based on distributed optimization, I will design new learning procedures that improve on the simple distributed method. The new methods will enhance communication between individual machines. They therefore have great potential to make the overall procedure more efficient and reliable.
3) I will apply the developed distributed methods to real-world datasets. This objective will provide a software package for the new methods and address implementation issues that arise in practice. One potential application is improving the system that predicts the hospital resources needed by Canadian inpatients. This application will be based on the 2015-2020 Discharge Abstract Database available from the Canadian Institute for Health Information.
With these research objectives, this program will build a theoretical foundation and develop efficient implementation procedures for distributed statistical learning. These results and products will be among the key techniques for processing big data. The proposed objectives will be investigated jointly from the perspectives of statistics, machine learning, optimization, and approximation theory. Together, these features make the program a novel and promising exploration in the emerging field of big data. The diverse topics in this program are well suited to training highly qualified personnel at both the doctoral and master's levels. The scientists trained by this program are urgently needed for analyzing big data in industry, research institutes, and government agencies.
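A minimal pure-Python sketch can illustrate the divide-and-conquer scheme described in the abstract. It also shows one generic form of the communication-based refinement mentioned in objective 2. The sketch rests on several assumptions that are not taken from the proposal. The learning task is simple linear regression y = a + b·x. The segmental sub-outputs are aggregated by plain averaging. The "enhanced communication" takes the form of distributed gradient descent. These are textbook choices, not the procedures developed in this program. All function names (`fit_simple_ols`, `split`, `divide_and_conquer`, `distributed_gradient_descent`) are hypothetical.

```python
import random
from statistics import fmean


def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = a + b * x on one data segment."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def split(data, m):
    """Split a list into m contiguous segments whose sizes differ by at most one."""
    k, r = divmod(len(data), m)
    segments, start = [], 0
    for i in range(m):
        end = start + k + (1 if i < r else 0)
        segments.append(data[start:end])
        start = end
    return segments


def divide_and_conquer(xs, ys, m):
    """One-shot scheme (assumed aggregation rule: plain averaging).

    Each segment is fitted on its own machine, and the m sub-outputs are
    averaged on a central machine, so there is a single round of communication.
    """
    fits = [fit_simple_ols(sx, sy) for sx, sy in zip(split(xs, m), split(ys, m))]
    return fmean([a for a, _ in fits]), fmean([b for _, b in fits])


def distributed_gradient_descent(xs, ys, m, rounds=500, lr=1.0):
    """Iterative scheme (a generic example of distributed optimization).

    In every communication round, each segment sends the sum of its residuals
    and the sum of residual * x at the current (a, b). The center divides these
    by the total sample size. The result is exactly the full-data gradient of
    half the mean squared error, and the center takes one gradient step.
    """
    seg_x, seg_y = split(xs, m), split(ys, m)
    n_total = len(xs)
    a = b = 0.0
    for _ in range(rounds):
        grad_a = grad_b = 0.0
        for sx, sy in zip(seg_x, seg_y):
            residuals = [a + b * x - y for x, y in zip(sx, sy)]
            grad_a += sum(residuals)
            grad_b += sum(r * x for r, x in zip(residuals, sx))
        a -= lr * grad_a / n_total
        b -= lr * grad_b / n_total
    return a, b


if __name__ == "__main__":
    # Synthetic data for illustration only: y = 1 + 2x + Gaussian noise.
    rng = random.Random(0)
    xs = [rng.random() for _ in range(2_000)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    print("one-shot averaging:", divide_and_conquer(xs, ys, m=10))
    print("distributed GD:    ", distributed_gradient_descent(xs, ys, m=10))
    print("full-data OLS:     ", fit_simple_ols(xs, ys))
```

The two schemes trade communication for accuracy. The one-shot version communicates only once, but its average of local fits only approximates the pooled least-squares fit. The iterative version needs one communication round per step. In return, the size-weighted gradient average equals the full-data gradient, so with a suitable step size it converges to the same estimate as least squares on the pooled data.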
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Xu, Chen
Interfacial Assembly of Ti3C2Tx/ZnIn2S4 Heterojunction for High-Performance Photodetectors.
- DOI: 10.1002/advs.202204687
- Published: 2022-12
- Journal:
- Impact factor: 15.1
- Authors: Hou, Shuping; Xu, Chen; Ju, Xingkai; Jin, Yongdong
- Corresponding author: Jin, Yongdong
Continuous and highly accurate multi-material extrusion-based bioprinting with optical coherence tomography imaging.
- DOI: 10.18063/ijb.707
- Published: 2023
- Journal:
- Impact factor: 8.4
- Authors: Wang, Jin; Xu, Chen; Yang, Shanshan; Wang, Ling; Xu, Mingen
- Corresponding author: Xu, Mingen
In Situ Growth of Graphene Catalyzed by a Phase-Change Material at 400 °C for Wafer-Scale Optoelectronic Device Application
- DOI: 10.1002/smll.202206738
- Published: 2023-01-02
- Journal:
- Impact factor: 13.3
- Authors: Hu, Liangchen; Dong, Yibo; Xu, Chen
- Corresponding author: Xu, Chen
Nagasaki sediments reveal that long-term fate of plutonium is controlled by select organic matter moieties
- DOI: 10.1016/j.scitotenv.2019.04.375
- Published: 2019-08-15
- Journal:
- Impact factor: 9.8
- Authors: Lin, Peng; Xu, Chen; Santschi, Peter H.
- Corresponding author: Santschi, Peter H.
Uncoupling and turnover in a Cl-/H+ exchange transporter.
- DOI: 10.1085/jgp.200709756
- Published: 2007-04
- Journal:
- Impact factor: 3.8
- Authors: Walden, Michael; Accardi, Alessio; Wu, Fang; Xu, Chen; Williams, Carole; Miller, Christopher
- Corresponding author: Miller, Christopher
Other grants by Xu, Chen
On the Feasibility of Distributed Statistical Learning for Big Data
- Grant number: RGPIN-2016-05024
- Fiscal year: 2021
- Amount: $13,100
- Category: Discovery Grants Program - Individual
On the Feasibility of Distributed Statistical Learning for Big Data
- Grant number: RGPIN-2016-05024
- Fiscal year: 2020
- Amount: $13,100
- Category: Discovery Grants Program - Individual
On the Feasibility of Distributed Statistical Learning for Big Data
- Grant number: RGPIN-2016-05024
- Fiscal year: 2018
- Amount: $13,100
- Category: Discovery Grants Program - Individual
On the Feasibility of Distributed Statistical Learning for Big Data
- Grant number: RGPIN-2016-05024
- Fiscal year: 2017
- Amount: $13,100
- Category: Discovery Grants Program - Individual
On the Feasibility of Distributed Statistical Learning for Big Data
- Grant number: RGPIN-2016-05024
- Fiscal year: 2016
- Amount: $13,100
- Category: Discovery Grants Program - Individual
Mathematical modelling for health policy
- Grant number: 368836-2008
- Fiscal year: 2008
- Amount: $13,100
- Category: University Undergraduate Student Research Awards
Similar NSFC Grants
Graphon mean field games with partial observation and application to failure detection in distributed systems
- Grant number:
- Year approved: 2025
- Amount: CNY 0
- Category: Provincial/municipal project
Similar Overseas Grants
Statistical learning algorithms for high-dimensional non-normally distributed data
- Grant number: RGPIN-2018-06787
- Fiscal year: 2022
- Amount: $13,100
- Category: Discovery Grants Program - Individual
Developing Statistical Tools and Visualization Methods for Understanding Heterogeneity in Distributed Networks: Applications to COVID-19 and Diabetes
- Grant number: 468555
- Fiscal year: 2022
- Amount: $13,100
- Category: Operating Grants
Statistical learning algorithms for high-dimensional non-normally distributed data
- Grant number: RGPIN-2018-06787
- Fiscal year: 2021
- Amount: $13,100
- Category: Discovery Grants Program - Individual
Statistical inference for distributed datasets
- Grant number: RGPIN-2016-06296
- Fiscal year: 2021
- Amount: $13,100
- Category: Discovery Grants Program - Individual
On the Feasibility of Distributed Statistical Learning for Big Data
- Grant number: RGPIN-2016-05024
- Fiscal year: 2021
- Amount: $13,100
- Category: Discovery Grants Program - Individual
On the Feasibility of Distributed Statistical Learning for Big Data
- Grant number: RGPIN-2016-05024
- Fiscal year: 2020
- Amount: $13,100
- Category: Discovery Grants Program - Individual
Statistical learning algorithms for high-dimensional non-normally distributed data
- Grant number: RGPIN-2018-06787
- Fiscal year: 2020
- Amount: $13,100
- Category: Discovery Grants Program - Individual
Statistical inference for distributed datasets
- Grant number: RGPIN-2016-06296
- Fiscal year: 2020
- Amount: $13,100
- Category: Discovery Grants Program - Individual
Statistical learning algorithms for high-dimensional non-normally distributed data
- Grant number: RGPIN-2018-06787
- Fiscal year: 2019
- Amount: $13,100
- Category: Discovery Grants Program - Individual
Statistical inference for distributed datasets
- Grant number: RGPIN-2016-06296
- Fiscal year: 2019
- Amount: $13,100
- Category: Discovery Grants Program - Individual