Statistical inference for distributed datasets

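Basic information

  • Grant number:
    RGPIN-2016-06296
  • Principal investigator:
  • Amount:
    $10,900
  • Host institution:
  • Host institution country:
    Canada
  • Program type:
    Discovery Grants Program - Individual
  • Fiscal year:
    2020
  • Funding country:
    Canada
  • Start and end dates:
    2020-01-01 to 2021-12-31
  • Project status:
    Completed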

Project abstract

Technology allows us to store and describe massive amounts of data. These data are often about humans: they keep a personal trace of individual lives, allow others to know a customer better than the customer knows himself, and provide special insight into social dynamics. The era of big data is here, and such massive amounts of information cannot be stored on a personal computer; they require clusters of interconnected machines. Such large-scale setups are typically based on a distributed file system (such as Hadoop) because no single drive could possibly store that much information. With “distributed data”, no single processor can access the whole dataset; instead, a large number of processors each have access to only a part of the data.

Statistical analyses can extract knowledge from big data, but most statistical models were designed for smaller datasets, on the assumption that all of the data are available on one computer. This assumption does not hold for large-scale samples, and only a few statistical methods are straightforward to adapt. For the vast majority of statistical tools, unless one is willing to analyse only a randomly selected fraction of a massive dataset, new solutions are required.

The main objective of this research program is to adapt statistical tools to the reality of distributed data. Many statistical procedures are based on the estimated law of a random variable, but in a distributed environment, communication between the computing nodes is a scarce resource. Sharing all of the data is not an option, and a compromise must be struck between precision and communication costs. Building an estimate for the law of a variable of interest is a fundamental challenge, and this research program proposes a number of strategies to do so within the constraints of a distributed architecture. Once the properties of the proposed estimates are well established, they will be used to generalize statistical procedures as diverse as maximum likelihood estimation, goodness-of-fit tests and resampling procedures including the bootstrap. Moving one step further, strategies are also proposed to infer the multivariate dependence structure of the data, the infamous copula.

By developing statistical methodology for distributed (big) data, this research project will provide tools that are highly needed in all sciences as well as in industrial and business applications, which all share the need to extract knowledge from data.
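
The trade-off between precision and communication described in the abstract can be made concrete with a small sketch. This is an illustration only and not the method proposed in the research program, which the abstract does not detail. It assumes that every node agrees in advance on a short grid of evaluation points and sends back only one count per grid point, rather than its raw observations. The function names (`node_summary`, `pooled_cdf`), the grid `edges` and the four-way split of simulated data are all hypothetical choices made for this example.

```python
import bisect
import random


def node_summary(shard, edges):
    """Run on one node: count local observations <= each grid edge.

    Only len(edges) integers plus the shard size leave the node.
    """
    ordered = sorted(shard)
    counts = [bisect.bisect_right(ordered, e) for e in edges]
    return counts, len(shard)


def pooled_cdf(summaries):
    """Run on the coordinator: combine per-node counts into CDF values."""
    total = sum(n for _, n in summaries)
    k = len(summaries[0][0])
    return [sum(counts[j] for counts, _ in summaries) / total for j in range(k)]


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # simulated data (assumption)
shards = [data[i::4] for i in range(4)]                  # four hypothetical nodes
edges = [-2.0, -1.0, 0.0, 1.0, 2.0]                      # grid agreed on in advance

# Distributed estimate: each node communicates 5 counts, not 2,500 observations.
estimate = pooled_cdf([node_summary(s, edges) for s in shards])

# Reference value computed with access to the whole dataset.
exact = [sum(x <= e for x in data) / len(data) for e in edges]

print(estimate == exact)
```

At the grid points the pooled estimate coincides with the empirical distribution function of the full dataset, but nothing is known between grid points beyond the mass of each interval. A finer grid improves precision and raises the communication cost in proportion, which is the compromise the abstract refers to.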

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Plante, JeanFrançois

Statistical inference for distributed datasets
  • Grant number:
    RGPIN-2016-06296
  • Fiscal year:
    2021
  • Funding amount:
    $10,900
  • Program type:
    Discovery Grants Program - Individual
Statistical inference for distributed datasets
  • Grant number:
    RGPIN-2016-06296
  • Fiscal year:
    2019
  • Funding amount:
    $10,900
  • Program type:
    Discovery Grants Program - Individual
Statistical inference for distributed datasets
  • Grant number:
    RGPIN-2016-06296
  • Fiscal year:
    2018
  • Funding amount:
    $10,900
  • Program type:
    Discovery Grants Program - Individual
Optimal booking window assessment under an any airline scenario
  • Grant number:
    528814-2018
  • Fiscal year:
    2018
  • Funding amount:
    $10,900
  • Program type:
    Engage Grants Program
Atelier de maillage industriel en sciences des données
(Industrial networking workshop in data science)
  • Grant number:
    522562-2017
  • Fiscal year:
    2017
  • Funding amount:
    $10,900
  • Program type:
    Connect Grants Level 2
Statistical inference for distributed datasets
  • Grant number:
    RGPIN-2016-06296
  • Fiscal year:
    2017
  • Funding amount:
    $10,900
  • Program type:
    Discovery Grants Program - Individual
Statistical inference for distributed datasets
  • Grant number:
    RGPIN-2016-06296
  • Fiscal year:
    2016
  • Funding amount:
    $10,900
  • Program type:
    Discovery Grants Program - Individual
Weighted likelihood and other weighted methods in statistics
  • Grant number:
    385813-2010
  • Fiscal year:
    2015
  • Funding amount:
    $10,900
  • Program type:
    Discovery Grants Program - Individual
Weighted likelihood and other weighted methods in statistics
  • Grant number:
    385813-2010
  • Fiscal year:
    2014
  • Funding amount:
    $10,900
  • Program type:
    Discovery Grants Program - Individual
Stratégies pour la collecte des données de capteur et l'analyse d'événements rares dans la conception de modèles prédictifs pour la maintenance.
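(Strategies for collecting sensor data and analysing rare events in the design of predictive models for maintenance.)
  • Grant number:
    470835-2014
  • Fiscal year:
    2014
  • Funding amount:
    $10,900
  • Program type: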
    Engage Grants Program

Similar overseas grants

CAREER: Efficient Large Language Model Inference Through Codesign: Adaptable Software Partitioning and FPGA-based Distributed Hardware
  • Grant number:
    2339084
  • Fiscal year:
    2024
  • Funding amount:
    $10,900
  • Program type:
    Continuing Grant
Distributed Machine Learning Methodology and System for Real-time Inference with Large-scale Point Clouds Towards Mobility Innovation
  • Grant number:
    23H00464
  • Fiscal year:
    2023
  • Funding amount:
    $10,900
  • Program type:
    Grant-in-Aid for Scientific Research (A)
Collaborative Research: CIF: Small: A New Paradigm for Distributed Information Processing, Simulation and Inference in Networks: The Promise of Law of Small Numbers
  • Grant number:
    2241057
  • Fiscal year:
    2022
  • Funding amount:
    $10,900
  • Program type:
    Standard Grant
CAREER: Distributed Inference-Making via Crowdsensing
  • Grant number:
    2302197
  • Fiscal year:
    2022
  • Funding amount:
    $10,900
  • Program type:
    Continuing Grant
Collaborative Research: CIF: Small: A New Paradigm for Distributed Information Processing, Simulation and Inference in Networks: The Promise of Law of Small Numbers
  • Grant number:
    2132815
  • Fiscal year:
    2021
  • Funding amount:
    $10,900
  • Program type:
    Standard Grant
Collaborative Research: Aggregated Monte Carlo: A General Framework for Distributed Bayesian Inference in Massive Spatiotemporal Data
  • Grant number:
    2220840
  • Fiscal year:
    2021
  • Funding amount:
    $10,900
  • Program type:
    Standard Grant
Collaborative Research: CIF: Small: A New Paradigm for Distributed Information Processing, Simulation and Inference in Networks: The Promise of Law of Small Numbers
  • Grant number:
    2132843
  • Fiscal year:
    2021
  • Funding amount:
    $10,900
  • Program type:
    Standard Grant
CAREER: Distributed Inference-Making via Crowdsensing
  • Grant number:
    2047701
  • Fiscal year:
    2021
  • Funding amount:
    $10,900
  • Program type:
    Continuing Grant
Statistical inference for distributed datasets
  • Grant number:
    RGPIN-2016-06296
  • Fiscal year:
    2021
  • Funding amount:
    $10,900
  • Program type:
    Discovery Grants Program - Individual
Efficient Distributed DNN Training and Inference
  • Grant number:
    543833-2019
  • Fiscal year:
    2021
  • Funding amount:
    $10,900
  • Program type:
    Collaborative Research and Development Grants