StatScale: Statistical Scalability for Streaming Data
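Basic information
- Grant number: EP/N031938/1
- Principal investigator:
- Amount: $3.5052 million
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2016
- Funding country: United Kingdom
- Duration: 2016 to (no data)
- Project status: Completed
- Source:
- Keywords: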
Project abstract
We live in the age of data. Technology is transforming our ability to collect and store data on unprecedented scales. From the use of Oyster card data to improve London's transport network, to the Square Kilometre Array astrophysics project that has the potential to transform our understanding of the universe, Big Data can inform and enrich many aspects of our lives. Due to the widespread use of sensor-based systems in everyday life, with even smartphones having sensors that can monitor location and activity level, much of the explosion of data is in the form of data streams: data from one or more related sources that arrive over time. It has even been estimated that there will be over 30 billion devices collecting data streams by 2020. The important role of Statistics within "Big Data" and data streams has been clear for some time. However, the current tendency has been to focus purely on algorithmic scalability, such as how to develop versions of existing statistical algorithms that scale better with the amount of data. Such an approach, however, ignores the fact that fundamentally new issues often arise when dealing with data sets of this magnitude, and highly innovative solutions are required. Model error is one such issue. Many statistical approaches are based on the use of mathematical models for data. These models are only approximations of the real data-generating mechanisms. In traditional applications, this model error is usually small compared with the inherent sampling variability of the data, and can be overlooked. However, there is an increasing realisation that model error can dominate in Big Data applications. Understanding the impact of model error, and developing robust methods that have excellent statistical properties even in the presence of model error, are major challenges. A second issue is that many current statistical approaches are not computationally feasible for Big Data. In practice we will often need to use less efficient statistical methods that are computationally faster, or require less computer memory. This introduces a statistical-computational trade-off that is unique to Big Data, leading to many open theoretical questions, and important practical problems.
The strategic vision for this programme grant is to investigate and develop an integrated approach to tackling these and other fundamental statistical challenges. In order to do this we will focus in particular on analysing data streams. An important issue with this type of data is detecting changes in the structure of the data over time. This will be an early area of focus for the programme, as it has been identified as one of seven key problem areas for Big Data. Moreover, it is an area in which our research will lead to practically important breakthroughs. Our philosophy is to tackle methodological, theoretical and computational aspects of these statistical problems together, an approach that is only possible through the programme grant scheme. Such a broad perspective is essential to achieve the substantive fundamental advances in statistics envisaged, and to ensure our new methods are sufficiently robust and efficient to be widely adopted by academics, industry and society more generally.
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Most recent changepoint detection in Panel data
- DOI: 10.48550/arxiv.1609.06805
- Publication date: 2016
- Journal:
- Impact factor: 0
- Authors: Bardwell Lawrence
- Corresponding author: Bardwell Lawrence
Local continuity of log-concave projection, with applications to estimation under model misspecification
- DOI: 10.3150/20-bej1316
- Publication date: 2021
- Journal:
- Impact factor: 1.5
- Authors: Barber, Rina Foygel; Samworth, Richard J.
- Corresponding author: Samworth, Richard J.
High dimensional efficiency with applications to change point tests
- DOI: 10.1214/18-ejs1442
- Publication date: 2018
- Journal:
- Impact factor: 1.1
- Authors: J. Aston; C. Kirch
- Corresponding authors: J. Aston; C. Kirch
Online non-parametric changepoint detection with application to monitoring operational performance of network devices
- DOI: 10.1016/j.csda.2022.107551
- Publication date: 2022-07
- Journal:
- Impact factor: 0
- Authors: Edward P. Austin; Gaetano Romano; I. Eckley; P. Fearnhead
- Corresponding authors: Edward P. Austin; Gaetano Romano; I. Eckley; P. Fearnhead
Semiparametric detection of changepoints in location, scale, and copula
- DOI: 10.1002/sam.11622
- Publication date: 2023-04
- Journal:
- Impact factor: 0
- Authors: Gaurav Agarwal; I. Eckley; P. Fearnhead
- Corresponding authors: Gaurav Agarwal; I. Eckley; P. Fearnhead
Other grants held by Idris Eckley
Statistical Foundations for Detecting Anomalous Structure in Stream Settings (DASS)
- Grant number: EP/Z531327/1
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Research Grant
Locally stationary Energy Time Series (LETS)
- Grant number: EP/I016368/1
- Fiscal year: 2011
- Funding amount: $3.5052 million
- Project type: Research Grant
Similar overseas grants
A statistical decision theory of cognitive capacity
- Grant number: DP240101511
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Discovery Projects
PriorCircuit:Circuit mechanisms for computing and exploiting statistical structures in sensory decision making
- Grant number: EP/Z000599/1
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Research Grant
Statistical Foundations for Detecting Anomalous Structure in Stream Settings (DASS)
- Grant number: EP/Z531327/1
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Research Grant
CAREER: Statistical Power Analysis and Optimal Sample Size Planning for Longitudinal Studies in STEM Education
- Grant number: 2339353
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Continuing Grant
Exploration of the Nonequilibrium Statistical Mechanics of Turbulent Collisionless Plasmas
- Grant number: 2409316
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Continuing Grant
CAREER: Statistical foundations of particle tracking and trajectory inference
- Grant number: 2339829
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Continuing Grant
Conference: Emerging Statistical and Quantitative Issues in Genomic Research in Health Sciences
- Grant number: 2342821
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Standard Grant
CAREER: Next-Generation Methods for Statistical Integration of High-Dimensional Disparate Data Sources
- Grant number: 2422478
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Continuing Grant
Practical guidance on accessible statistical methods for different estimands in randomised trials
- Grant number: MR/Z503770/1
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Research Grant
Modern statistical methods for clustering community ecology data
- Grant number: DP240100143
- Fiscal year: 2024
- Funding amount: $3.5052 million
- Project type: Discovery Projects