Incremental Regression Analysis of Streaming Data: Estimating Function Theory and Applications

Project Abstract

The advent of distributed data storage and parallel computing systems such as Apache Spark has created opportunities for innovation in data analytics and modeling. This project focuses on regression analysis of streaming data under Spark's Lambda architecture, with the aim of developing a new toolbox for Big Data analytics. Streaming data refers to a series of data batches that arrive sequentially. Such data collection schemes have recently become common in biomedical fields, owing to the rapid growth of AI-enhanced medical devices designed to monitor the safety and effectiveness of medical treatments delivered by smart personalized products, or to measure physiological variables in real time, such as heartbeat, body temperature, and physical activity. This so-called deep phenotyping technology has significantly changed how information is acquired, in terms of both volume and velocity. Regression analysis, among the most important tools in data analytics, will be rebuilt in this project to address the challenges that arise in processing streaming data. The resulting methodology may be applied in many practical fields where incremental learning from data streams is of primary interest. The overarching goal of the project is to develop incremental statistical inference that addresses the methodological challenges of regression analysis with streaming data stored in Spark's Lambda architecture. An efficient incremental methodology uses no historical raw data; it requires only historical summary statistics and the newly arrived data batch.
At the completion of this project, the PI expects to make the following new contributions: (i) to develop a new theory of renewable estimation and incremental inference in the context of estimating functions; (ii) to develop an extension of the speed data flow architecture, called the Rho architecture, in which a new layer is added to carry forward updates of inference-related quantities such as the Fisher information; (iii) to apply the proposed methodology to many important regression models, such as generalized linear models, generalized estimating equations (GEE), the Cox proportional hazards model, and the quantile regression model. Both Python and R packages will be released to the public from this project.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
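The idea of updating a fit from summary statistics alone can be illustrated with a minimal sketch. This is not the project's renewable estimation method, only the simplest special case: ordinary least squares with one predictor, where five running sums are sufficient and the batch-by-batch estimate equals the full-data estimate exactly. The class name `StreamingSimpleRegression` and its methods are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class StreamingSimpleRegression:
    """Fit y = intercept + slope * x from running sums only (illustrative)."""

    n: int = 0        # number of observations seen so far
    sx: float = 0.0   # running sum of x
    sy: float = 0.0   # running sum of y
    sxx: float = 0.0  # running sum of x * x
    sxy: float = 0.0  # running sum of x * y

    def update(self, xs, ys):
        """Absorb one new batch; the raw batch can be discarded afterwards."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self.coefficients()

    def coefficients(self):
        """Solve the 2x2 normal equations built from the stored sums."""
        det = self.n * self.sxx - self.sx ** 2
        if det == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


if __name__ == "__main__":
    model = StreamingSimpleRegression()
    # Two batches drawn from the exact line y = 1 + 2x.
    model.update([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
    intercept, slope = model.update([3.0, 4.0], [7.0, 9.0])
    print(intercept, slope)  # prints 1.0 2.0
```

For nonlinear models such as generalized linear models, fixed sums like these are no longer sufficient, which is why the abstract speaks of carrying forward quantities such as the Fisher information between batches.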

Project Outcomes

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Renewable estimation and incremental inference in generalized linear models with streaming data sets

Other publications by Peter Song

Behaviour of wrinkled thin-walled steel pipes subjected to displacement-controlled axial cyclic loads
  • DOI:
    10.1016/j.tws.2021.108269
  • Published:
    2021-11-01
  • Journal:
  • Impact factor:
  • Authors:
    Habeeb Sobanke;Sreekanta Das;Peter Song;Nader Yoosef-Ghodsi
  • Corresponding author:
    Nader Yoosef-Ghodsi
Impact of Vendor Computerized Physician Order Entry in Community Hospitals
  • DOI:
  • Published:
    2012
  • Journal:
  • Impact factor:
    5.7
  • Authors:
    Alexander A. Leung;Carol A. Keohane;M. Amato;S. Simon;Michael Coffey;Nathan Kaufman;Bismarck Cadet;G. Schiff;E. Zimlichman;D. Seger;Catherine S. Yoon;Peter Song;D. Bates
  • Corresponding author:
    D. Bates
360 Pre-Dialysis Fluid Status Is an Important Predictor of Renal Recovery in Patients with Acute Kidney Injury Requiring Renal Replacement Therapy
  • DOI:
    10.1053/j.ajkd.2011.02.363
  • Published:
    2011-04-01
  • Journal:
  • Impact factor:
  • Authors:
    Dawn Wolfgram;Mallika Kommareddi;Peter Song;Michael Heung
  • Corresponding author:
    Michael Heung
Quantifying Uncertainty in Classification Performance: ROC Confidence Bands Using Conformal Prediction
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zheshi Zheng;Bo Yang;Peter Song
  • Corresponding author:
    Peter Song
Prenatal Diet in Relation to Sleep Health of Offspring During Adolescence: Evidence From the ELEMENT Study
  • DOI:
    10.1093/cdn/nzab046_130
  • Published:
    2021-06-01
  • Journal:
  • Impact factor:
  • Authors:
    Astrid Zamora;Karen Peterson;Martha Maria Téllez-Rojo;Alejandra Cantoral;Peter Song;Maritsa Solano-González;Adriana Mercado-García;Erica Fossee;Erica Jansen
  • Corresponding author:
    Erica Jansen

Other grants held by Peter Song

Homogeneity Pursuit in Regression Analysis: Statistical Theory, Integer Optimization, and Algorithms
  • Award number:
    2113564
  • Fiscal year:
    2021
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Regression Analysis of Networked Data: Estimating Function Theory and Applications
  • Award number:
    1513595
  • Fiscal year:
    2015
  • Award amount:
    $150,000
  • Award type:
    Continuing Grant
Composite Estimating Function Approaches to GeoCopula Models for Complex Spatially Correlated Data
  • Award number:
    1208939
  • Fiscal year:
    2012
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Development of Composite Likelihood Method in High-Dimensional Correlated Data Analysis: Estimation, Inference and Model Selection
  • Award number:
    0904177
  • Fiscal year:
    2009
  • Award amount:
    $150,000
  • Award type:
    Standard Grant

Similar overseas grants

Collaborative Research: Multiple Hypothesis Testing on the Regression Analysis
  • Award number:
    2311216
  • Fiscal year:
    2023
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Collaborative Research: Multiple Hypothesis Testing on the Regression Analysis
  • Award number:
    2311215
  • Fiscal year:
    2023
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Scalable Bayesian regression: Analytical and numerical tools for efficient Bayesian analysis in the large data regime
  • Award number:
    2311354
  • Fiscal year:
    2023
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Optimal and Robust Designs for Active Learning and Regression Analysis
  • Award number:
    RGPIN-2020-05283
  • Fiscal year:
    2022
  • Award amount:
    $150,000
  • Award type:
    Discovery Grants Program - Individual
Leveraging Observability Via Tracing For Software Regression Detection and Root Cause Analysis
  • Award number:
    572127-2022
  • Fiscal year:
    2022
  • Award amount:
    $150,000
  • Award type:
    Alliance Grants
Dimension Reduction and Data Visualization for Regression Analysis of Metric-Space-Valued Data
  • Award number:
    2210775
  • Fiscal year:
    2022
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Multicollinearity Analysis and Variable/Model Selection in Regression
  • Award number:
    21K01431
  • Fiscal year:
    2021
  • Award amount:
    $150,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
Antibiotic Prescribing in Patients with COVID-19: A Rapid Review with Meta-Analysis and Meta-Regression
  • Award number:
    466530
  • Fiscal year:
    2021
  • Award amount:
    $150,000
  • Award type:
    Studentship Programs
Homogeneity Pursuit in Regression Analysis: Statistical Theory, Integer Optimization, and Algorithms
  • Award number:
    2113564
  • Fiscal year:
    2021
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Optimal and Robust Designs for Active Learning and Regression Analysis
  • Award number:
    RGPIN-2020-05283
  • Fiscal year:
    2021
  • Award amount:
    $150,000
  • Award type:
    Discovery Grants Program - Individual