BIGDATA: F: DKA: Collaborative Research: Clustering Algorithms for Data Streams

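Basic information

  • Grant number:
    1447639
  • Principal investigator:
  • Amount:
    $1 million
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Standard Grant
  • Fiscal year:
    2014
  • Funding country:
    United States
  • Project period:
    2014-09-01 to 2018-08-31
  • Project status:
    Completed

Project abstract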

This project will develop novel theoretical methods and algorithms for clustering massive datasets with applications to astronomy, neuroscience and natural language processing. Clustering is the process of creating groups of data based on similarities between individual data points. The developed theoretical methods will be used in applications where clustering algorithms are critical and the input data is extremely large. First, new clustering algorithms will be designed to scale and will allow for better cosmological simulations. The simulations involve billions of particles in each snapshot, and existing clustering algorithms based upon a simple friends-of-friends approach do not scale to these cardinalities. Second, this project will advance the computational capabilities in statistical neuroscience by employing clustering algorithms to discover both regular patterns and anomalies in normal and abnormal brain graphs. Finally, this research will explore the important topic of finding anomalies in massive text streams, such as Twitter. In this setting, one is concerned with detecting anomalous bursts in traffic content that share a similar pattern. These bursts might signal an important political event or a natural disaster. This project will support undergraduate and graduate research aimed at developing skills needed for algorithmic work on massive data sets.

There exist numerous heuristics and approximation algorithms for many variants of the clustering problem. However, these methods are often slow or infeasible for applications with massive datasets. This research will improve space and time upper bounds for clustering algorithms in the streaming model. This project will address the k-means and k-median problems in the dynamic streaming model, extend the results on separable data when the input comes from Euclidean space, improve the bounds in the sliding window model, and combine the coresets technique with novel sampling approaches and the method of smooth histograms. The PIs' previous work has already been applied to natural language processing and this project will expand this direction further and explore the important topic of "First Story Detection." Furthermore, this research will explore the similarities and differences between various sampling and sketching techniques, and how they could be used in large multidimensional astronomical databases, like SDSS (Sloan Digital Sky Survey) SkyServer. These novel approaches will provide major speedups for the execution of large statistical aggregate queries. The new streaming algorithms will be used to find substructure in very large cosmological N-body simulations. For further information see the project web site at: http://www.cs.jhu.edu/~vova
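
The abstract names the friends-of-friends approach as the baseline that does not scale to billions of particles. As a rough illustration only, the sketch below shows a naive version of that idea: link any two points within a chosen distance and report the connected groups. It is not the project's algorithm; the function name `friends_of_friends`, the `linking_length` parameter, and the sample points are all assumptions made for this example. The all-pairs loop performs a number of distance checks that grows quadratically with the input size, which is one reason such a baseline becomes impractical on very large snapshots.

```python
from itertools import combinations
from math import dist


def friends_of_friends(points, linking_length):
    """Group point indices so that points within linking_length are linked.

    Hypothetical helper for illustration; groups are the connected
    components of the "closer than linking_length" relation.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs scan: quadratic in the number of points.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= linking_length:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect indices by root to form the final groups.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up sample data: a chain of three nearby points and a separate pair.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.2, 5.1)]
clusters = friends_of_friends(points, linking_length=0.6)
print(clusters)
```

Note that points 0 and 2 end up in the same group even though they are farther apart than the linking length, because both are linked to point 1. That transitive linking is what gives the approach its name.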

Project outcomes

Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Streaming symmetric norms via measure concentration
Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows
  • DOI:
    10.4230/lipics.approx-random.2018.7
  • Publication date:
    2018-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    V. Braverman;Elena Grigorescu;Harry Lang;David P. Woodruff;Samson Zhou
  • Corresponding author:
    V. Braverman;Elena Grigorescu;Harry Lang;David P. Woodruff;Samson Zhou
Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order
  • DOI:
  • Publication date:
    2016-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    V. Braverman;Stephen R. Chestnut;Robert Krauthgamer;Yi Li;David P. Woodruff;Lin F. Yang
  • Corresponding author:
    V. Braverman;Stephen R. Chestnut;Robert Krauthgamer;Yi Li;David P. Woodruff;Lin F. Yang
Approximate Convex Hull of Data Streams
  • DOI:
    10.4230/lipics.icalp.2018.21
  • Publication date:
    2017-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Avrim Blum;V. Braverman;Ananya Kumar;Harry Lang;Lin F. Yang
  • Corresponding author:
    Avrim Blum;V. Braverman;Ananya Kumar;Harry Lang;Lin F. Yang
Revisiting Frequency Moment Estimation in Random Order Streams
  • DOI:
  • Publication date:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Braverman, Vladimir;Woodruff, David;Yang, Lin
  • Corresponding author:
    Yang, Lin

Other publications by Vladimir Braverman

Preoperative brain volume loss is associated with postoperative delirium in advanced heart failure patients supported by left ventricular assist device
  • DOI:
    10.1038/s41598-025-94074-2
  • Publication date:
    2025-03-14
  • Journal:
  • Impact factor:
    3.900
  • Authors:
    Iván Murrieta-Álvarez;Jacob P. Scioscia;José M. Benítez-Salazar;Jason Uwaeze;Zicheng Xu;Guangyao Zheng;Shiyi Li;Vladimir Braverman;Carl P. Walther;Alexis E. Shafii;Camila Hochman-Mendez;Todd K. Rosengart;Kenneth K. Liao;Nandan K. Mondal
  • Corresponding author:
    Nandan K. Mondal
Metric k-median clustering in insertion-only streams
  • DOI:
    10.1016/j.dam.2021.07.025
  • Publication date:
    2021-12-15
  • Journal:
  • Impact factor:
  • Authors:
    Vladimir Braverman;Harry Lang;Keith Levin;Yevgeniy Rudoy
  • Corresponding author:
    Yevgeniy Rudoy
Optimizing beat-wise input for arrhythmia detection using 1-D convolutional neural networks: A real-world ECG study
  • DOI:
    10.1016/j.cmpb.2025.108898
  • Publication date:
    2025-09-01
  • Journal:
  • Impact factor:
    4.800
  • Authors:
    Sunghan Lee;Guangyao Zheng;Jeonghwan Koh;Haoran Li;Zicheng Xu;Sung Pil Cho;Sung Il Im;Vladimir Braverman;In cheol Jeong
  • Corresponding author:
    In cheol Jeong
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
  • DOI:
    10.48550/arxiv.2310.08391
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jingfeng Wu;Difan Zou;Zixiang Chen;Vladimir Braverman;Quanquan Gu;Peter L. Bartlett
  • Corresponding author:
    Peter L. Bartlett
Private Data Stream Analysis for Universal Symmetric Norm Estimation

Other grants held by Vladimir Braverman

Collaborative Research: CNS: Medium: Scalable Learning from Distributed Data for Wireless Network Management
  • Grant number:
    2333887
  • Fiscal year:
    2022
  • Funding amount:
    $1 million
  • Project category:
    Continuing Grant
CSR: NeTS: Small: In-Network Resource Management for Rack-Scale Computers
  • Grant number:
    2244870
  • Fiscal year:
    2022
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
CAREER: New Methods for Central Streaming Problems
  • Grant number:
    2244899
  • Fiscal year:
    2022
  • Funding amount:
    $1 million
  • Project category:
    Continuing Grant
Collaborative Research: CNS: Medium: Scalable Learning from Distributed Data for Wireless Network Management
  • Grant number:
    2107239
  • Fiscal year:
    2021
  • Funding amount:
    $1 million
  • Project category:
    Continuing Grant
CSR: NeTS: Small: In-Network Resource Management for Rack-Scale Computers
  • Grant number:
    1813487
  • Fiscal year:
    2018
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
CAREER: New Methods for Central Streaming Problems
  • Grant number:
    1652257
  • Fiscal year:
    2017
  • Funding amount:
    $1 million
  • Project category:
    Continuing Grant
EAGER: Universal Sketches for Network Monitoring
  • Grant number:
    1650041
  • Fiscal year:
    2016
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant

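Similar NSFC (National Natural Science Foundation of China) grants

Molecular design, synthesis, and anti-HIV activity of DKA-DAPYs as dual HIV-1 reverse transcriptase/integrase inhibitors
  • Grant number:
    21402148
  • Approval year:
    2014
  • Funding amount:
    CNY 250,000
  • Project category:
    Young Scientists Fund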

Similar overseas grants

BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
  • Grant number:
    1661760
  • Fiscal year:
    2016
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
  • Grant number:
    1664720
  • Fiscal year:
    2016
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
  • Grant number:
    1447473
  • Fiscal year:
    2015
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
  • Grant number:
    1447413
  • Fiscal year:
    2015
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: Structured Nearest Neighbor Search in High Dimensions
  • Grant number:
    1447476
  • Fiscal year:
    2015
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for multi-linear and non-linear data
  • Grant number:
    1447283
  • Fiscal year:
    2014
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: Dealing Efficiently with Big Social Network Data
  • Grant number:
    1447554
  • Fiscal year:
    2014
  • Funding amount:
    $1 million
  • Project category:
    Continuing Grant
BIGDATA: IA: DKA: Collaborative Research: High-Thoughput Connectomics
  • Grant number:
    1447786
  • Fiscal year:
    2014
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
  • Grant number:
    1447566
  • Fiscal year:
    2014
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: High-Dimensional Statistical Machine Learning for Spatio-Temporal Climate Data
  • Grant number:
    1447574
  • Fiscal year:
    2014
  • Funding amount:
    $1 million
  • Project category:
    Standard Grant