AF:Small:Extreme Streaming Problems

AF:小:极端流媒体问题

基本信息

项目摘要

The need for modern streaming systems to collect and analyze human activities is enormous, in businesses, government, academia, and society. Businesses can improve their operations and production; governments can improve the participation and satisfaction of citizens; society at large can be more sustainable or safe; etc. However, systems that collect and analyze such streams have enormous challenges of scale. At the highest level, this proposal combines a research, education and outreach plan to address these challenges. This research focusses on developing new algorithmic methods and theoretical understanding of modern streaming problems. There is an extensive theory of streaming algorithms for single streams, or to a limited extent to distributed streams, for one (or few) high cardinality dimensional data and simple frequency based analyses. But there are large gaps in creating streaming algorithms for "extreme" needs in modern data streaming applications, where the dimension of data that is collected is large, multiple streams of collected in distributed vantage points, there is a need to find anomalies in high dimensional spaces, and analyses that are needed are sophisticated including machine learning and other real time decision making tasks. This research develops the algorithmic theory of these extreme streaming problems. In particular, the project develops the algorithmic foundations for using large scale distributed streaming systems and tradeoff quality, certainty, CPU, memory and communication needed to do extreme, streaming, sophisticated analyses. Since modern streaming systems power businesses and deal with behavioral data on users, this work has broad societal impact. The project significantly improves the state of the art in algorithms for modern streaming systems. By providing new, rich algorithmic approaches, the project inspires practitioners in academia and industry to conceive more impactful applications, which are infeasible given the current algorithmic tools.The research program both enables and benefits from an education and outreach program that enhances curriculum, fosters training women and underrepresented minorities. To enable technology transfer the project involves practitioners in streaming systems, for field-testing the methods whenever possible. All publishable results are disseminated in respected academic journals, conferences, and workshops. All code and data sets developed in this work are made openly available to the community via the MassDAL site that already has code that is used by the community for classical streaming problems.Classical streaming algorithms use space sublinear, typically polylogarithmic in the input parameters. When extended to distributed systems, often the focus is on sublinear communication. The research program, here, builds on this algorithmic theory of past 15 years and addresses the modern, extreme needs of streaming applications. The project extends the theory to: many dimensions with large attributes, using far fewer resources in memory, computing and communication; emerging, pipeline models of streaming; more sophisticated analyses from local privacy to deep learning type vector embeddings; etc. The research program addresses fundamental problems. In more detail:(a) Extant streaming algorithms work for one or few dimensions of data of high cardinality. Modern streaming systems collect logs of human activity and have 100s of dimensions, 10s or more of them have very high cardinality. Can one identify the key problems for these domains and develop an algorithmic theory? The PI has identified an effective approach based on graphical modeling of the relationship between the dimensions, and believes this nugget can yield an effective theory.(b) Extant streaming algorithms use polylogarithmic words of memory per analysis when they are considered successful (and lower bounds point to problems for which sublinear space is not sufficient). Modern streaming systems run several orders of magnitude of such analyses, for example, one analysis for each of the millions of users. The project has identified an approach of "frugal" streaming, where algorithms use a small constant number of words, and develops a theory of frugal streaming algorithms, and their limitations.(c) Modern data stream systems allow pipelining, with the stream (modifiable or not) passing through stages, either at individual sites, or across the sites. The project abstracts and develops algorithmic theory of streaming problems with pipelined streaming systems. Preliminary results indicate that this allows algorithms that use time sublinear in the sublinear space used by the solutions, and there is a rich class of path problems that can be solved in these models which are impossible in classical streaming.(d) Modern systems need sophisticated streaming analyses. For example, streaming systems that collect usage data from users need private methods, and combining local differential privacy with streaming methods is an exciting direction; as another example, systems that analyze usage data might rely on embedding data into vectors with semantics, like word2vec and related deep learning methods. These methods need to work with polylogarithmic space for streaming. As another example, rich classes of graph problems are solvable in property testing framework with sublinear samples, can such classes be solved in streaming models too? The project highlights specific research challenges involved in developing streaming algorithms, and develops algorithms with provable performance guarantees on the tradeoff of resources used, approximation ratio, and probability of success. The project empirically evaluates them when possible.One cannot take known statements of problems and hope to solve them using streaming algorithms. One needs to modify the problems a bit to be amenable to streaming. In classical streaming, "heavy hitters" and "few terms" properties helped achieve that. In similar vein, the project identifies certain natural phenomena which helps formulate versions of problems for which extreme streaming solutions can be developed. Contours of this are already seen in using graphical models that limit interactions between dimensions to circumvent high dimensional high cardinality cases, or reusing counters in frugal streaming or sampling the sketch data structure in privacy problems and pipelined streaming, or using the randomness in stream order. This may eventually lead to algorithmic and empirical insights. Overall vision of the project is to provide a principled perspective for design and analysis of streaming algorithms with extreme needs.
在商业、政府、学术界和社会中,对现代流系统收集和分析人类活动的需求是巨大的。企业可以改善经营和生产;政府可以提高公民的参与度和满意度;整个社会可以更可持续或更安全;等。然而,收集和分析这些数据流的系统在规模上面临着巨大的挑战。在最高层面上,该提案结合了研究、教育和推广计划,以应对这些挑战。本研究的重点是发展新的算法方法和对现代流问题的理论认识。对于单个流,或在有限范围内的分布式流,对于一个(或几个)高基数维度数据和简单的基于频率的分析,有一个广泛的流算法理论。但是,在为现代数据流应用的“极端”需求创建流算法方面存在很大的差距,在这些应用中,收集的数据维度很大,在分布式有利位置收集的多个流,需要在高维空间中发现异常,并且需要复杂的分析,包括机器学习和其他实时决策任务。本研究发展了这些极端流问题的算法理论。特别是,该项目开发了使用大规模分布式流系统的算法基础,并权衡了进行极端、流化、复杂分析所需的质量、确定性、CPU、内存和通信。由于现代流媒体系统为企业提供动力并处理用户的行为数据,因此这项工作具有广泛的社会影响。该项目显著提高了现代流媒体系统算法的最新水平。通过提供新的、丰富的算法方法,该项目激励学术界和工业界的从业者构思更有影响力的应用,这在当前的算法工具下是不可行的。该研究项目使教育和推广项目得以实施,并从中受益,该项目旨在加强课程设置,促进培训妇女和未被充分代表的少数民族。为了实现技术转移,该项目涉及流系统的从业人员,以便尽可能对这些方法进行现场测试。所有可发表的成果都在受人尊敬的学术期刊、会议和研讨会上传播。在这项工作中开发的所有代码和数据集都通过MassDAL网站公开提供给社区,该网站已经有社区用于解决经典流问题的代码。经典的流算法在输入参数中使用空间次线性,通常是多对数。当扩展到分布式系统时,通常关注的是次线性通信。这里的研究项目建立在过去15年的算法理论基础上,解决了现代流媒体应用的极端需求。该项目将理论扩展到:具有大属性的多个维度,在内存、计算和通信方面使用更少的资源;新兴的流媒体管道模式;从局部隐私到深度学习类型向量嵌入的更复杂分析;等。这个研究计划解决一些基本问题。更详细地说:(a)现有的流算法适用于高基数数据的一个或几个维度。现代流系统收集人类活动的日志,并且具有100个维度,其中10个或更多维度具有非常高的基数。能否找出这些领域的关键问题并发展出一套算法理论?PI已经确定了一种基于维度之间关系的图形化建模的有效方法,并相信这一金块可以产生有效的理论。(b)现有的流算法在每次分析被认为成功时使用多对数的内存单词(下界指向次线性空间不足的问题)。现代流媒体系统可以运行几个数量级的此类分析,例如,对数百万用户中的每个用户进行一次分析。该项目已经确定了一种“节俭”流的方法,其中算法使用少量恒定的单词,并开发了节俭流算法的理论及其局限性。(c)现代数据流系统允许流水线,数据流(可修改或不可修改)通过各个阶段,在单个站点或跨站点。该项目抽象并发展了流水线流系统的流问题的算法理论。初步结果表明,这允许在求解所使用的亚线性空间中使用时间亚线性的算法,并且在这些模型中可以解决经典流中无法解决的丰富的路径问题。(d)现代系统需要复杂的流分析。例如,从用户那里收集使用数据的流系统需要私有方法,将本地差分隐私与流方法相结合是一个令人兴奋的方向;另一个例子是,分析使用数据的系统可能依赖于将数据嵌入到具有语义的向量中,如word2vec和相关的深度学习方法。这些方法需要使用多对数空间进行流处理。又如,在具有次线性样本的属性测试框架中可以解决丰富的图类问题,那么在流模型中也可以解决这类问题吗?该项目强调了开发流算法所涉及的具体研究挑战,并开发了在使用的资源、近似比率和成功概率的权衡上具有可证明性能保证的算法。项目在可能的情况下对它们进行经验性评估。一个人不能把已知的问题陈述,并希望用流算法来解决它们。我们需要稍微修改一下问题以适应流媒体。在经典流媒体中,“重量级”和“少数条款”属性帮助实现了这一点。类似地,该项目确定了某些自然现象,这些现象有助于制定问题的版本,从而可以开发出极端流解决方案。在使用图形模型来限制维度之间的交互以规避高维高基数的情况下,或者在节俭流中重用计数器,或者在隐私问题和流水线流中对草图数据结构进行采样,或者在流顺序中使用随机性,都可以看到这种轮廓。这可能最终导致算法和经验的见解。该项目的总体愿景是为具有极端需求的流算法的设计和分析提供一个原则性的视角。

项目成果

期刊论文数量(0)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ monograph.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ sciAawards.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ conferencePapers.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ patent.updateTime }}

Shanmugavelayu Muthukrishnan其他文献

Shanmugavelayu Muthukrishnan的其他文献

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

{{ truncateString('Shanmugavelayu Muthukrishnan', 18)}}的其他基金

AitF: FULL: Collaborative Research: Compact Data Structures for Traffic Measurement in Software-Defined Networks
AitF:完整:协作研究:软件定义网络中流量测量的紧凑数据结构
  • 批准号:
    1535878
  • 财政年份:
    2015
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
BIGDATA: F: DKA: Collaborative Research: Dealing Efficiently with Big Social Network Data
BIGDATA:F:DKA:协作研究:有效处理社交网络大数据
  • 批准号:
    1447793
  • 财政年份:
    2014
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
AF: Medium: Collaborative Research: Sparse Approximation: Theory and Extensions
AF:媒介:协作研究:稀疏逼近:理论与扩展
  • 批准号:
    1161151
  • 财政年份:
    2012
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
Workshop on Foundations of Algorithms in the Field
现场算法基础研讨会
  • 批准号:
    1131447
  • 财政年份:
    2011
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
ICES: Small: Auctions and Optimizations in Ad Exchanges
ICES:小型:广告交易中的拍卖和优化
  • 批准号:
    1101677
  • 财政年份:
    2011
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
Approximate Distributed Stream Tracking: Enabling the Next Generation of Data-Streaming Applications
近似分布式流跟踪:支持下一代数据流应用程序
  • 批准号:
    0414852
  • 财政年份:
    2005
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
Collaborative Research: Algorithms for sparse data representations
协作研究:稀疏数据表示算法
  • 批准号:
    0354690
  • 财政年份:
    2004
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
ITR: Sublinear Algorithms for Massive Data Sets
ITR:海量数据集的次线性算法
  • 批准号:
    0220280
  • 财政年份:
    2002
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Continuing Grant

相似国自然基金

昼夜节律性small RNA在血斑形成时间推断中的法医学应用研究
  • 批准号:
  • 批准年份:
    2024
  • 资助金额:
    0.0 万元
  • 项目类别:
    省市级项目
tRNA-derived small RNA上调YBX1/CCL5通路参与硼替佐米诱导慢性疼痛的机制研究
  • 批准号:
  • 批准年份:
    2022
  • 资助金额:
    10.0 万元
  • 项目类别:
    省市级项目
Small RNA调控I-F型CRISPR-Cas适应性免疫性的应答及分子机制
  • 批准号:
    32000033
  • 批准年份:
    2020
  • 资助金额:
    24.0 万元
  • 项目类别:
    青年科学基金项目
Small RNAs调控解淀粉芽胞杆菌FZB42生防功能的机制研究
  • 批准号:
    31972324
  • 批准年份:
    2019
  • 资助金额:
    58.0 万元
  • 项目类别:
    面上项目
变异链球菌small RNAs连接LuxS密度感应与生物膜形成的机制研究
  • 批准号:
    81900988
  • 批准年份:
    2019
  • 资助金额:
    21.0 万元
  • 项目类别:
    青年科学基金项目
肠道细菌关键small RNAs在克罗恩病发生发展中的功能和作用机制
  • 批准号:
    31870821
  • 批准年份:
    2018
  • 资助金额:
    56.0 万元
  • 项目类别:
    面上项目
基于small RNA 测序技术解析鸽分泌鸽乳的分子机制
  • 批准号:
    31802058
  • 批准年份:
    2018
  • 资助金额:
    26.0 万元
  • 项目类别:
    青年科学基金项目
Small RNA介导的DNA甲基化调控的水稻草矮病毒致病机制
  • 批准号:
    31772128
  • 批准年份:
    2017
  • 资助金额:
    60.0 万元
  • 项目类别:
    面上项目
基于small RNA-seq的针灸治疗桥本甲状腺炎的免疫调控机制研究
  • 批准号:
    81704176
  • 批准年份:
    2017
  • 资助金额:
    20.0 万元
  • 项目类别:
    青年科学基金项目
水稻OsSGS3与OsHEN1调控small RNAs合成及其对抗病性的调节
  • 批准号:
    91640114
  • 批准年份:
    2016
  • 资助金额:
    85.0 万元
  • 项目类别:
    重大研究计划

相似海外基金

Understanding interactions between minerals and small biopolymers under extreme conditions
了解极端条件下矿物质和小型生物聚合物之间的相互作用
  • 批准号:
    2870997
  • 财政年份:
    2023
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Studentship
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
OAC 核心:SHF:SMALL:ICURE——针对超大规模架构的压缩或摘要表示的原位分析
  • 批准号:
    2333899
  • 财政年份:
    2023
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
Collaborative Research: SHF: SMALL: Compile-Parallelize-Schedule-Retarget-Repeat (EASER) Paradigm for Dealing with Extreme Heterogeneity
合作研究:SHF:SMALL:处理极端异构性的编译-并行化-调度-重定向-重复(EASER)范式
  • 批准号:
    2146873
  • 财政年份:
    2022
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
OAC 核心:SHF:SMALL:ICURE——针对超大规模架构的压缩或摘要表示的原位分析
  • 批准号:
    2007775
  • 财政年份:
    2020
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
Collaborative Research: CNS Core: Small: Beyond-5G Extreme Mobility: Issues and Solutions
合作研究:CNS 核心:小型:超越 5G 极限移动性:问题与解决方案
  • 批准号:
    2008026
  • 财政年份:
    2020
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
Collaborative Research: CNS Core: Small: Beyond-5G Extreme Mobility: Issues and Solutions
合作研究:CNS 核心:小型:超越 5G 极限移动性:问题与解决方案
  • 批准号:
    2008824
  • 财政年份:
    2020
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
III: Small: Collaborative Research: Study of Neural Architectural Components in Physics-Informed Deep Neural Networks for Extreme Flood Prediction
III:小型:协作研究:用于极端洪水预测的物理信息深度神经网络中的神经架构组件研究
  • 批准号:
    2008276
  • 财政年份:
    2020
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Continuing Grant
III: Small: Collaborative Research: Study of Neural Architectural Components in Physics-Informed Deep Neural Networks for Extreme Flood Prediction
III:小型:协作研究:用于极端洪水预测的物理信息深度神经网络中的神经架构组件研究
  • 批准号:
    2008202
  • 财政年份:
    2020
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Continuing Grant
OAC Core: SHF: SMALL: ICURE -- In-situ Analytics with Compressed or Summary Representations for Extreme-Scale Architectures
OAC 核心:SHF:SMALL:ICURE——针对超大规模架构的压缩或摘要表示的原位分析
  • 批准号:
    2034850
  • 财政年份:
    2020
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Standard Grant
III: Small: Prediction and Characterization of Extreme Events in Spatio-Temporal Data.
III:小:时空数据中极端事件的预测和表征。
  • 批准号:
    2006633
  • 财政年份:
    2020
  • 资助金额:
    $ 49.91万
  • 项目类别:
    Continuing Grant
{{ showInfoDetail.title }}

作者:{{ showInfoDetail.author }}

知道了