CAREER: Scalable and Flexible Indexing of Compressed Sequences

职业:压缩序列的可扩展且灵活的索引

基本信息

  • 批准号:
    2337891
  • 负责人:
  • 金额:
    $ 59.82万
  • 依托单位:
  • 依托单位国家:
    美国
  • 项目类别:
    Continuing Grant
  • 财政年份:
    2024
  • 资助国家:
    美国
  • 起止时间:
    2024-03-01 至 2029-02-28
  • 项目状态:
    未结题

项目摘要

Lossless data compression is a classical and ubiquitous task that reduces the size of data, leading to a decreased cost of transfer or archival. Modern applications, however, require more than compression. In domains such as computational biology, terabytes of highly compressible data are generated at an increasing rate. To fully take advantage of this data, it needs to be not only stored in a small space but also accessible and searchable. Recent years have witnessed the birth of data structures called "compressed indexes" that can accomplish this task. The research so far, however, has mostly focused on static indexes, leaving out important aspects such as support for updates and efficient construction. This project's overarching goal is to develop powerful, dynamic compressed indexes capable of handling a versatile set of queries and various representations, that can moreover be efficiently constructed. This will make it significantly cheaper, faster, and more energy-efficient to store and share highly compressible datasets such as DNA collections, thereby fully unlocking the potential of advances in DNA sequencing. The advances of this project will also be integrated into research experience for underrepresented students, as well as outreach to non-computer science students.The main research goals of this project can be broadly classified into the following three directions. First, the project will lead to new efficient algorithms for constructing compressed indexes. The new approach lies in first preprocessing the input using lightly sub-optimal compression and then constructing the final index in compressed time, i.e., time proportional to the precompressed text. Second, the project aims to utilize and improve modern suffix sampling techniques to design new and powerful indexes that are both compressed and able to support powerful queries, including multi-string representations. Finally, the project will study the underdeveloped landscape of lower bounds for compressed indexes. Currently, only the most basic queries, such as random access, are well-understood, and much less is known about lower bounds on more versatile representations or lower bounds for compressed computation.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
无损数据压缩是一项经典且普遍存在的任务,它减少了数据的大小,从而降低了传输或存档的成本。然而,现代应用程序需要的不仅仅是压缩。在计算生物学等领域,以越来越快的速度生成tb级的高度可压缩数据。为了充分利用这些数据,它不仅需要存储在一个小空间中,而且需要可访问和可搜索。近年来出现了一种名为“压缩索引”的数据结构,它可以完成这项任务。然而,到目前为止的研究主要集中在静态索引上,而忽略了支持更新和高效构建等重要方面。这个项目的总体目标是开发功能强大的动态压缩索引,这些索引能够处理一组通用的查询和各种表示,而且可以高效地构建。这将使存储和共享高度可压缩的数据集(如DNA收集)变得更便宜、更快、更节能,从而充分释放DNA测序的进步潜力。该项目的进展也将被整合到代表性不足的学生的研究经验中,并扩展到非计算机科学的学生。本项目的主要研究目标大致可分为以下三个方向。首先,该项目将产生用于构造压缩索引的新的高效算法。新方法首先使用轻微次优压缩对输入进行预处理,然后在压缩时间内构建最终索引,即与预压缩文本成比例的时间。其次,该项目旨在利用和改进现代后缀采样技术来设计新的强大的索引,这些索引既压缩又能够支持强大的查询,包括多字符串表示。最后,本项目将研究压缩指数下界的欠发达景观。目前,只有最基本的查询,如随机访问,被很好地理解,而对更通用表示的下界或压缩计算的下界知之甚少。该奖项反映了美国国家科学基金会的法定使命,并通过使用基金会的知识价值和更广泛的影响审查标准进行评估,被认为值得支持。

项目成果

期刊论文数量(0)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

数据更新时间:{{ journalArticles.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ monograph.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ sciAawards.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ conferencePapers.updateTime }}

{{ item.title }}
  • 作者:
    {{ item.author }}

数据更新时间:{{ patent.updateTime }}

Dominik Kempa其他文献

Grammar Precompression Speeds Up Burrows-Wheeler Compression
语法预压缩加速 Burrows-Wheeler 压缩
  • DOI:
    10.1007/978-3-642-34109-0_34
  • 发表时间:
    2012
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Juha Kärkkäinen;Pekka Mikkola;Dominik Kempa
  • 通讯作者:
    Dominik Kempa
Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data
语法提升:一种证明压缩数据计算下限的新技术
  • DOI:
    10.48550/arxiv.2307.08833
  • 发表时间:
    2023
  • 期刊:
  • 影响因子:
    9.9
  • 作者:
    Rajat De;Dominik Kempa
  • 通讯作者:
    Dominik Kempa
素因数分解の現状
质因数分解的现状
  • DOI:
  • 发表时间:
    2018
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Hideo Bannai;Travis Gagie;Shunsuke Inenaga;Juha K_rkk_inen;Dominik Kempa;Marcin Piatkowski;Shiho Sugimoto;Shimizu S;藤井夏海・V. Sharma・藤澤和謙・村上 章;Charles Jordan;岸野洋久;伊豆 哲也,國廣 昇
  • 通讯作者:
    伊豆 哲也,國廣 昇
Dynamic suffix array with polylogarithmic queries and updates
具有多对数查询和更新的动态后缀数组
String Attractors: Verification and Optimization
字符串吸引子:验证和优化
  • DOI:
    10.4230/lipics.esa.2018.52
  • 发表时间:
    2018
  • 期刊:
  • 影响因子:
    0
  • 作者:
    Dominik Kempa;A. Policriti;N. Prezza;E. Rotenberg
  • 通讯作者:
    E. Rotenberg

Dominik Kempa的其他文献

{{ item.title }}
{{ item.translation_title }}
  • DOI:
    {{ item.doi }}
  • 发表时间:
    {{ item.publish_year }}
  • 期刊:
  • 影响因子:
    {{ item.factor }}
  • 作者:
    {{ item.authors }}
  • 通讯作者:
    {{ item.author }}

相似国自然基金

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • 批准号:
  • 批准年份:
    2024
  • 资助金额:
    万元
  • 项目类别:
    合作创新研究团队

相似海外基金

Amutri3D - revolutionising the 3D-visualisation industry by offering a more scalable, flexible and faster solution for multiple sector uses
Amutri3D - 通过为多个行业用途提供更具可扩展性、灵活且更快的解决方案,彻底改变 3D 可视化行业
  • 批准号:
    10061530
  • 财政年份:
    2023
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Collaborative R&D
RI: Small: Semantic 3D Neural Rendering Field Models that are Accurate, Complete, Flexible, and Scalable
RI:小型:准确、完整、灵活且可扩展的语义 3D 神经渲染场模型
  • 批准号:
    2312102
  • 财政年份:
    2023
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Continuing Grant
IMR: MT: NetFlex: A Flexible Scalable & Privacy-Preserving Network Measurement Platform to Iteratively Collect Multi-modal Multi-view Network Data from Access Networks
IMR:MT:NetFlex:灵活的可扩展
  • 批准号:
    2323229
  • 财政年份:
    2023
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Continuing Grant
CAREER: Laser-Induced Graphene with On-Demand Morphology and Chemistry Control for Scalable Flexible Device Manufacturing
职业:具有按需形态和化学控制的激光诱导石墨烯,用于可扩展的柔性设备制造
  • 批准号:
    2239244
  • 财政年份:
    2023
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Standard Grant
Flexible and scalable digital-twin platform for enhanced production efficiency and yield in battery cell production lines - BATTwin
灵活且可扩展的数字孪生平台,可提高电池生产线的生产效率和产量 - BATTwin
  • 批准号:
    10118186
  • 财政年份:
    2023
  • 资助金额:
    $ 59.82万
  • 项目类别:
    EU-Funded
High-Efficiency Flexible and Scalable Halide-Perovskite Solar Modules
高效灵活且可扩展的卤化物钙钛矿太阳能模块
  • 批准号:
    EP/V027131/1
  • 财政年份:
    2022
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Research Grant
Electrodeposited 2D Transition Metal Dichalcogenides on graphene: a novel route towards scalable flexible electronics
石墨烯上电沉积二维过渡金属二硫化物:实现可扩展柔性电子产品的新途径
  • 批准号:
    EP/V062603/1
  • 财政年份:
    2022
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Research Grant
Electrodeposited 2D Transition Metal Dichalcogenides on graphene: a novel route towards scalable flexible electronics
石墨烯上电沉积二维过渡金属二硫化物:实现可扩展柔性电子产品的新途径
  • 批准号:
    EP/V062689/1
  • 财政年份:
    2022
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Research Grant
Electrodeposited 2D Transition Metal Dichalcogenides on graphene: a novel route towards scalable flexible electronics
石墨烯上电沉积二维过渡金属二硫化物:实现可扩展柔性电子产品的新途径
  • 批准号:
    EP/V062387/1
  • 财政年份:
    2022
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Research Grant
CC* Data Storage: Flexible Affordable Scalable Technology for Research Storage (FAST-Research Storage)
CC* 数据存储:用于研究存储的灵活且经济实惠的可扩展技术(FAST-Research Storage)
  • 批准号:
    2232810
  • 财政年份:
    2022
  • 资助金额:
    $ 59.82万
  • 项目类别:
    Standard Grant
{{ showInfoDetail.title }}

作者:{{ showInfoDetail.author }}

知道了