CAREER: Scalable and Flexible Indexing of Compressed Sequences
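Basic information
- Award number: 2337891
- Principal investigator:
- Amount: $598,200
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Duration: 2024-03-01 to 2029-02-28
- Status: Ongoing
- Source:
- Keywords:
Project abstract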
Lossless data compression is a classical and ubiquitous task that reduces the size of data, leading to a decreased cost of transfer or archival. Modern applications, however, require more than compression. In domains such as computational biology, terabytes of highly compressible data are generated at an increasing rate. To fully take advantage of this data, it needs to be not only stored in a small space but also accessible and searchable. Recent years have witnessed the birth of data structures called "compressed indexes" that can accomplish this task. The research so far, however, has mostly focused on static indexes, leaving out important aspects such as support for updates and efficient construction. This project's overarching goal is to develop powerful, dynamic compressed indexes capable of handling a versatile set of queries and various representations, that can moreover be efficiently constructed. This will make it significantly cheaper, faster, and more energy-efficient to store and share highly compressible datasets such as DNA collections, thereby fully unlocking the potential of advances in DNA sequencing. The advances of this project will also be integrated into research experience for underrepresented students, as well as outreach to non-computer science students.
The main research goals of this project can be broadly classified into the following three directions. First, the project will lead to new efficient algorithms for constructing compressed indexes. The new approach lies in first preprocessing the input using lightly sub-optimal compression and then constructing the final index in compressed time, i.e., time proportional to the precompressed text. Second, the project aims to utilize and improve modern suffix sampling techniques to design new and powerful indexes that are both compressed and able to support powerful queries, including multi-string representations. Finally, the project will study the underdeveloped landscape of lower bounds for compressed indexes. Currently, only the most basic queries, such as random access, are well-understood, and much less is known about lower bounds on more versatile representations or lower bounds for compressed computation.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Dominik Kempa
Grammar Precompression Speeds Up Burrows-Wheeler Compression
- DOI: 10.1007/978-3-642-34109-0_34
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: Juha Kärkkäinen; Pekka Mikkola; Dominik Kempa
- Corresponding author: Dominik Kempa
Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data
- DOI: 10.48550/arxiv.2307.08833
- Published: 2023
- Journal:
- Impact factor: 9.9
- Authors: Rajat De; Dominik Kempa
- Corresponding author: Dominik Kempa
素因数分解の現状 (The Current State of Prime Factorization)
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Hideo Bannai; Travis Gagie; Shunsuke Inenaga; Juha Kärkkäinen; Dominik Kempa; Marcin Piatkowski; Shiho Sugimoto; Shimizu S; 藤井夏海・V. Sharma・藤澤和謙・村上 章; Charles Jordan; 岸野洋久; 伊豆 哲也, 國廣 昇
- Corresponding author: 伊豆 哲也, 國廣 昇
Dynamic suffix array with polylogarithmic queries and updates
- DOI: 10.1145/3519935.3520061
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Dominik Kempa; Tomasz Kociumaka
- Corresponding author: Tomasz Kociumaka
String Attractors: Verification and Optimization
- DOI: 10.4230/lipics.esa.2018.52
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Dominik Kempa; A. Policriti; N. Prezza; E. Rotenberg
- Corresponding author: E. Rotenberg
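Similar NSFC (National Natural Science Foundation of China) grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Approval year: 2024
- Funding amount (10,000 CNY):
- Award type: Collaborative Innovation Research Team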
Similar overseas grants
Amutri3D - revolutionising the 3D-visualisation industry by offering a more scalable, flexible and faster solution for multiple sector uses
- Award number: 10061530
- Fiscal year: 2023
- Funding amount: $598,200
- Award type: Collaborative R&D
RI: Small: Semantic 3D Neural Rendering Field Models that are Accurate, Complete, Flexible, and Scalable
- Award number: 2312102
- Fiscal year: 2023
- Funding amount: $598,200
- Award type: Continuing Grant
IMR: MT: NetFlex: A Flexible Scalable & Privacy-Preserving Network Measurement Platform to Iteratively Collect Multi-modal Multi-view Network Data from Access Networks
- Award number: 2323229
- Fiscal year: 2023
- Funding amount: $598,200
- Award type: Continuing Grant
CAREER: Laser-Induced Graphene with On-Demand Morphology and Chemistry Control for Scalable Flexible Device Manufacturing
- Award number: 2239244
- Fiscal year: 2023
- Funding amount: $598,200
- Award type: Standard Grant
Flexible and scalable digital-twin platform for enhanced production efficiency and yield in battery cell production lines - BATTwin
- Award number: 10118186
- Fiscal year: 2023
- Funding amount: $598,200
- Award type: EU-Funded
High-Efficiency Flexible and Scalable Halide-Perovskite Solar Modules
- Award number: EP/V027131/1
- Fiscal year: 2022
- Funding amount: $598,200
- Award type: Research Grant
Electrodeposited 2D Transition Metal Dichalcogenides on graphene: a novel route towards scalable flexible electronics
- Award number: EP/V062603/1
- Fiscal year: 2022
- Funding amount: $598,200
- Award type: Research Grant
Electrodeposited 2D Transition Metal Dichalcogenides on graphene: a novel route towards scalable flexible electronics
- Award number: EP/V062689/1
- Fiscal year: 2022
- Funding amount: $598,200
- Award type: Research Grant
Electrodeposited 2D Transition Metal Dichalcogenides on graphene: a novel route towards scalable flexible electronics
- Award number: EP/V062387/1
- Fiscal year: 2022
- Funding amount: $598,200
- Award type: Research Grant
CAREER: A Model for Achieving Flexible and Scalable Conceptual Assessment – A Prototype in Undergraduate Quantum Mechanics
- Award number: 2143976
- Fiscal year: 2022
- Funding amount: $598,200
- Award type: Continuing Grant