Compressed Indexes for Sequence Data: Fast Construction and Dynamism
序列数据的压缩索引:快速构建和动态性
基本信息
- 批准号:RGPIN-2019-04225
- 负责人:
- 金额:$ 1.85万
- 依托单位:
- 依托单位国家:加拿大
- 项目类别:Discovery Grants Program - Individual
- 财政年份:2019
- 资助国家:加拿大
- 起止时间:2019-01-01 至 2020-12-31
- 项目状态:已结题
- 来源:
- 关键词:
项目摘要
Situations when we need to store and analyze very large data volumes frequently arise in different areas of Computer Science. Although computers are becoming faster and memory size is getting larger, the design of efficient algorithms and data structures is not losing its importance. This is a consequence of the growing size of the data that we must store and computational intensity of problems that we must solve. In many applications working with massive data, such as bioinformatics and information retrieval, the data can be represented as a sequence of symbols (string) or as a collection of such sequences. This proposal focuses on efficient data structures for storage and processing of very large volumes of sequential data. ******Space usage plays a key role in the design of efficient algorithms. If we are able to find a space-efficient representation of the data, we can fit a larger fraction of data into the main memory or cache and reduce the number of access operations to the secondary memory. This raises the question about algorithms and data structures that work with data that is stored in compressed form. Although operations on compressed data can be more costly, the algorithms are frequently more efficient: The number of data transfers between different levels of the memory hierarchy is reduced. This results in a performance gain that outweighs the computational overhead. The study of algorithms for processing structured compressed data (strings, trees, graphs) is a thriving research area. ******The topic of this proposal is compressed representation of string indexes. A string index is a data structure that stores a sequence of symbols and answers pattern matching queries on this sequence. The string index and its variants are ubiquitous in algorithms working on string data. Although there is already a large body of work on this subject, many important questions with practical relevance are still open. In this project I will concentrate on two aspects of compressed indexes that are relevant in scenarios when the data must be changed frequently: (1) space-efficient and fast construction algorithms for string indexes and (2) support of dynamism in string indexes. ******I and my students will study efficient algorithm for constructing space-efficient and compact index structures. Special attention will be given to the space usage of these algorithms and to their relation with other models of computation. My students and I will also study the performance of dynamic data structures supporting pattern matching queries and some more involved queries on string data that is stored in compressed form. Rigorous study of structured data is the central topic of this proposal. But my group and I will also pay attention to practical aspects of data storage methods. Practical efficiency will be tested by implementing some of our results. Thus my research program has potential for immediate application and for advancing software industry in Canada.*
当我们需要存储和分析非常大的数据量的情况下,经常出现在计算机科学的不同领域。虽然计算机变得越来越快,内存容量越来越大,但设计有效的算法和数据结构并没有失去其重要性。这是我们必须存储的数据规模不断增长以及我们必须解决的问题的计算强度不断增加的结果。在处理大量数据的许多应用中,例如生物信息学和信息检索,数据可以表示为符号序列(字符串)或这些序列的集合。 该提案集中于用于存储和处理非常大量的顺序数据的有效数据结构。** 空间使用在高效算法的设计中起着关键作用。如果我们能够找到数据的空间有效表示,我们可以将更大部分的数据放入主内存或缓存中,并减少对辅助内存的访问操作数量。这就提出了一个关于算法和数据结构的问题,这些算法和数据结构处理以压缩形式存储的数据。 虽然对压缩数据的操作可能更昂贵,但算法通常更有效:减少了内存层次结构不同级别之间的数据传输数量。这导致性能增益超过计算开销。处理结构化压缩数据(字符串、树、图)的算法研究是一个蓬勃发展的研究领域。* 本提案的主题是字符串索引的压缩表示。字符串索引是一种数据结构,它存储一个符号序列,并回答对该序列的模式匹配查询。字符串索引及其变体在处理字符串数据的算法中无处不在。虽然在这方面已经有大量的工作,但许多具有实际意义的重要问题仍然悬而未决。在这个项目中,我将专注于压缩索引的两个方面,这两个方面与数据必须频繁更改的场景相关:(1)字符串索引的空间效率和快速构造算法,以及(2)支持字符串索引的动态性。 ** 我和我的学生将研究有效的算法来构建空间有效和紧凑的索引结构。将特别注意这些算法的空间使用和它们与其他计算模型的关系。我和我的学生还将研究支持模式匹配查询的动态数据结构的性能,以及对以压缩形式存储的字符串数据的更多查询。对结构化数据的严格研究是本提案的中心主题。但我和我的团队也会关注数据存储方法的实际方面。实际效率将通过实施我们的一些结果进行测试。因此,我的研究计划有可能立即应用,并推动加拿大的软件产业。
项目成果
期刊论文数量(0)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)
数据更新时间:{{ journalArticles.updateTime }}
{{
item.title }}
{{ item.translation_title }}
- DOI:
{{ item.doi }} - 发表时间:
{{ item.publish_year }} - 期刊:
- 影响因子:{{ item.factor }}
- 作者:
{{ item.authors }} - 通讯作者:
{{ item.author }}
数据更新时间:{{ journalArticles.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ monograph.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ sciAawards.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ conferencePapers.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ patent.updateTime }}
Nekrich, Yakov其他文献
Orthogonal Range Reporting and Rectangle Stabbing for Fat Rectangles
胖矩形的正交范围报告和矩形刺穿
- DOI:
10.1007/978-3-030-24766-9_21 - 发表时间:
2019 - 期刊:
- 影响因子:0
- 作者:
Chan, Timothy M.;Nekrich, Yakov;Smid, Michiel - 通讯作者:
Smid, Michiel
Nekrich, Yakov的其他文献
{{
item.title }}
{{ item.translation_title }}
- DOI:
{{ item.doi }} - 发表时间:
{{ item.publish_year }} - 期刊:
- 影响因子:{{ item.factor }}
- 作者:
{{ item.authors }} - 通讯作者:
{{ item.author }}
{{ truncateString('Nekrich, Yakov', 18)}}的其他基金
Compressed Indexes for Sequence Data: Fast Construction and Dynamism
序列数据的压缩索引:快速构建和动态性
- 批准号:
DGECR-2019-00327 - 财政年份:2019
- 资助金额:
$ 1.85万 - 项目类别:
Discovery Launch Supplement
相似海外基金
Scaling Disk-Resident Learned Indexes For Database Systems
扩展数据库系统的磁盘驻留学习索引
- 批准号:
DP240101211 - 财政年份:2024
- 资助金额:
$ 1.85万 - 项目类别:
Discovery Projects
Analysis of problem structure and development of evaluation indexes in wind band and chorus club activities
管乐团、合唱团活动中的问题结构分析及评价指标的制定
- 批准号:
23K02076 - 财政年份:2023
- 资助金额:
$ 1.85万 - 项目类别:
Grant-in-Aid for Scientific Research (C)
Population pharmacokinetic analysis of immunosuppressants to optimize the dosing schedule using liver function indexes
利用肝功能指标对免疫抑制剂进行群体药代动力学分析以优化给药方案
- 批准号:
23K14409 - 财政年份:2023
- 资助金额:
$ 1.85万 - 项目类别:
Grant-in-Aid for Early-Career Scientists
Designing Indexes for Detecting Satisficing Answers using Paradata
使用 Paradata 设计用于检测满意答案的索引
- 批准号:
23K01751 - 财政年份:2023
- 资助金额:
$ 1.85万 - 项目类别:
Grant-in-Aid for Scientific Research (C)
Faster Compressed Indexes On Next-Generation Hardware
下一代硬件上更快的压缩索引
- 批准号:
RGPIN-2017-03910 - 财政年份:2022
- 资助金额:
$ 1.85万 - 项目类别:
Discovery Grants Program - Individual
Subjective Cognitive Effort Indexes Sub-Criticality in the Brain
主观认知努力指数大脑的次临界度
- 批准号:
10455114 - 财政年份:2021
- 资助金额:
$ 1.85万 - 项目类别:
Subjective Cognitive Effort Indexes Sub-Criticality in the Brain
主观认知努力指数大脑的次临界度
- 批准号:
10300868 - 财政年份:2021
- 资助金额:
$ 1.85万 - 项目类别:
Subjective Cognitive Effort Indexes Sub-Criticality in the Brain
主观认知努力指数大脑的次临界度
- 批准号:
10840121 - 财政年份:2021
- 资助金额:
$ 1.85万 - 项目类别:
Faster Compressed Indexes On Next-Generation Hardware
下一代硬件上更快的压缩索引
- 批准号:
RGPIN-2017-03910 - 财政年份:2021
- 资助金额:
$ 1.85万 - 项目类别:
Discovery Grants Program - Individual
Developing new tools and computational indexes to characterize sleep macro- and micro- architecture in older adults: special focus on sex and gender and inter-individual differences.
开发新工具和计算指数来表征老年人的睡眠宏观和微观结构:特别关注性别和个体间差异。
- 批准号:
440250 - 财政年份:2020
- 资助金额:
$ 1.85万 - 项目类别:
Fellowship Programs