Faster Compressed Indexes On Next-Generation Hardware
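Basic information
- Grant number: RGPIN-2017-03910
- Principal investigator:
- Amount: $30,600
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Period: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract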
Software indexes accelerate applications in business analytics, machine learning, and data science. They often determine the performance of big-data applications. Efficient indexes not only improve latency and throughput, but they also reduce energy usage. Many indexes make parsimonious use of internal memory so that critical data remains close to the processor. It is also desirable to work directly on the compressed data, to avoid potentially harmful decoding passes. Thus, we use lightweight compression strategies, optimized for speed.
We are interested in bitmap indexes. We find them in a wide range of popular systems: Oracle, Apache Hive, Apache Spark, Druid, Apache Kylin, Apache Lucene, Elastic, Git and so forth. They are an integral part of systems, such as Wikipedia or GitHub, used by millions of people every day.
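To make the idea of a bitmap index concrete, here is a minimal sketch of an uncompressed one, assuming a toy two-column table. The names `build_bitmap_index` and `rows_matching` and the sample data are hypothetical and do not come from any of the systems listed above; Python integers stand in for arbitrary-length bitsets.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an integer whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def rows_matching(bitmap):
    """List the row ids whose bits are set in the bitmap, in increasing order."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Toy table with two columns (illustrative data only).
city = ["Montreal", "Toronto", "Montreal", "Quebec", "Montreal"]
status = ["open", "open", "closed", "open", "open"]

city_index = build_bitmap_index(city)
status_index = build_bitmap_index(status)

# A conjunctive query becomes a single bitwise AND of two bitmaps.
both = city_index["Montreal"] & status_index["open"]
print(rows_matching(both))
```

The query never scans the table itself: it combines two precomputed bitmaps. Compressed formats aim to keep this property while storing far fewer bits.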
Our long-term plan has three axes of research:
(1) Pursue the optimization of existing bitmap indexes, as they are used in current systems. Many of these systems rely on either Roaring or EWAH bitmaps, two formats we produced. We plan to multiply the performance of some of these indexes on processors supporting advanced SIMD (single instruction, multiple data) instructions such as those of the AVX2 and AVX-512 families.
(2) Continue to break speed records with our integer-compression techniques. We focus on sorted lists of integers, as they frequently appear in B+-trees, inverted indexes and compressed bitmap indexes. In recent years, we showed that we could decode billions of integers per second while maintaining compression ratios close to the limit given by Shannon's entropy. Yet we used only a fraction of the features of the latest processors. We will establish new performance records on recent server processors while compressing even better. We will also accelerate applications with rank, select, merge, update and insert functions operating directly over the compressed data. Finally, we will apply this work to database engines (e.g., upscaledb) and big-data systems (e.g., Apache Parquet, Druid, Apache Spark).
(3) Develop a novel bitmap index format that outclasses the state of the art in both memory usage and raw speed. Currently, one of the best formats is Roaring: it is widely adopted, fast and it uses little memory. We seek to design a new format that uses even less memory than Roaring while improving the speed. Though it is relatively easy to offer better compression, doing so while improving query performance represents a real challenge. This research axis builds directly on the first two axes.
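The second axis concerns sorted lists of integers. As a baseline illustration only, and not the SIMD-accelerated schemes the proposal targets, the sketch below combines delta coding (storing the gaps between successive values) with a simple variable-byte code. The function names `encode_sorted` and `decode_sorted` and the sample identifiers are assumptions made for this example.

```python
def encode_sorted(values):
    """Delta-code a sorted list of non-negative integers, then variable-byte encode the gaps."""
    out = bytearray()
    previous = 0
    for value in values:
        gap = value - previous
        if gap < 0:
            raise ValueError("input must be sorted and non-negative")
        previous = value
        # Emit 7 bits per byte, low bits first; the high bit marks a continuation.
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_sorted(data):
    """Invert encode_sorted: rebuild each gap, then accumulate gaps into values."""
    values = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes of this gap follow
        else:
            current += gap
            values.append(current)
            gap = 0
            shift = 0
    return values


doc_ids = [3, 7, 8, 300, 301, 70000]
packed = encode_sorted(doc_ids)
assert decode_sorted(packed) == doc_ids
print(len(packed), "bytes instead of", 4 * len(doc_ids))
```

Because the list is sorted, the gaps are small and most of them fit in a single byte, which is why sorted inputs compress well. The byte-at-a-time decoding loop is the kind of data-dependent work that vectorized decoders try to avoid.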
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Lemire, Daniel
Fast Random Integer Generation in an Interval
- DOI: 10.1145/3230636
- Published: 2019-02-01
- Journal:
- Impact factor: 0.9
- Authors: Lemire, Daniel
- Corresponding author: Lemire, Daniel
Faster retrieval with a two-pass dynamic-time-warping lower bound
- DOI: 10.1016/j.patcog.2008.11.030
- Published: 2009-09-01
- Journal:
- Impact factor: 8
- Authors: Lemire, Daniel
- Corresponding author: Lemire, Daniel
STREAM VBYTE: Faster byte-oriented integer compression
- DOI: 10.1016/j.ipl.2017.09.011
- Published: 2018-02-01
- Journal:
- Impact factor: 0.5
- Authors: Lemire, Daniel; Kurz, Nathan; Rupp, Christoph
- Corresponding author: Rupp, Christoph
Sorting improves word-aligned bitmap indexes
- DOI: 10.1016/j.datak.2009.08.006
- Published: 2010-01-01
- Journal:
- Impact factor: 2.5
- Authors: Lemire, Daniel; Kaser, Owen; Aouiche, Kamel
- Corresponding author: Aouiche, Kamel
Better bitmap performance with Roaring bitmaps
- DOI: 10.1002/spe.2325
- Published: 2016-05-01
- Journal:
- Impact factor: 3.5
- Authors: Chambi, Samy; Lemire, Daniel; Godin, Robert
- Corresponding author: Godin, Robert
Other grants held by Lemire, Daniel
Faster Compressed Indexes On Next-Generation Hardware
- Grant number: RGPIN-2017-03910
- Fiscal year: 2022
- Amount: $30,600
- Program: Discovery Grants Program - Individual
Faster Compressed Indexes On Next-Generation Hardware
- Grant number: RGPIN-2017-03910
- Fiscal year: 2021
- Amount: $30,600
- Program: Discovery Grants Program - Individual
Faster Compressed Indexes On Next-Generation Hardware
- Grant number: RGPIN-2017-03910
- Fiscal year: 2019
- Amount: $30,600
- Program: Discovery Grants Program - Individual
Faster Compressed Indexes On Next-Generation Hardware
- Grant number: 507939-2017
- Fiscal year: 2019
- Amount: $30,600
- Program: Discovery Grants Program - Accelerator Supplements
Faster Compressed Indexes On Next-Generation Hardware
- Grant number: RGPIN-2017-03910
- Fiscal year: 2018
- Amount: $30,600
- Program: Discovery Grants Program - Individual
Faster Compressed Indexes On Next-Generation Hardware
- Grant number: 507939-2017
- Fiscal year: 2018
- Amount: $30,600
- Program: Discovery Grants Program - Accelerator Supplements
Faster Compressed Indexes On Next-Generation Hardware
- Grant number: RGPIN-2017-03910
- Fiscal year: 2017
- Amount: $30,600
- Program: Discovery Grants Program - Individual
Faster Compressed Indexes On Next-Generation Hardware
- Grant number: 507939-2017
- Fiscal year: 2017
- Amount: $30,600
- Program: Discovery Grants Program - Accelerator Supplements
Vision numérique par drone pour l'estimation du nombre de microsites de plantation (Drone-based computer vision for estimating the number of planting microsites)
- Grant number: 522077-2017
- Fiscal year: 2017
- Amount: $30,600
- Program: Engage Grants Program
Data reordering for better compression in databases
- Grant number: 261437-2012
- Fiscal year: 2016
- Amount: $30,600
- Program: Discovery Grants Program - Individual
Similar overseas grants
Compressed Indexes for Sequence Data: Fast Construction and Dynamism
- Grant number: DGECR-2019-00327
- Fiscal year: 2019
- Amount: $30,600
- Program: Discovery Launch Supplement
Compressed Indexes for Sequence Data: Fast Construction and Dynamism
- Grant number: RGPIN-2019-04225
- Fiscal year: 2019
- Amount: $30,600
- Program: Discovery Grants Program - Individual