Stanchion Turns SQLite Into A Column Store

Written by Nikos Vaggalis

Thursday, 15 February 2024

Stanchion is an open-source extension that adds columnar storage capabilities to SQLite, rendering it efficient for analytical reporting.

Postgres is often praised for its extensibility, but we tend to forget that others, such as SQLlite, are extensible too.

sqlite

SQLite has fewer built-in functions compared to PostgreSQL and others. For that reason, the authors provided an extension mechanism which allows adding almost anything on top of the Core. As a result, there are a lot of SQLite extensions out there and an attempt to collect them all under one roof is the Github repo of Sqlean. As such there you'll find extensions such as :

crypto: secure hashes
fileio: read and write files
fuzzy: fuzzy string matching and phonetics
ipaddr: IP address manipulation
json1: JSON functions
math: math functions
re: regular expressions
stats: math statistics
text: string functions
unicode: Unicode support
uuid: Universally Unique IDentifiers
vsv: CSV files as virtual tables

Most of them are written in C or Rust, but there's also Stanchion, written in Zig.

Stanchion tries to answer the age old question; OLAP or OLTP? Should you have a separate database for your transactional workloads and another for your warehousing needs?

There are a number of situations where OLAP and column-oriented storage outperforms row-oriented storage:

Storing and processing metric, log, and event data
Timeseries data storage and analysis
Analytical queries over many rows and a few columns (e. g. calculating the average temperature over months of hourly weather data)
Change tracking, history/temporal tables
Anchor modeling / Datomic-like data models

Stanchion allows you to have both column-oriented storage and row-oriented storage under the same roof making SQLite usable as an analytics store on embedded, portable or limited resource devices. Stanchion is an ideal fit for analytical queries and wide tables because it only scans data from the columns that are referenced by a given query. (See SQL example bellow)

It also uses compression techniques like run length and bit-packed encodings that significantly reduce the size of stored data, greatly reducing the cost of large data sets. This makes it an ideal solution for storing large, expanding datasets.

As such the column-oriented data is stored in large BLOBs inside of regular SQLite tables, trying to circumvent the limitations posed by SQLite's virtual table API which is designed with row-oriented access in mind. Despite that Stanchion uses a few workarounds to enable analytical queries :

lazily loading segments
encodings/bit packing for booleans and integers which reduce the size of data that needs to be traversed on disk
using clustered index for minimizing data access.

As sqlite extensions are just dynamic libraries you just have to load it into your database from the SQLite CLI:

. load /path/to/libstanchion

Then you can create your first stanchion table much like creating any table in SQLite, with the difference
being that you must use the 'USING stanchion' phrase which tells SQLite to create a virtual table that is implemented by stanchion.

CREATE VIRTUAL TABLE dnd_monsters
USING stanchion (
id INTEGER NOT NULL,
name TEXT NOT NULL,
type TEXT NOT NULL,
size INTEGER NOT NULL,
challenge_rating FLOAT NOT NULL,
SORT KEY (id)
);

After that you can insert your data like you usually do :

INSERT INTO dnd_monsters (id, name, type, size, challenge_rating)
VALUES
(1, 'Beholder', 'ABERRATION', 4, 13),
(2, 'Gelatinous Cube', 'OOZE', 4, 2),
(3, 'Mimic', 'MONSTROSITY', 3, 2),
(4, 'Lich', 'UNDEAD', 3, 21);

and query it like you usually do:

SELECT name
FROM dnd_monsters
WHERE type = 'UNDEAD' AND challenge_rating >= 18;

The difference here is that because the `dnd_monsters` table is column-oriented, that query only reads data from the `name`, `type`, and `challenge_rating` columns. Data in the `id` and `size` columns are not accessed or scanned at all!

So there you have it. Local based analytics on devices with limited resources, see mobiles phones or Raspberry Pi, without having to use a separated dbms product or modifying your existing stack. And don't think that the local part is to be underestimated. There's lately a range of products that put local-first, first, like we examined in TinyBase And The Local First Movement. What's the deal with those apps?

They can be used offline and sync with a remote service when going online
They are effective in scenarios that continuous data connectivity is not guaranteed-don't assume that there's a Cloud always on the backend
They are local first - you own your data; they are not deposited on a Cloud provider. Unless you own the data or it doesn't reside on your own devices then it's not yours.

In conclusion, besides the advantages Stanchion brings to the table, I have a suggestion to make on top to drive the project a bit more forward. That is, in order to implement better compression, maybe to consider integrating the sqlite-zstd extension which provides transparent dictionary-based row-level compression that allows compressing a sqlite database's entries almost as well as if you were compressing the whole DB file, while at the same time retaining random access. More on that
on Use Rust To Reduce The Size Of Your SQLite Database. Of course sqlite-zstd would need some tweaking to cater for columnar compression, but that's not impossible!

sqlite

More Information

Stanchion

Use Rust To Reduce The Size Of Your SQLite Database

In Praise Of SQLite

Hydra Turns PostgreSQL Into A Column Store

TinyBase And The Local First Movement

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Undefined Behavior Just Not Worth The Effort!
30/04/2025

Some very interesting research has just been published that throws a lot of light on the crazy belief that undefined behavior is useful, essential even, to certain types of optimization rather than th [ ... ]

+ Full Story

Linux Foundation Mentorship Program
15/04/2025

Applications are now open for Summer 2025 participation in the Linux Foundation Mentorship Program, a program that provides structured guidance and opportunities for newcomers to contribute to th [ ... ]

+ Full Story

More News

Comments

or email your comment to: comments@i-programmer.info

Recent Articles

Recent Book Reviews

Popular Articles

More Information

Related Articles

Comments