Stanchion Turns SQLite Into A Column Store
Written by Nikos Vaggalis   
Thursday, 15 February 2024

Stanchion is an open-source extension that adds columnar storage capabilities to SQLite, rendering it efficient for analytical reporting.

Postgres is often praised for its extensibility, but we tend to forget that others, such as SQLlite, are extensible too.

sqlite

SQLite has fewer built-in functions compared to PostgreSQL and others. For that reason, the authors provided an extension mechanism which allows adding almost anything on top of the Core. As a result, there are a lot of SQLite extensions out there and an attempt to collect them all under one roof is the Github repo of Sqlean. As such there you'll find extensions such as :

  • crypto: secure hashes
  • fileio: read and write files
  • fuzzy: fuzzy string matching and phonetics
  • ipaddr: IP address manipulation
  • json1: JSON functions
  • math: math functions
  • re: regular expressions
  • stats: math statistics
  • text: string functions
  • unicode: Unicode support
  • uuid: Universally Unique IDentifiers
  • vsv: CSV files as virtual tables

Most of them are written in C or Rust, but there's also Stanchion, written in Zig.

Stanchion tries to answer the age old question; OLAP or OLTP? Should you have a separate database for your transactional workloads and another for your warehousing needs?

There are a number of situations where OLAP and column-oriented storage outperforms row-oriented storage:

  • Storing and processing metric, log, and event data
  • Timeseries data storage and analysis
  • Analytical queries over many rows and a few columns (e. g. calculating the average temperature over months of hourly weather data)
  • Change tracking, history/temporal tables
  • Anchor modeling / Datomic-like data models

Stanchion allows you to have both column-oriented storage and row-oriented storage under the same roof making SQLite usable as an analytics store on embedded, portable or limited resource devices. Stanchion is an ideal fit for analytical queries and wide tables because it only scans data from the columns that are referenced by a given query. (See SQL example bellow)

It also uses compression techniques like run length and bit-packed encodings that significantly reduce the size of stored data, greatly reducing the cost of large data sets. This makes it an ideal solution for storing large, expanding datasets.

As such the column-oriented data is stored in large BLOBs inside of regular SQLite tables, trying to circumvent the limitations posed by SQLite's virtual table API which is designed with row-oriented access in mind. Despite that Stanchion uses a few workarounds to enable analytical queries :

  • lazily loading segments
  • encodings/bit packing for booleans and integers which reduce the size of data that needs to be traversed on disk
  • using clustered index for minimizing data access.

As sqlite extensions are just dynamic libraries you just have to load it into your database from the SQLite CLI:

. load /path/to/libstanchion

Then you can create your first stanchion table much like creating any table in SQLite, with the difference
being that you must use the 'USING stanchion' phrase which tells SQLite to create a virtual table that is implemented by stanchion.

CREATE VIRTUAL TABLE dnd_monsters
USING stanchion (
id INTEGER NOT NULL,
name TEXT NOT NULL,
type TEXT NOT NULL,
size INTEGER NOT NULL,
challenge_rating FLOAT NOT NULL,
SORT KEY (id)
);


After that you can insert your data like you usually do :

INSERT INTO dnd_monsters (id, name, type, size, challenge_rating)
VALUES
(1, 'Beholder', 'ABERRATION', 4, 13),
(2, 'Gelatinous Cube', 'OOZE', 4, 2),
(3, 'Mimic', 'MONSTROSITY', 3, 2),
(4, 'Lich', 'UNDEAD', 3, 21);

and query it like you usually do:

SELECT name
FROM dnd_monsters
WHERE type = 'UNDEAD' AND challenge_rating >= 18;

The difference here is that because the `dnd_monsters` table is column-oriented, that query only reads data from the `name`, `type`, and `challenge_rating` columns. Data in the `id` and `size` columns are not accessed or scanned at all!

So there you have it. Local based analytics on devices with limited resources, see mobiles phones or Raspberry Pi, without having to use a separated dbms product or modifying your existing stack. And don't think that the local part is to be underestimated. There's lately a range of products that put local-first, first, like we examined in TinyBase And The Local First Movement. What's the deal with those apps?

  • They can be used offline and sync with a remote service when going online
  • They are effective in scenarios that continuous data connectivity is not guaranteed-don't assume that there's a Cloud always on the backend
  • They are local first - you own your data; they are not deposited on a Cloud provider. Unless you own the data or it doesn't reside on your own devices then it's not yours.

In conclusion, besides the advantages Stanchion brings to the table, I have a suggestion to make on top to drive the project a bit more forward. That is, in order to implement better compression, maybe to consider integrating the sqlite-zstd extension which provides transparent dictionary-based row-level compression that allows compressing a sqlite database's entries almost as well as if you were compressing the whole DB file, while at the same time retaining random access. More on that
on Use Rust To Reduce The Size Of Your SQLite Database. Of course sqlite-zstd would need some tweaking to cater for columnar compression, but that's not impossible!

sqlite

More Information

Stanchion
 

Related Articles

Use Rust To Reduce The Size Of Your SQLite Database

In Praise Of SQLite

Hydra Turns PostgreSQL Into A Column Store

TinyBase And The Local First Movement

 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


Apache Fury Adds Optimized Serializers For Scala
31/10/2024

Apache Fury has been updated to add GraalVM native images and with optimized serializers for Scala collection. The update also reduces Scala collection serialization cost via the use of  encoding [ ... ]



C23 ISO Standard Is Here But You Probably Won't Read It
06/11/2024

At last ISO C23 has been published, but at $250 you probably aren't going to read it. Can we really tolerate this sort of profiteering on the work of others? This is worse than academic publishing!


More News

espbook

 

Comments




or email your comment to: comments@i-programmer.info