Turn PostgreSQL Into A Vector Store

Written by Nikos Vaggalis

Tuesday, 26 September 2023

pg_vector is an extension for PostgreSQL that renders it a viable alternative to specialized vector stores used in LLMs. We show you how to use it and report on the latest, faster incarnation, written in Rust, pgvecto.rs.

In "Learn To Chat with Your Data For Free", we've explored a LangChain course about that exact purpose going through the steps necessary.

One of those steps was storing the data into a Vector store. As a refresher, a vector or embedding is that:

embeddings take a piece of text and create a numerical representation of that text. Text with similar content will have similar vectors in this numeric space.

What that means is we can then compare those vectors and find pieces of text that are similar. NymPy can be used to convert the text into embeddings and then load them into the Chroma vector store.

pg_vector allows you to replace Chroma or any other specialized vector engine with Postgres, so that you can have your embeddings stored alongside your JSON or relation data under the same roof.

That capability renders Postgres usable in an AI or ML setting. At the same time when being able of hosting embeddings and without even touching the LLM yet, we still can do useful similarity searches like KNN and ANN from within Postgres.

Using the extension is a simple as:

Enable the extension (do this once in each database where you want to use it):

CREATE EXTENSION vector:

-- Create a vector column with 3 dimensions
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

-- Insert vectors
INSERT INTO items (embedding) VALUES ('[1, 2, 3]'), ('[4, 5, 6]');

-- Get the nearest neighbors by L2 distance
SELECT * FROM items ORDER BY embedding <-> '[3, 1, 2]' LIMIT 5;

You can use pgvector from any language with a Postgres client -C, C#, Perl, Java, even Dart, you name it. You can even generate and store vectors in one language and query them in another. The extension itself is written in C.

However, very recently pgvecto. rs emerged which is pgvector but written in Rust, fostering extra advantages over the original :

Transaction support
Sufficient Result with Delete/Update/Filter
Vector Dimension Limit 65535
Prefilter on HNSW
Parallel Index build
Linearly faster with more cores
Index Persistence mmap file
WAL amplification 2x

Also based on benchmarks, pgvecto.rs can be up to 2x faster than pgvector on HNSW indexes with the same configurations.

Speed is just one property of pgvecto.rs, however. pgvecto. r is architected that way to easily add new algorithms to it so that contributors can implement new indexes with ease. For instance while pgvecto. rs comes by default with two built-in index types - HNSW for maximum search speed and ivfflat for quantization-based approximate search, anyone can create additional indexes like RHNSW, NGT, or custom types tailored to specific use cases.

Using it is similar to the procedure necessary for pgvector :

CREATE EXTENSION vectors;

-- create table with a vector column

CREATE TABLE items (
id bigserial PRIMARY KEY,
embedding vector(3) NOT NULL
);

You can then populate the table with vector data as follows.

-- insert values

INSERT INTO items (embedding)
VALUES ('[1, 2, 3]'), ('[4, 5, 6]');

-- or insert values using a casting from array to vector

INSERT INTO items (embedding)
VALUES (ARRAY[1, 2, 3]::real[]), (ARRAY[4, 5, 6]::real[]);

You can then call the distance function through operators

-- squared Euclidean distance
SELECT '[1, 2, 3]' <-> '[3, 2, 1]';
-- negative dot product distance
SELECT '[1, 2, 3]' <#> '[3, 2, 1]';
-- negative square cosine distance
SELECT '[1, 2, 3]' <=> '[3, 2, 1]';

or search for a vector simply like this.

-- query the similar embeddings
SELECT * FROM items ORDER BY embedding <-> '[3, 2, 1]' LIMIT 5;
-- query the neighbors within a certain distance
SELECT * FROM items WHERE embedding <-> '[3, 2, 1]' < 5;

And with that, Postgres is here to rule them all. Due to its capability to be extend without affecting the core, Postgres is truly open to innovation; the limit is the imagination of its open source community.

postgreslogo

More Information

pgvector

pgvecto.rs

The DbDev Package Manager For PostgreSQL TLEs

Turn Your SQLite Database Into A Server

Learn To Chat with Your Data For Free

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

.NET Preview 5 Improves C#
16/06/2025

.NET 10 Preview 5 has been released with enhancements to C# 14 and additions to the Post-Quantum Cryptography (PQC) library. The .NET runtime has also been improved.

+ Full Story

Parasoft Adds AI Assistant To C/C++ Test
30/06/2025

Parasoft has updated its C/C++ Test software with an AI-powered documentation assistant, along with complete support for MISRA C:2025 and auto-suppression of equivalent violations. C/C++ Test can be u [ ... ]

+ Full Story

More News

Comments

or email your comment to: comments@i-programmer.info

Last Updated ( Tuesday, 26 September 2023 )

Recent Articles

Recent Book Reviews

Popular Articles

More Information

Related Articles

Comments