Turn PostgreSQL Into A Vector Store
Written by Nikos Vaggalis
Tuesday, 26 September 2023
pgvector is an extension for PostgreSQL that makes it a viable alternative to the specialized vector stores used with LLMs. We show you how to use it and report on its latest, faster incarnation, pgvecto.rs, which is written in Rust.

In "Learn To Chat with Your Data For Free" we explored a LangChain course devoted to exactly that purpose, walking through the necessary steps. One of those steps was storing the data in a vector store. As a refresher, an embedding (or vector) is a numerical representation of a piece of text. Text with similar content will have similar vectors in this numeric space, which means we can compare those vectors to find pieces of text that are similar. In the course, the text is converted into embeddings, which can be compared with NumPy and then loaded into the Chroma vector store.

pgvector lets you replace Chroma, or any other specialized vector engine, with Postgres, so that your embeddings live under the same roof as your JSON or relational data. That capability makes Postgres usable in an AI or ML setting. Moreover, once Postgres hosts the embeddings, we can run useful similarity searches, such as exact k-nearest neighbor (KNN) and approximate nearest neighbor (ANN) search, from within Postgres without even touching the LLM.

Using the extension is as simple as this. Enable the extension (do this once in each database where you want to use it):

CREATE EXTENSION vector;

-- Create a vector column with 3 dimensions
-- Insert vectors
-- Get the nearest neighbors by L2 distance

You can use pgvector from any language that has a Postgres client: C, C#, Perl, Java, even Dart, you name it. You can even generate and store vectors in one language and query them in another.

The extension itself is written in C. Very recently, however, pgvecto.rs emerged: essentially pgvector written in Rust, which brings extra advantages over the original.
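The last step of the pgvector listing, getting the nearest neighbors by L2 distance, is easy to reason about when no index is involved: every row is scored against the query vector and the closest ones are kept. The Python sketch below reproduces that exact KNN computation outside Postgres. It assumes vectors travel in pgvector's bracketed text form, such as '[1,2,3]', which is one way a vector written by one language can be read by another; the helper names (`parse_vector`, `to_literal`, `l2_distance`, `knn`) and the toy rows are hypothetical, chosen for illustration only.

```python
from __future__ import annotations

import math


def parse_vector(literal: str) -> list[float]:
    """Parse a bracketed text literal such as '[1,2,3]' into floats."""
    text = literal.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a vector literal: {literal!r}")
    inner = text[1:-1].strip()
    return [float(part) for part in inner.split(",")] if inner else []


def to_literal(vec: list[float]) -> str:
    """Format floats back into the same bracketed text form."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(
    rows: list[tuple[int, list[float]]], query: list[float], k: int
) -> list[tuple[int, float]]:
    """Exact KNN: score every row, sort by distance, keep the first k.

    Conceptually what ORDER BY embedding <-> query LIMIT k returns
    when no approximate index is used.
    """
    scored = sorted((l2_distance(vec, query), row_id) for row_id, vec in rows)
    return [(row_id, dist) for dist, row_id in scored[:k]]


if __name__ == "__main__":
    # Toy rows with made-up values, written in the bracketed text form.
    rows = [(1, parse_vector("[1,2,3]")), (2, parse_vector("[4,5,6]"))]
    query = parse_vector("[3,1,2]")
    for row_id, dist in knn(rows, query, k=5):
        print(row_id, round(dist, 3))  # prints "1 2.449" then "2 5.745"
```

In practice Postgres does this work for you; the sketch only shows what the L2 ordering ranks by, and why a full scan grows linearly with the number of rows, which is what ANN indexes aim to avoid.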
Also, according to benchmarks, pgvecto.rs can be up to 2x faster than pgvector on HNSW indexes with the same configuration. Speed is just one property of pgvecto.rs, however. pgvecto.rs is architected so that new algorithms are easy to add, letting contributors implement new indexes with little effort. For instance, while pgvecto.rs comes by default with two built-in index types, HNSW for maximum search speed and ivfflat for quantization-based approximate search, anyone can create additional indexes such as RHNSW, NGT, or custom types tailored to specific use cases.

Using it is similar to the procedure for pgvector:

CREATE EXTENSION vectors;

-- create table with a vector column
CREATE TABLE items (

You can then populate the table with vector data as follows:

-- insert values
INSERT INTO items (embedding)
-- or insert values using a casting from array to vector
INSERT INTO items (embedding)

You can then call the distance functions through operators:

-- squared Euclidean distance

Or search for similar vectors simply like this:

-- query the similar embeddings

And with that, Postgres is here to rule them all. Because it can be extended without affecting its core, Postgres is truly open to innovation; the limit is the imagination of its open source community.
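The ivfflat index type trades some accuracy for speed. The general inverted-file (IVF) idea behind it can be sketched in Python: each vector is assigned to the list of its nearest centroid, and a query scans only the lists of its `nprobe` nearest centroids rather than every row. This is a generic illustration, not pgvecto.rs's actual implementation: the centroids are supplied by hand here (a real index would typically learn them, for example with k-means), and the names `IVFFlat`, `nprobe` and `exact_search` are hypothetical. Distances use the squared Euclidean form named in the pgvecto.rs listing; dropping the square root leaves the ranking unchanged, because squaring preserves the order of non-negative numbers.

```python
from __future__ import annotations

from collections import defaultdict


def sq_euclidean(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance: same ordering as L2, no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class IVFFlat:
    """Toy inverted-file index: vectors are bucketed by nearest centroid
    and kept uncompressed ("flat") inside each bucket."""

    def __init__(self, centroids: list[list[float]]):
        self.centroids = [list(c) for c in centroids]
        self.lists: dict[int, list[tuple[int, list[float]]]] = defaultdict(list)

    def _nearest_centroids(self, vec: list[float], n: int) -> list[int]:
        # Indices of the n centroids closest to vec, closest first.
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: sq_euclidean(vec, self.centroids[i]),
        )
        return order[:n]

    def add(self, row_id: int, vec: list[float]) -> None:
        bucket = self._nearest_centroids(vec, 1)[0]
        self.lists[bucket].append((row_id, list(vec)))

    def search(self, query: list[float], k: int, nprobe: int = 1) -> list[int]:
        """Approximate KNN: only the nprobe closest lists are scanned."""
        candidates = []
        for bucket in self._nearest_centroids(query, nprobe):
            for row_id, vec in self.lists.get(bucket, []):
                candidates.append((sq_euclidean(vec, query), row_id))
        candidates.sort()
        return [row_id for _, row_id in candidates[:k]]


def exact_search(
    rows: list[tuple[int, list[float]]], query: list[float], k: int
) -> list[int]:
    """Brute-force reference: scan every row."""
    scored = sorted((sq_euclidean(vec, query), row_id) for row_id, vec in rows)
    return [row_id for _, row_id in scored[:k]]


if __name__ == "__main__":
    # Made-up 2-D data: two clusters plus one point between them.
    rows = [
        (1, [0.0, 0.1]), (2, [0.2, 0.0]),
        (3, [5.0, 5.1]), (4, [5.2, 4.9]),
        (5, [2.4, 2.4]),
    ]
    index = IVFFlat(centroids=[[0.0, 0.0], [5.0, 5.0]])
    for row_id, vec in rows:
        index.add(row_id, vec)

    query = [2.6, 2.6]
    print(index.search(query, k=2, nprobe=1))  # [3, 4]
    print(index.search(query, k=2, nprobe=2))  # [5, 3]
    print(exact_search(rows, query, k=2))      # [5, 3]
```

With `nprobe=1` the query only visits the bucket around [5.0, 5.0] and misses row 5, which is actually the closest point but was filed under the other centroid; probing both lists recovers the exact answer. That recall-versus-work trade-off is what approximate indexes let you tune.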
Related Articles
The DbDev Package Manager For PostgreSQL TLEs
Turn Your SQLite Database Into A Server
Learn To Chat with Your Data For Free