ElasticSearch Search Capabilities Baked Into PostgreSQL |
Written by Nikos Vaggalis | |||
Tuesday, 19 December 2023 | |||
That's right. No need anymore for a separate database alongside PostgreSQL. ParadeDB has FTS covered. Really the ingenuity around the PostgreSQL ecosystem never ceases to amaze me. PostgreSQL comes with many bells and whistles but really, Extensibility has to be its killer feature. There's just an extension for everything. This is because PostgreSQL has been built with that kind of philosophy in mind, as examined in The Enduring Influence Of Postgres : With extensibility as an architectural core, it is possible to be creative and stop worrying so much about discipline: you can try many extensions and let the strong succeed. Done well, the “second system” is not doomed; it benefits from the confidence, pet projects, and ambitions developed during the first system As a matter of fact, here at I Programmer we have been closely monitoring the landscape, covering many new extensions pg_ivm - Materialised Views On Steroids pgsqlite - Load SQLite databases into PostgresSQL pg_later - Native Asynchronous Queries Within Postgres Hydra Turns PostgreSQL Into A Column Store Turn PostgreSQL Into A Vector Store And here's yet another; pg_bmp25 which bakes ElasticSearch FTS capabilities into PostgreSQL itself. It is true that full-text search was probably the only thing not good enough on Postgres. Many projects function just fine with Postgres's Full Text Search native engine and other built-in extensions like trigram search (pg_trgm). In 2016, an open source search library called Tantivy emerged and later on, a library called pgrx. On the other hand a few projects have attempted to put such a search engine inside of Postgres , but every single one has relied on an external ElasticSearch instance. This meant introducing a complex and expensive piece of infrastructure into the data stack. Perhaps the best-known example of this kind of design is a Postgres extension called ZomboDB. All that might make PostgreSQL come close to ElasticSearch level querying but not close enough. That is what pg_bmp25 aims to change. pg_bm25, named after BM25, the algorithm used by modern search engines to calculate the relevance scores of search results, is a Rust extension that aims to bridge the gap between the native capabilities of Postgres’ full text search and those of a specialized search engine like ElasticSearch. The goal is to eliminate the need to bring a cumbersome service like ElasticSearch into the data stack. With pg_bm25 you get:
At this point, let's take a look at yet another property of PostgreSQL that is relevant here;that of Commercial Adaptations: PostgreSQL has long been an attractive starting point for building commercial database systems, given its permissive open source license, its robust codebase, its flexibility, and breadth of functionality As the result, many products are based on, or are even forks of it:
And of course PeerDb as examined in "PeerDB Brings Real Time Streaming To PostgreSQL" an ETL/ELT tool built for PostgreSQL that makes all tasks that require streaming data from PostgreSQL to third party counterparts as effortless as it gets. The Commercial Adaptation property is relevant here too because while pg_bm25 is available for self hosted PostgreSQL instances, Cloud providers are too strict to allow users installing third party extensions, eg on an RDS or Aurora managed instance. As such, pg_bmp25 is offered preinstalled on a managed Postgres database called ParadeDB. Philippe Noël, senior ParadeDB dev states : Most customers should have a "search node" isolated from their production database, for independent scaling + resilience, and that ParadeDB should be shipped alongside, not inside, your production DB. As such, ParadeDB provides the most comprehensive search features of any Postgres database, and is a good fit for you if:
On the other hand it is not a good fit if:
ParadeDB however goes beyond FTS and pg_bmp25. They also publish the pg_sparse extension which enables sparse vector search using HNSW in Postgres , as such combining searching with pgvector and pg_bm25, that way ParadeDB can offer both keyword-based search as well as semantic search. While pg_bmp25 and pg_sparse are available for downloading and installing on your selfhosted PostgreSQL instance, docker run \ ParadeDB is currently in public beta and ParadeDB managed Cloud is currently being revamped. More InformationRelated ArticlesThe Enduring Influence Of Postgres
To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.
Comments
or email your comment to: comments@i-programmer.info |
|||
Last Updated ( Tuesday, 19 December 2023 ) |