DuckDB And Hydra Partner To Get DuckDB Into PostgreSQL |
Written by Nikos Vaggalis |
Monday, 11 November 2024 |
The offspring of that partnership is pg_duckdb, an extension that embeds the DuckDB engine into the PostgreSQL database, allowing it to handle analytical workloads. In this modern era of data oriented workloads, software engineers usually juggle many disparate sources of data by using many different developer tools. What if you could have one tool to rule them all? That tool exists and is called Postgres; it really can handle anything you throw at it. The news is that in yet another twist in the PostgreSQL-for-everything narrative, PostgreSQL now also becomes OLAP+OLTP = HTAP ready, thanks to MotherDuck's partnership with Hydra. Saying that, HTAP readiness is nothing new for PostgreSQL as we've examined before in "Hydra Turns PostgreSQL Into A Column Store", an open-source extension that adds columnar tables to Postgres for efficient analytical reporting, allowing it to become HTAP ready by being able to support both transactional and analytical workloads to provide real-time data access and analysis capabilities in a single system. While since Hydra's inception PostgreSQL could handle analytical workloads, however outsourcing the analytical engine to DuckDB offers optimized performance plus the ability to read and write files on external object stores like AWS and S3, therefore becoming ideal for querying data against Data Lakes. At this point, it's important to note that PostgreSQL's integration with DuckDB is not something new. There's also another extension pg_lakehouse (now pg_analytics) by ParadeDB, which we examined in Pg_lakehouse Makes PostgreSQL Quack. In there we found that : Queries are pushed down to DuckDB for processing data like events, metrics, historical snapshots, vendor data, but that's one part of the equation, the query engine. The other is being able to fetch those foreign object stores like S3 and table formats like Iceberg or Delta Lake. So what is the difference between pg_analytics and pg_duckdb, you say? pg_analytics uses the foreign data wrapper (FDW) API to connect to those disparate data sources, while with pg_duckdb the query is executed by DuckDB directly. Other than that, pg_duckdb is the officially endorsed PostgreSQL extension for DuckDB, which does imply better support and features implementation? And it's not just pg_analytics, there's pg_mooncake as well! So there's actually three extensions interfacing with DuckDB! Embedding libduckdb requires a substantial amount of compilation time and results in a dramatic increase in the size of the package. This problem is exacerbated when considering combination for 3 PostgreSQL major version and 5 OS distribution, plus it is not feasible to install these extensions simultaneously. The fix goes to suggest the adoption of a shared libduckdb library which could significantly reduce compilation times and package sizes and also eliminate extension conflicts. But enough talk already, just show me what pg_duckdb can do! Then you can read from an Iceberg table : Of course you can also write back to your data lake and so on. Installation wise, there's Docker images available on Dockerhub which are based on the official Postgres image: docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:16-main Pre-built binaries that are available via pgxman: pgxman install pg_duckdb Or, if you feel brave enough, compile from source by cloning the Github repo. To conclude. We have documented the whys of going all the way Postgres in "It's 2024. Why Does PostgreSQL Still Dominate?". Since then, there's been an avalanche of new developments that has firmly established Postgres at the number one spot in the hearts of the developers. In return for Postgres taking such good care of them, the thriving PostgreSQL community gives back by thinking up all kinds of novel ideas that go to improve it. Of course, this isn't unexpected because PostgreSQL has been built with great extensibility in mind since its inception. And this, ladies and gentlemen, is how you build a product that lasts through the ages...
More Informationpg_duckdb Related ArticlesPg_lakehouse Makes PostgreSQL Quack Hydra Turns PostgreSQL Into A Column Store It's 2024. Why Does PostgreSQL Still Dominate?
To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.
Comments
or email your comment to: comments@i-programmer.info |
Last Updated ( Wednesday, 13 November 2024 ) |