PostgreSQL has recently claimed the DB-Engines DBMS of the Year for 2023 award. Another confirmation of PostgreSQL's worth.
Despite its age PostgreSQL isn't declining in popularity. On the contrary it always in front, as DB-Engines's official announcement confirms:
PostgreSQL is the database management system that gained more popularity in our DB-Engines Ranking within the last year than any of the other 417 monitored systems. We thus declare PostgreSQL as the DBMS of the Year 2023.
But, why?
What are the metrics, how is that ranking calculated? DB-Engines ranks products by their current popularity according to the following methodology :
Number of mentions of the system on websites, measured as number of results in search engines queries. At the moment, we use Google and Bing for this measurement. In order to count only relevant results, we are searching for <system name> together with the term database, e.g. "Oracle" and "database".
General interest in the system. For this measurement, we use the frequency of searches in Google Trends.
Frequency of technical discussions about the system. We use the number of related questions and the number of interested users on the well-known IT-related Q&A sites Stack Overflow and DBA Stack Exchange.
Number of job offers, in which the system is mentioned. We use the number of offers on the leading job search engines Indeed and Simply Hired.
Number of profiles in professional networks, in which the system is mentioned. We use the internationally most popular professional network LinkedIn.
Relevance in social networks. We count the number of Twitter tweets, in which the system is mentioned.
The criterion for becoming DB-Engine of the Year is having the largest increase in popularity between successive Januarys
This is the fourth time that PostgreSQL gets the DB-Engines award - 2017, 2018 and 2020 were the three previous years. Who would have thought back in the 80's that the humble Ingres fork would become one of the most, if not the most, successful DBMS of all time?
There are a few reasons behind its success. First that it is truly open source at heart, therefore embraced by a strong and vibrant community. Postgres became open source therefore open to contributions once it escaped the confinements of the Berkley laboratory. It was this property that in the end let it evolve into a melting pot of the newest and greatest ideas.
Then, it was shaped by the visions pioneer Mike Stonebraker had, which formed the basis of the marvel that followed. Some of these that utilized ideas described in the The Enduring Influence Of Postgres were:
Supporting ADTs in a Database System At the core of the Object-Relational database notion was the support of ADTs or Abstract Data Types that went beyond the traditional ones handled by the database. These were complex objects or data which had to be stored as nested bundles in stark contrast to the relational model's classical data flattening for removing duplication. This began as an attempt to cater for the needs of CAD applications which use data types such as polygons, rectangles or even fully blown objects such as circuit layout engines.The shape that it takes today is JSON, JSONB, or XML.
Extensible access methods for new data types Yet another innovation were the B-Tree indexes that everyone is familiar with today, as well as the R-Tree indexes which allowed for running two dimensional range queries on data.
Active Databases and Rule Systems Rules or triggers pioneered under Ingres were yet another construct popularized by Postgres that found its way into all the major database engines
Log-centric Storage and Recovery Not fond of the write ahead logging schemes, Stonebraker "unified the primary storage and historical logging into a single, simple disk-based representation"
Support for Multiprocessors: XPRS Sharing memory and processors to support parallel query optimization was yet again another Postgres induced novelty.
Support for a Variety of Language Models The impedance mismatch between Object Oriented programming languages and the declarative relational model was and still is one of the hottest problems of the computer industry.Instead of getting sucked into this never ending debate, Stonebreaker introduced the Object-Relational Database , totally sidestepping the Object Oriented Databases.
These were the principle PostgreSQL was built upon.However, the PostgreSQL core devs did not rest; they continued innovating, they kept on adding. For instance:
Not just hash and B-Tree indexes;PostgreSQL has many of them, like GIN,GiST for full text search and geospatial scenarios, SP-GiST, RUM, BRIN and Bloom.
Storing and querying JSON directly into the database or JSONB if you need to index it by using a GIN index.
Extending the core engine with PostGIS that turns Postgres into a geospatial database together with new datatypes and operators, handing PostgreSQL the edge over multi-million commercial counterparts.
pub/sub support inside the database? Yes,PostgreSQL can do that too with the LISTEN/NOTIFY command.
User Defined Functions in programming languages like Python with PL/Python.
Years down the line, PostgreSQL 9.2 introduced yet another type, Range, that represents a range of values of some element type;14 went one step further by introducing ‘multirange’ types which allow for non-contiguous ranges, helping developers write simpler queries for complex sequences like specifying the ranges of time a meeting room is booked through the day.
Then:
Enhanced monitoring and observability, since you can track the progress of all WAL activity
Functions for using regular expressions to inspect strings: regexp_count(), regexp_instr(), regexp_like(), and regexp_substr().
And so on, up to PostgreSQL version 16 which was released three months ago heralding:
query performance improvements with more parallelism
developer experience enhancements
monitoring of I/O stats using pg_stat_io view
enhanced security features
improved vacuum process
But this is just about the Core.The rest is plugins; extensibility has to be its killer feature as there's just an extension for everything. This is because PostgreSQL has been built with that kind of philosophy in mind:
With extensibility as an architectural core, it is possible to be creative and stop worrying so much about discipline: you can try many extensions and let the strong succeed. Done well, the “second system” is not doomed; it benefits from the confidence, pet projects, and ambitions developed during the first system.
As a matter of fact, here at I Programmer we have been closely monitoring the PostgreSQL extension landscape, covering many new extensions that take the experience to a whole new level:
Distributed database? no problem. Oracle Database and IBM Db2 both provide a shared-nothing architecture (each node in the cluster has its own compute and storage) as separate features and with Citus, PostgreSQL could do so very much too. Along with this you could also scale out horizontally by adding more servers to Citus' clusters and rebalance data from the already existing servers to the new ones without downtime.
pg_ivm an extension module for PostgreSQL 14 that provides an Incremental View Maintenance (IVM) feature. That means that materialized views are updated immediately after a base table is modified.
pgsqlite a pure python module and command line tool that makes it simple to import a SQLite database into Postgres, saving a ton of time and hassle in the process.
pg_later - Native Asynchronous Queries Within Postgres an interesting project and extension built by Tembo which enables Postgres to execute queries asynchronously. Fire your query - but don't forget to check later for the result.
Hydra Hydra is an open-source extension that adds columnar tables to Postgres for efficient analytical reporting.
pg_vector an extension for PostgreSQL that renders it a viable alternative to specialized vector stores used in LLMs.
pg_bmp25 which bakes ElasticSearch FTS capabilities into PostgreSQL itself. It is true that full-text search was probably the only thing not good enough on Postgres.
PeerDb an ETL/ELT tool built for PostgreSQL that makes all tasks that require streaming data from PostgreSQL to third party counterparts as effortless as it gets.
PostgreSQL has also been an attractive starting point for building commercial database systems given its permissive open source license, its robust codebase, its flexibility, and breadth of functionality. As a matter of fact many such forks occupy a place in DB-Engines Ranking such as:
Greenplum
TimescaleDB
YugabyteDB
EDB Postgres
As such it's no wonder that big corps jumped on the bandwagon in offering managed PostgreSQL instances, such as Amazon Web Services, Microsoft Azure or Google Cloud. Or offering managed adaptations of such as:
Azure CosmosDB Microsoft's mutli-model distributed database for supporting workloads at scale, which extendes beyond NoSQL by adding support for PostgreSQL as well.
Amazon Aurora Amazon Aurora PostgreSQL is a fully managed, PostgreSQL–compatible, and ACID–compliant relational database engine
Amazon RDS for PostgreSQL which supports Trusted Language Extensions (TLE) for PostgreSQL, so that you can build high performance extensions and safely run them on Amazon RDS using popular trusted languages without needing AWS to certify code.
That last part is a boon for developers working with the Cloud offerings.In "Trusted Language Extensions Bring PostgreSQL Procedural UDFs To The Cloud" I explain:
PostgreSQL allows user defined functions to be written in other languages besides SQL and C. These other languages are generically called procedural languages (PLs). Procedural languages aren't built into the PostgreSQL server; they are offered by loadable modules. That way you can extend your database with powerful features not found in SQL. For instance you can write a PL/Perl procedure to accept a string from your SQL to apply regular expressions to it in order to tokenize it.
This of course comes with security issues when the database invokes code like file system operations, or using statements that could interact with the operating system or database server process. For that Trusted Language Extensions (TLE) came about. They are PostgreSQL extensions that you can safely run on your DB instance. Trusted Language Extensions do not provide access to the filesystem and are designed to prevent access to unsafe resources as its runtime environment limits the impact of any extension defect to a single database connection.TLE also gives database administrators fine-grained control over who can install extensions, and provides a permissions model for running them.
To take a step further, DbDev The DbDev Package Manager For PostgreSQL TLEs by SupaBase now supports installing them in your PostgreSQL instance, like NPM does for Javascript packages.Basically what it offers is instead of fiddling around trying to install your extension DbDev streamlines this process.
So here you have it. This is why PostgreSQL is the most popular database, and that by barely scratching the surface. There's a lot under the tip and a lot yet to come. 2024 looks very exciting.
December 1st is much anticipated among those who like programming puzzles. It is time to start solving small but tricky puzzles on the Advent of Code website with the goal of amassing 50 stars by Chri [ ... ]