In case you hadn’t noticed, Artificial Intelligence is in the news. With the rise of the generative AI behind ChatGPT (although the jury is still firmly out on how useful it will prove in software development), we have entered an era of automation and autonomous systems management that few envisaged. So then, with a blast radius arguably capable of impacting almost every facet of modern computing, how will automation play out in the database space?
A case in point is PostgreSQL, the open source relational database technology that evolved from the POSTGRES system developed at UC Berkeley from 1986 onwards. Known for its extensibility, its standards compliance and migration-friendly tooling, and the hands-on data-wrangling capabilities drawn from its Structured Query Language (SQL) roots, Postgres enjoys a fanbase that spans developers and, of course, database administrators (DBAs).
I love my database
So adored is Postgres, it seems, that the 2022 Stack Overflow developer survey ranked PostgreSQL as both the most loved and the most wanted database technology. But can a data system that has now been around for almost four decades withstand the impact of the new era of automation?
This question has come about because PostgreSQL is now being deployed on a massive scale – and so, logically, with the vast increase in the number of deployed PostgreSQL instances comes the need for automation and automated management. Yet the system was designed and implemented in a simpler era, when one or a few computers, possibly not even connected to a network, might have served the needs of a whole company.
“Even today, there are ‘holdovers’ from that simpler time embedded deeply in PostgreSQL’s design,” explains Robert Haas, VP chief database scientist at EDB (formerly known as EnterpriseDB), a company that specialises in enterprise PostgreSQL deployments. “Its security model presumes that the same person is responsible for both the administration of objects within the database and the operating system account into which it has been installed. Its failover mechanism was really designed with the idea of allowing a knowledgeable human being to perform a manual failover, rather than with the idea of allowing fully automated, unattended failover.”
Indeed, Haas makes a crucial point: in many ways, the entire design of the system presupposes a knowledgeable DBA who oversees system operations in detail and can fix problems in any area as they arise. Yet, increasingly, PostgreSQL is being deployed in environments where these assumptions do not hold.
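To see the gap Haas describes, consider what a manual failover still looks like at the SQL level. The snippet below is a minimal sketch, assuming PostgreSQL 12 or later and a standby that is already streaming from the failed primary; the point is that somebody, or some external system, has to decide when to run it.

    -- Run on a streaming standby after the primary has failed.
    -- pg_promote() waits (here up to 60 seconds) for promotion to
    -- complete and returns true on success.
    SELECT pg_promote(wait => true, wait_seconds => 60);

In an unattended deployment, everything around that single statement – detecting the failure, choosing a standby, fencing the old primary – is exactly the machinery that has to be bolted on from outside.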
Consider a scenario that is common today: a cloud provider takes responsibility for the operating system and for failover and backups, but wishes to leave the administration of objects within the database to the customer. The customer doesn’t know much about PostgreSQL specifically; they want a database that ‘just works’, with minimal administration requirements and 100% uptime. Meanwhile, DBAs are often responsible not for one or a few database instances but for hundreds or even thousands, limiting the time they can afford to spend tinkering with individual systems.
“Adapting to these new realities is a challenge for PostgreSQL. Writing new code, or adapting existing code, is hard, but understanding what code should be written, or how to evolve the architecture, can be even harder,” said Haas, bringing home a certain sense of reality to proceedings. “Many PostgreSQL developers have been working on the system for decades and using it for even longer. We are, at times, stuck in the past. We remember the things that we wanted from PostgreSQL when we began using it, but new users may be looking for something altogether different.”
Wrapping (not applying) automation
Part of the reason that PostgreSQL is still succeeding in spite of these difficulties is that Database-as-a-Service (DBaaS) providers are ‘wrapping automation’ around a system that wasn’t truly designed to operate in such an environment. This automation takes shape across the lifecycle of a given database instance, from provisioning a highly available cluster, to automated failover between its members, to taking backups.
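Much of that wrapped automation amounts to polling the database’s own introspection views and reacting to what they report. As an illustrative sketch (not any particular provider’s implementation), a control loop might check replication health with a query like this, using the standard pg_stat_replication view:

    -- Run on the primary: one row per connected standby.
    -- A missing standby, or a large replay lag, is the kind of signal
    -- an automated failover or alerting system would act on.
    SELECT application_name,
           state,
           sync_state,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication;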
What we see today is a reality where DBaaS providers have realised that users want PostgreSQL and value the user experience it can provide, and they have found ways of solving (or at least working around) the challenges of administering the system in a fully automated way. Arguably, working around the database’s inherent challenges has become part of the ‘new solution set’ of the PostgreSQL ecosystem and technology proposition as a whole.
“To be successful far into the future, PostgreSQL, like any piece of software, must continue to evolve. It’s not good enough to allow DBaaS providers to scramble and somehow work around the problems that hinder the automated management of PostgreSQL. PostgreSQL itself needs to be redesigned to facilitate such management, to make it easy, so that it can be more widely adopted and more successful in the environments where it’s already present,” cautioned Haas, striking a note of balance.
He believes that, over time, that evolution will happen, with whatever degree of refactoring it takes. Reminiscing on how we got to this point, Haas recalls that when he first started using PostgreSQL, it had no built-in replication of any kind. Many would (and perhaps did) say that this was a clear example of poor design.
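For contrast, built-in streaming replication is now a standard feature. As a minimal sketch (assuming PostgreSQL 12 or later; the host name and user below are placeholders), wiring up a physical standby comes down to a handful of settings:

    -- On the primary (both settings need a server restart; wal_level
    -- already defaults to 'replica' in modern releases):
    ALTER SYSTEM SET wal_level = 'replica';
    ALTER SYSTEM SET max_wal_senders = 10;

    -- On the standby: point it at the primary. An empty standby.signal
    -- file in the data directory tells the server to start up as a
    -- standby and begin streaming.
    ALTER SYSTEM SET primary_conninfo = 'host=primary.example.com user=replicator';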
What is lock contention?
It was so poorly designed for multi-core systems that lock contention was clearly visible in CPU profiles even on a two-core machine. (Lock contention is a SQL database ‘phenomenon’ that occurs when multiple application or data service processes attempt to access the same data concurrently: the ‘lock’ is the access-limiting mechanism, and the ‘contention’ is what the competing threads of execution create as they wait on it.)
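Lock contention is easy to observe from SQL itself. The query below is a small illustrative example using the standard pg_stat_activity view and the pg_blocking_pids() function (available since PostgreSQL 9.6) to list sessions currently stuck behind another session’s lock:

    -- Each row is a session waiting on locks held by the sessions
    -- listed in blocked_by.
    SELECT pid,
           pg_blocking_pids(pid) AS blocked_by,
           wait_event_type,
           state,
           query
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0;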
Looking at how far we have come, Haas argues that far more administration was required back then than is needed now – and for reasons he calls out as ‘embarrassingly poor’.
“Even basic SQL functionality was missing: there was no way to drop a column from an existing table. From today’s perspective, it seems astonishingly primitive,” he said. “The challenges that it faces today are the result of its own previous success. They will not be solved in a week, or a month, or even in a year. But, when we look back on today’s PostgreSQL a decade from now, we may find ourselves astonished by the progress it has made.”
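That particular gap has long since closed: dropping a column is now a one-line operation (the table and column names here are purely illustrative):

    -- Standard SQL that early Postgres releases could not express.
    ALTER TABLE customers DROP COLUMN legacy_code;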
This whole history, development cycle and necessary evolution is why EDB’s Haas thinks there is a great chance that PostgreSQL will continue to overcome the difficulties that stand in the way of even greater adoption. Going on this journey with the PostgreSQL cognoscenti offers some insight into the way automation is impacting currently deployed technologies that were never really designed for the modern age of the web, the cloud and AI. We can be smarter, though, mainly because we have to be.