Postgresso 8 (69)

PostgreSQL 17 RC1

Release candidate 1 is out. Compared to beta 3: MERGE/SPLIT sections have been rolled back. The final release (GA) is scheduled for September 26th.

Postgres Pro Enterprise 16.4.1 Version Updates, 15.8.1, 14.13.1, 13.16.1, 12.20.1

There are some changes compared to previous versions. For example, in 16.4.1 the number of attempts to reschedule a query has been reduced.

Conferences

PGConf.SPb 2024

It appeared schedule. First report – Pavla Luzanovawho heads the education department at Postgres Professional. It's called modestly: PostgreSQL 17. In the meantime, you can read it PostgreSQL 17: Part 5 or Commitfest 2024-03.

Andrey Borodin (Yandex Cloud) will present Unusual features of the WAL-G backup systemA Daria Lepihova And Alexey Darwin (both Postgres Professional) – Choosing a Replication Protocol When Developing pg_probackup3 (let us remember that the 3rd version is not the next one, it is practically a rewrite from scratch of pg_probackup, unlike 2.x).

Sequel and Prequel: Entertaining Archaeology Egor Rogov – this is a historical excursion. A new genre (it seems) for Egor, but I have no doubt that it will be informative and fascinating:

I will tell you how people worked with databases before Codd and what changed with the invention of relational theory; we will talk about the origins of the first relational systems – System R and Ingres; about how the SQL language appeared and gained popularity; about the people who defined our present and, to some extent, our future.

This conference will also feature professional certification in PostgreSQL. Testing will be conducted on PostgreSQL versions 10 (DBA2, DBA3, QPT) and 13 (DBA1, DBA2, DBA3, QPT, Expert 10 → 13). Holders of the “Expert” level in PostgreSQL 10 will be able to pass the transition test “Expert 10 → 13” and receive the certificate “PostgreSQL Administrator 13. Expert” using a simplified system. Testing is in-person only. To pass certification, you need to sign up for personal account Postgres Professional website.

Back from PGDay UK 2024

This PGDay is dedicated to the memory of Simon Riggs (Simon Riggs) – the founder of 2nd Quadrant. The author of this sketch is Stefan Ferkot (Stefan Fercot, Data Egret) — says that the program was compact, but the reports are interesting. He himself gave a report “from life”: How To Achieve Minimal Downtime In PostgreSQL Maintenance Tasks.

I talked about the innovations of the 17th version Magnus Hagander (Magnus Hagander, Redpill Linpro). Stefan Ferkot liked about this version that some statistics from pg_stat_bgwriter transferred to the performance pg_stat_checkpointer.

Magnus has it website There is a small (and not complete) list of upcoming and past conferences. Upcoming:

PGConf NYC 2024

Will take place from September 30 to October 2.

PGConf.EU 2024

October 22-25, Athens.

NordicPGDay 2025

March 18 in Copenhagen.

Writing a good talk proposal

And here it is Tomas Vondra (Tomas Vondra) talks not about the conference, but about how to get there as a speaker – how to charm the program committee with your description of the report.

He gave his recommendations on this topic. Bruce Momjian. Six years ago, but – this is Bruce Momjian: Submitting Talks to Conferences.

LM Library

Let's decipher: library not in the programmer's sense, but in the original sense, and LM = ShaggyMammoth, who has already collected a solid collection of his articles on Habr (of course, his colleagues from Postgres Professional, who specialize in their fields, help him write them). This issue will have a lot about our company – a lot of interesting things have accumulated. The balance will be restored in the following issues [!!нужна ли такая ремарка? вроде надо бы]

Michael Stonebraker: “Everything new is well forgotten old. Continuation”

Let's start with the translation. About the original – What Goes Around Comes Around… And Around… We wrote about it in the issue before last. The authors are the great Michael Stonebraker and an outstanding theorist and practitioner (but, alas, OtterTunehis most interesting startup ordered other startups may live long) Andrew Pavlo (Andrew Pavlo, we are more used to Andy, although he is a professor at Carnegie Mellon).

This article is a real event. The translation, accordingly, is too. It is also good for marketing people – it has a tough and well-reasoned message: new architectures come and go, relational DBMSs remain. But above all, it is good for its content: I read this huge article with bated breath: not because of the jokes (there are none), but fascinated by the way the most complex architectural concepts are laid out on shelves and cleared of marketing accretions. They think clearly and formulate clearly. First, the following architectures that have appeared in the last 20 years are considered:

  1. MapReduce systems.

  2. Key-value stores.

  3. Document-oriented databases.

  4. Column Family databases.

  5. Text search engines.

  6. Array databases.

  7. Vector databases.

  8. Graph databases.

Here are the ones that have appeared recently:

  1. Column systems.

  2. Cloud databases.

  3. Data lakes / Lakehouses.

  4. NewSQL systems.

  5. Hardware accelerators.

  6. Blockchain databases.

Continuing to squeeze the most out of PostgreSQL

That is, squeeze out of vanilla PostgreSQLfrom Postgres Pro and from Postgres Pro with CFS. This article is a supplement to Selectel's article about Postgres squeezing. There Postgres Pro Enterprise 15-16 and PostgreSQL 15-16 were compared – default versions of both, then Postgres Pro was autotuned.

The LM article clearly describes the settings for:

  1. PostgreSQL,

  2. Postgres Pro and

  3. Postgres Pro with CFS.

In the 1st testing option – what was deployed by default. A small list – a dozen parameters.

But in the 2nd version the list of finer settings is very solid – under 60 parameters. And these settings were applied to all 3 test participants.

In the 3rd version, Pro added a setting for only 7 parameters to these lists, which are specific to Pro Enterprise.

You can listen to the report about the settings Speeding up hardware and OS under PostgreSQL. on PGMeetup.SPb/24 Mikhail Zhilin (but the topic is broader: there is also about NUMA – the most complex matters). He also tested the configurations from this article.

The results are presented as 3-color columns. I won't spoil them.

As for the author of the original Selectel article, he, Maxim aka Maksvelis, is not the first time testing Postgres, and he was not limited to x86, but added ARM to Intel and AMD: Who is more powerful in databases? We compare the performance of the database on servers with ARM and x86 processors. What about ARM, Maksvelis also examined Elbrus, and what Elbrus: 8CB with the architecture with a broad word (but I haven't tested Postgres, although it works there without binary compilation).

Minding the CPU: How to Find the Bottleneck and Configure Postgres Pro

This article is an analysis of a real case that happened to the company's clients. The genre is not new for the Postgres Professional writers' union, for example: Battles on the territory of ZFS, or, say, Parallelism in PostgreSQL: not spherical, not a horse, not in a vacuum. With concern for the CPU – a very recent case. Let us remind you that the English-language blog Andrey Lepikhov on Substack, which we have quoted extensively in (1) and (2), is called Conservation CPU's cycles …

The client's justified concern was caused by the processor load – more than 90%. The performance dropped. What to do? Well, in general, don't twitch, but reduce the load little by little, optimizing here and there. For this, a really powerful tool was used, among other things. pgpro_pwr Andrey Zubkov (many people know pg_profilein PWR (pgpro_pwr) the functionality is expanded). Here interesting nuances were discovered: the planning time, for example, does not often affect the overall costs, but here they decided to reduce it, using sr_plan. The conclusion is this: we must refrain from generalizations. In each specific case, we must consider factors and metrics as material for analysis and decision-making. Silver bullets remain in fairy tales for their target audience.

How we make our DBaaS

Well, how do we do it? That's how we do it. This solution is rapidly developing, so read on and then follow the updates. There is video on youtube.

Sports (not running)

Here's to this topic – How we judged at the Russian Championship in sports programming – there is a rich context, more on that later. And in this case they were judging and arguing again with the participation of Mikhail Zhilin And Shaggy Mammothwho this time appeared Alexander Fatin, humble DevRel.

Sports programming is a state-recognized sport (as indicated in the relevant register), with all the necessary things: a federation [Федерация спортивного программирования]tournaments and other championships.

They not only judged, but also formulated the task for the “Qualifying round of the Russian championship in sports programming” in the discipline “Product programming”. This is a student three-day online hackathon, the winners competed in the final of the national championship in Moscow. For some reason, the website no descriptionthey are only available starting in June 2023. The task was somewhat unexpected for us, the readers and writers of Postgresso: Smart Monitoring of PostgreSQL Database via Telegram Bot. The admin left the office, he only had his phone with him, and the base felt bad. How should the bot authenticate and authorize the admin? What problems should he be notified about? In what way? What options should be offered? The article tells what they were falling for, there is “Unsolicited advice to the organizers”, Alexander rightly criticizes the selection algorithms.

The events took place in the fall of 2023, but we missed them, unfortunately. And now this news:

XXV Open All-Siberian Olympiad in Programming named after I.V. Pottosin

Started registrationthe team consists of three participants and a coach. The main language of the competition is Russian, the working languages ​​of the Olympiad are Pascal, C/C+, Java, Kotlin, Python. Schoolchildren, students and postgraduates participate. One of the sponsors is Postgres Professional (2nd year in a row). Also: Yandex, CFT, VC Education, ICPC Northern Eurasia.

On September 29 at 10:00 (UTC +3) there will be an online tour, you can participate from anywhere in the world, the number of teams from one educational institution in this tour is not limited. From November 1 to 5, 2024, the strongest teams that passed the online tour will take part in the face-to-face tournament at NSU.

We wrote about “IT-Planet 2024”: tasks of the third stage in PostgreSQL, and before that tasks of the second stage in PostgreSQL. And a year before that, we also published the stages of the Olympiad process and the results of 2023: Tasks of the third stage of the “IT-Planet” Olympiad in PostgreSQL and Tasks of the second stage of the “IT-Planet” Olympiad in PostgreSQL. You can see on the IT-Planet website results. So, the efforts of Postgres Professional from Egor and Ilya Bashtanova now they have been supplemented – in another competition – where they took part Mikhail Zhilin And Alexander Fatin.

But – to be fair (and Postgresso was created to be fair – to the best of our ability) – the games started earlier, and not with Postgres Professional. There are articles on Habr Evgeniya Bredni from 2018, then still aka oracloid BZQ (now he heads the Postgres Professional support service). It's about the 2016/17 Olympiad, and it was already held for the tenth time. In a large article How we made the SQL Olympiad Evgeny also talks about goals:

I tried very hard to make the tasks have a pronounced wow effect like “is this really possible?” possibly in SQL“, and to be away from the traditional Olympiad theme, which requires quite specific skills. The difficulty of each task was primarily to imagine (and then implement) the declarative solution method a completely independent and non-trivial task even for classical programming [болд мой].

The article contains conditions for 5 problems, here is an analysis of one of them: the problem about the calendar. In the analysis of this problem, a remarkable shift occurs:

After hesitating for a while, I settled on PostgreSQLto get a closer look at how it will look in comparison with the Oracle database in this task. Expressing all the necessary steps of the solution in another SQL dialect should not be a problem, personally I coped with it quickly. Let me remind you that at the Olympiad, where the task was taken from, it was used Oracle SQLon which the reference solution was originally written.

By the way, Evgeniy threw in some problems even without the Olympiad: SQL: analysis of the problem of finding the last price. In general, browse through them.

By the way: Postgresso 12 (61) has a couple of tasks from job interviews. And here is the deputy general manager of Postgres Professional Ivan Panchenko aka x-wao presents Analysis of Postgres Pro Quiz Problems at PGDay'17.

And let us remind you about Advent of Code – This adventist calendar coders who came up with Eric Wastl (Eric Wastl). He assigns problems every year from December 1st to Christmas – to December 25th, one per day. They can be solved in any programming language – we wrote about this, for example, in Postgresso #6 (55). There are more problems there, if anyone is interested. Greg Sabino Mullane (Greg Sabino Mullane – we called him that back then Mullaney) solved the problem (problem) Pyroclastic Flow. The elephants need to get out of the cave, the entrance to which is blocked by stones in the shape of different tetris figures. And now:

Using SQL's Turing Completeness to Build Tetris

Someone Nuno Faria (Nuno Faria). It's not pure SQL, there's a Python helper script. He also has Tetris with AI and something else playful

Interviews, podcasts: Between the brackets and Postgres FM

Interview With Egor V Between the brackets about his book PostgreSQL 16 Inside Out.

It's a common thing: there was no book that I liked in all respects, so I had to write my own – with an optimal level of immersion in technological details: not superficially, but also without diving into C-code analysis. This book is not for admins, and not for application developers – it is for mutual understanding between the two – explains Egor. He wants to finish writing some things, a chapter on replication, for example.

Postgres FM Anniversary (oh, we missed it)

To 100TB, and beyond!

Postgres FM launched Nikolay Samokhvalov And Michael Christofides (Nikolay Samokhvalov – founder Postgres AIMichael Christofides – founder pgMustard). They invited us to the anniversary edition Arku Ganguly from Notion, Sammy Steele from Figma and Derka van Veen from Adyen (Arka Ganguli, Sammy Steele, and Derk van Veen). They are remarkable for scaling Postgres to a hundred TB (or so). As always, the release page has a large number of useful links. In this case, there are as many as 14 – a sonnet of links.

And here, for example, is the April episode of Nikolai and Michael: Don't do thisThey were inspired by the page. Don't Do This V PostgreSQL wikiadded memories that are not always pleasant, selected fragments from their FM, illustrating the theme.

And there is more Postgres TVwhich Nikolai leads with his long-time comrade from Postgres Tuesdays – with Ilya Kosmodemyansky (Ilya Kosmodemyansky, DataEgret). Well, Tuesdays #RuPostgres unforgettable! Good luck!

Discussing the Postgres Startup Ecosystem

Nikolay and Michael discuss the growing and shifting commercial ecosystem around Postgres. The focus is on Neon, Tembo, and Aiven. But they're not the only ones on the list, by any means. Each of their videos has a bed of links, and this one has a good 30 of them!

Community: educational program and pre-commitfest

Understanding the Postgres Hackers Mailing List Language

Greg Sabino Mellein (Greg Sabino Mullane, Chrunchy Data), following in the wake of recent efforts – that is, the democratization of development – even publishes his own dictionary. It includes abbreviations familiar to postgresists (CFM = commitfest manager), and slang words (bikeshedding, footgun), and even old internet jargon (AFAICT).

Postgres Pre-Commitfest Party at Saint HighLoad++ 2024 And Q&A: PostgreSQL Pre-Commitfest Party at Saint HighLoad++ 2024

Pre-commitfests are perhaps the most interesting and useful thing that has appeared in the postgres world recently. Why in the plural? So far there has only been one: at Saint HighLoad++ in St. Petersburg. Well, if it is useful and interesting, it will definitely happen.

Note that in addition to a whole bunch of Postgres Professional (remember, the idea came to mind To Nikolay Shaplovand the idea was implemented Andrey Borodin from Yandex Cloud and Nikolai's colleague) there are also 2 representatives of Tantor Labs and 1 SberTech.

Intrigues and anti-intrigues

PG_MEM: A Malware Hidden in the Postgres Processes

Aqua Nautilus engineers have published the malicious software they discovered (malware), which disguises itself as PG_MEM, throws dummy payloads to disguise crypto mining.

They were catching the attackers with live bait, basically provoking them. They used an easily guessed login/password pair as bait. And they lured them. After that, they watched what these cyber-hookers would do. And they created new privileged roles, got a superuser and, having launched what they needed, immediately removed the superuser rights to eliminate potential competitors – after all, it was easy to guess the password. And mining requires resources – it would be a shame to share them with someone. At the same time, they killed cron jobs – like a cuckoo throwing non-cuckoo chicks out of the nest.

Fascinating story. The author posted a map of servers that are in danger. There are 838 thousand of them, not far from a million. The author's name is Assaf Morag (Assaf MoragLead Data Analyst at Aqua Nautilus.

pgdsat

It stands for PostgreSQL Database Security Assessment Tool. This is a set of scripts that checks about 70 PostgreSQL parameters that affect security. It also includes CIS compliance benchmark. Developers – HexaCluster. The company is interesting. They also make this PostgreSQL access tool: Connecting to PostgreSQL with Go using PGX.

By the way, on PGConf.SPb 2024 there will be a report Testing for Strength: Utilities for PostgreSQL Analysis and Optimization Ruslana Rangulova from Softline. Among the utilities listed there is pgdsat.

pgspot 0.8.0

We wrote about this in the previous issue, let us remind you. An interesting tool for identifying vulnerabilities in Postgres SQL scripts is posted on the Timescale github.

Serverless and WASM news

Neon Autoscaling is Generally Available

Neon's logical move towards automation and resource saving. Serverless is by definition zero resources when they are not needed (downscaling to 0). From the very beginning, the company was moving in the opposite direction (seamless upscaling). As a result, the team immediately began to develop flexible automatic scaling tools. And here it is.

Their autoscaling algorithm is deeply integrated into the Neon architecture, where storage and computation are separated. Each Postgres instance sits in its own virtual machine. NeonVM is strictly isolated and can do something that Kubernetes hasn't learned yet: migrate from node to node in milliseconds. Therefore, databases can also autoscale in milliseconds.

AWS Services Using SQL for Big Data Analysis

Big article in Simple Talk on the Red Gate website ed-gate.com. Let's highlight an interesting, not very widely known service, in which the fashionable word flashes serverless:

Amazon Athenaserverless interactive service for querying Amazon S3. You can query structured data (relational and CSV), semi-structured data (JSON and XML), and unstructured data (text, logs, binary).

Dynamically loaded extensions in Postgres in the browser

Just in the last issue we wrote about PGlite and Wasmbut then the heroes were Supabase with their http interface. Lantern did roughly the same thing, but in their own direction: at the recent AGI House hackathon (that is, the House of Universal Artificial Intelligence) in San Francisco, they presented an extension that runs directly in the browser. Also on top of pglite, of course.

Migration (from Pine Cone to Lantern)

Migration to Postgres. But an unusual migration to an unusual Postgres. Not from Oracle, not from MS Server: Migrating from Pinecone to Lantern.

Function create_from_pinecone client lantern-pinecone automatically migrates data from Pinecone to Postgres using the Pinecone API, with all keys, environment variables, indexes and namespaces.

AI

On the wonderful Postgres FM The following episodes were also discussed:

  • pgvector – With Jonathan Katz (Jonathan Katz), of course – a member of the PostgreSQL Core Team, a senior product manager at AWS, and a pgvector contributor;

  • pgvectorscale – With Mat Arie (Mat Arye) and John Pruitt (John Pruitt) from Timescale.

Ivntroducing Tembo AI: the simplest way to build AI applications on Postgres

Writes Adam Gendel (Adam Hendel, Founding Developer – founding engineer). This is, they say, for those who do not want to give their data to LLM providers. For those who choose Tembo Cloud, the data will be transferred to LLM inside the personal space in the cloud, where LLM is built in. At the same time, users will have access to models from 8 billion parameters to 70+ billion. But this related to the period of at alpha before the public version, what is happening now, we did not understand (who needs it, will probably understand). The Tembo solution uses 3 AI extensions at once:

  • vector + vectorscale: storage, queries, search for embeddings in the database itself.

  • pg_vectorize: a reliable tool that not only generates imbeddings, but also updates them. This helps build RAGs.

Tembo AI is compatible with the OpenAI SDK. And you can keep in mind that Tembo solutions support GPU.

Understanding pgvector's HNSW Index Storage in Postgres

IN blog Lantern, who developed the extension of the same name, offer cloud services Lantern Cloud. Placing vector information is cheaper and more profitable (as they say – following Stonebraker and Pavlo) than vector Pinecone And Milvus.

90x faster than pgvector — Lantern's HNSW Index Creation Time

The index is not assembled inside Postgres, where it would be assembled in 1 thread, but “on the side”, with all the benefits of multi-threaded computing on multiple cores.

Neural Query Optimizers in Relational Databases (Part 1)

In the first part of this article the author – Saveliy @Safreliy- disassembles 3 models:

  1. MSCN (2018)— a model for assessing the cardinality of queries

  2. DQN (2018)— a model for constructing a query execution plan

  3. NEO (2019)— an end-to-end approach that combines both the learning function of assessing the speed of execution of a given plan and its construction itself.

In each case, it explains how the algorithm works, how to train it, and some benchmarks. The conclusion is:

almost everywhere neural networks outperform classical heuristic approaches that existed at the time of their creation in terms of quality and performance. However, the dominance of these networks is not observed in practice (a typical situation for most scientific works: the results are excellent, but there is no benefit). Apparently, the development of a product neural network solution that will satisfy all the nuances of the operation of real commercial systems was expensive and difficult in 2018-19. Nobody has canceled the inertia of the industry, and we live, without exaggeration, in a turning point. Perhaps right now we will witness a revolution not only in the world of language models, but also in the world of relational databases. But more about this already in the following articles.

The Rise, Realities, and Potential, of Distributed SQL + Vector Databases in the AI ​​Era

Author – Spencer Kimball (Spencer Kimball, CEO Cockroach Labs, and (I didn't know!) the creator of GIMP during his years at Berkeley. Article on a resource not relevant to DBMS builders: on Datanami. And in the article there is not a single mention of a specific DBMS, only generalizations.

Here's another one on a similar topic from Datanami:

Forrester Slices and Dices the Vector Database Market

Author Alex Woody (Alex Woodie) He says the vector database market has exploded. But many are left wondering whether the Postgres extension is enough pgvector for the needs of Universal AI? What are the advantages of a vector database over multimodal databases? Do vector databases always work in the cloud, is it possible to launch them on-premises?

There's a big table from Forrester's Q2 2024 report on what vector databases and non-vector databases with vector capabilities can and can't do. It includes EDB, but not PostgreSQL. Here's another article by Alex on roughly the same topic: Vectors: Coming to a Database Near You. Gartner has also joined Forrester, and this article also has beautiful graphs and charts.

Backup as a game changer

Why PostgreSQL 17's Incremental Backup Feature is a Game-Changer

From the article David Wagoner And Tima Butina (David Wagoner and Tim Boutin, EDB) It turns out that it is not PostgreSQL 17 itself that is the gamechanger, but the incremental backup in Barman 3.11. Yes, in combination with PostgreSQL 17: The powerful duo delivers seamless, Enterprise-class strategies.

A little bit about memory

Memory overcommit and PostgreSQL

Sometimes the OS behaves like an airline company, deliberately booking more tickets than there are actual seats in the cabin of the airliner – we learn from the article Laurenza Albe (Laurenz Albe, Cybertec) This behavior of Linux can be adjusted by tweaking the parameter vm.memory_overcommit. But you have to do it carefully, otherwise you can completely crash the server. There is a separate chapter in the article for the case of containers.

How to Get the Most out of Postgres Memory Settings

Article Sean Thomas (Shaun Thomas, EDB – oh no, sorry: also in Tembo!), in which the most important part is called Sharing is Caring – we would translate share wiselyIndeed, there are many parameter settings being discussed, not only shared_buffers. More about work_mem, pooling, swelling (bloat) – quite to the point.

Controlling Resource Consumption on a PostgreSQL Server Using Linux cgroup2

In the blog Perconais addressed primarily to those who are struggling with multi-tenancy/co-hosting problems.


That's all for today.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *