Postgresso #3 (64)

Conferences and commitfests

PGConf.Russia 2024

It starts very soon: April 8, and ends on April 9. Can be viewed schedule and list reports.

The first report after the opening – the head of our education department Pavel Luzanov – PostgreSQL 17:

This year, the dates of the conference coincide with the completion of the release cycle of version 17. Acceptance of changes will end on April 8 at 15:00 MSK. And we can discuss what to expect in the fall release. There are incremental backups, changes in logical replication, a connection trigger, and something interesting will probably appear in early April.

Next report in the conference room – Pro-Shardman, Alexey Borshchev and P. Konotopov. It is the first among the “product” ones – this shows how much importance the company attaches to this new DBMS, and there is really huge external interest in it. The speakers will talk about Shardman from the point of view of SQL developers and database architects:

what is Shardman?
how it differs from regular Postgres;
types of tables and their use;
how to adapt a database schema to work on shards.

Of course, other new products of the company will be discussed: report Pooler, load balancer, proxy. What do they have in common? Artyom Galonsky, pgpro_rp (prioritization of resources, will report Alexander Popov) and others.

In parallel, in the hall with an intriguing name Tolstoy-Pushkin There will be reports, including, for example, this:

Migration from Oracle APEX/Forms to Elbrus e2k. Problems. Performance, Konstantin Vashchenkov (“Rostec” / NPF PJSC “Rostelecom”). In general, there will be many company representatives: Tinkoff, Yandex, Softpoint, Skala^r, PGMechanix, UNITED TsUPIS, InfoSoft, VNIIZHT, Sibedge, Convertum, Vseda.Da, Sima-land, GNIVTs, Quillis, FORS, Chi-square, Quallis LLC. There are just more of them in Chekhov. [!!список не проверял, поверил на слово Маше]

Traditionally, there will be an astronomical report: Discovery of the Universe by the power of thought, Alexey Semikhatov. The party will be on the 8th at 19:30.

PostgreSQL 17: Part 4 or Commitfest 2024-01

We remind you that this is only part of what’s new in PostgreSQL 17, and you can learn about everything together from the report with the announcement of which this release begins. Part 4 starts with the trump card:

As always, we publish links to previous issues dedicated to PostgreSQL 17 – original and English translated by Alexander Meleshko @kaze_no_saga:

Part 1. 2023-07 (ru / en),
Part 2. 2023-09 (ru/en),
Part 3. 2023-11 (ru/en).

Recordings of reports at PGConf.Russia 2023 have been made publicly available

This is 43 reports. Including presentations from foreign speakers and those who spoke online are now available. They are collected in two playlists on the company’s YouTube channel. The first day And Second day.

Calling Postgres speakers, POSETTE CFP is open until Apr 7th 2024

Proposals are still being accepted. This is an online conference, formerly called Citus Con. It’s nice that the reports are made publicly available. There were last year's 37the year before last – 38. Reports must be completed in 25 minutes. Conference will take place June 11-13.

Swiss PGDay 2024

Will also be held in June – 27-28 at the Eastern Switzerland University of Applied Sciences, campus in Rappersville. Reports accepted until April 8.

Mini Summit Two

David Wheeler (David E. Wheeler, founder of PGXN, who moved to Tembo) in his completely non-theoretical blog called Just a Theory shares his impressions of the mini-summit on expansions. And posted it video And slides. The full summit will be called Postgres Extension Ecosystem Mini-Summit and is due to take place on April 17.

Ian Stanton (Ian Stanton) – technical founder of Tembo (something that is a phrase founding engineer has become a frequent occurrence lately). At the mini-summit he said this: first we looked at PGXN – we love it, it has been supported by the community since 2011, but there are only source codes there. And we need binaries. In addition, since 2012, PGXN has not developed very intensively. We started creating Trunkinspired by PGXN, Apt And Yum. A command line interface was attached to it. Now, with a couple of commands, the extension is created and installed. Trunk and CLI are written in Rust and use Docker. It is still unclear how the community will react to the project.

PostgreSQL: PGDay at FOSSASIA Summit

This conference will take place in Vietnam, in Hanoi April 9. Working language is English. This is a whole set of events. There will also be PostgreSQL Day, dedicated to efficiency and productivity. It was announced that the following will perform there:

Chris Travers (Chris Travers, came and to us at PGConf.Russia 2023), but in a different capacity: now he is the chief engineer (principal engineer) at DeliveryHero (Singapore),

Andrey Lepikhov (Andrei Lepikhov) Lead Developer at Postgres Professional,

Gilles Darol (Gilles Darold, as we have seen in recent publications, he is the Director of Technology at HexaCluster,

Julien Rouault (Julien Rouhaud, too sign we don't care PGConf.Russia, but now he turns out to be a founding engineer in an interesting company Nile).

A familiar name on the organizing committee – Umair Shahid (Umair Shahid, founder Stormatics).

PG BootCamp Minsk 2024

It will take place on April 16 at the Renaissance Minsk Hotel. The language of the event is Russian. From PostgreSQL.org writtenWhat the event is held in accordance with PostgreSQL community recommendations and helps its development.

Eat English speaking site version. In the program committee: again we see Umair Shahid; and we feel good there too familiar Maxim Milyutin – PostgreSQL contributor and openGauss; Mikhail Goldberg – conference organizer PGDay lsrael (since 2017).

Education, internships, certification

PostgreSQL 16 from the inside

IN DMK Press one of these days an updated edition of the book will be released Egor Rogov! Available on our website in PDF. This edition not only reflects the changes that occurred in version PostgreSQL 16, but also takes into account readers' comments and corrects typos.

PGStart

This is an internship program at Postgres Professional. You can work in 6 areas:

DevOps engineer,
QA engineer,
fault-tolerant solutions engineer,
performance engineer,
Go developer,
developer C.

The website outlines the requirements for each of the areas and the skills that the trainee will acquire.

Postgres Professional will conduct professional certification “PostgreSQL Administration” at ITMO University

On April 18-19, the professional certification “PostgreSQL Administration” will be held at ITMO University (St. Petersburg, Gastello str., 12). Testing allows you to confirm your level of knowledge, gain advantages when searching for a job, and find out your growth area and topics that are worth studying.

The test can be taken on PostgreSQL 10 (DBA2, DBA3, QPT) and 13 (DBA1, DBA2, DBA3, QPT). Specialists with an Expert level certificate in PostgreSQL 10 will be able to pass the transition test Expert 10 → 13. The number of slots is limited. Sign up Here.

SQL Logic

On the Redgate website on their blog Simple Talk Joe Selko (Joe Celko) offers an educational course on logic and “logic” in SQL. Joe has won the DBMS Magazine Reader's Choice Award for several years in a row, wrote 8 books on SQL.

In this article he talks about formal logic. Of course SQL is built on three-digit logic, which is a specific thing. But SQL is not logic at all, he claims. And quotes the pioneer of relational databases Dave McGovern (Dave McGoveran). For the reason that if it were a logical system, there would be inference rules.

Then he talks about the weirdness of SQL. For example, DDL works with UNKNOWN And FALSE as with similar, and for DML similarUNKNOWN And TRUE . Further discusses conjunction and disjunction, phrases with BETWEEN, subqueries with [NOT] EXIST, search by expression (pattern matching). And that's all with the history of the issue. In the end, he sings the praises of declarative languages and SQL in particular.

And almost more on topic:

The Stormatics company is not engaged in educational activities per se, but educational activities, the basics, the genre “ for dummies” abundantly featured on their blogs. Deep dives also happen, but less often. Here are examples of “understandings”:

PostgreSQL Internals Part 1: Understanding Database Cluster, Database and Tables

PostgreSQL Internals Part 2: Understanding Page Structure

Understand PostgreSQL's Planner – Simple Scan Paths vs Plans

I was interested in the article based on the results webinar Chris Travers (him again! :)) NUMA policies and their impact on PostgreSQL. There he is presented as the main (principal) consultant on PostgreSQL. This, it seems, is not an educational program, but a deep dive – about the functioning of controllers in different modes, access to memory, but it’s very short – an extract from a 5-minute video. We ourselves once published on the topic of NUMA, but it was not an educational program, but a deep analysis of the situation that was resolved Mikhail Zhilin And Peter Petrov. There is no numa in the title, but it is there: Battles on ZFS territory.

Sad news: Simon Riggs

Remembering Simon Riggs

Simon left this world on March 26th. A very bright personality, an active member of the community. He founded 2ndQuadrant and that alone would go down in Postgres history. He was responsible for the UK PostgreSQL conferences until he transferred responsibility to PostgreSQL Europe last year.

The last speech was in Prague – his opening speech at PostgreSQL Conference Europe 2023, it is available on YouTube.

With incredible energy, Simon pushed into the community features that he considered important for the community and for his own company. We devoted a separate article to one of the episodes: Battle of MERGE. Chronicle with conclusions and morals. It was quite a thriller.

Artur Zakirov – person of the week

Interview with: Artur Zakirov | PostgreSQL Person of the Week

I know Arthur and I can only say good things about him. But let him say it himself:

I started with MS SQL Server. After 6 years of working with him, I became a developer at Postgres Professional. There I worked actively on PostgreSQL, including new features. Some of the bugs that were fixed also ended up in the main PostgreSQL branch.

I was also a contributor to various open source extensions, including the access method RUM, pg_variables for session variables, worked on pg_probackup. The experience and knowledge I gained in the company turned out to be very valuable, I am very grateful for the opportunity to work in this company. Now I support pg_repack, which we actively use at my new job.

One of my favorite extensions is postgres_fdw. We use it for a variety of purposes, it facilitates seamless access to remote data. From applications – Robert Haas (Robert Haas) recently committed incremental backup support to PostgreSQL 17 (see about this, for example, in the review Luzanova – editor's note). I also really want support for UNDO logs. It's a pity that development zheap slowed down. IN OrioleDB UNDO logs have been implemented, but PostgreSQL needs to be patched there, and this is not convenient in all environments.

Now I'm in Berlin, working in Adjust. I grew up in a village in Bashkortostan, surrounded by greenery, far from the bustle of the city. Back then I didn’t even think about moving to big cities. Now it’s hard for me to imagine myself living outside a big city. I settled in Berlin 3 years ago, before that I worked in Tokyo for a year.

PG post-apocalypse

Recovering Deleted Data From PostgreSQL Tables

Christophe Berg (Christoph Berd, Cybertec) writes: make backups, use transactions. Well, if something irreparable does happen, it can be corrected – as long as (AUTO)VACUUM does not have time to go through the table! You can fix it using the extension, which it itself supports. Called pg_dirtyread. With it, you can read the table as if there were no rows marked as deleted in it.

But if this doesn’t help (the dead lines have already been cleared), then it’s too early to think about suicide: you can try to extract information from WAL by reading FPW (Full Page Writes). But this is already using the utility pg_waldump. Save data to file. Having estimated the correct LSN (Log Sequence Number – sequential number in the log), remove unnecessary things. And create a new sign.

Recovering from Data Loss Despite Not Having a Backup: A Postgres True Story

The name is similar, but it's about something else. At a Linux conference SCaLE 21x in Pasadena this March Jimmy Angelacos (Jimmy Angelacos) talks about a real disaster, the consequences of which he eliminated. Then the hard drive crashed. The data was mostly intact, but the company that restored the drive returned it as a pile of files with random names. It was necessary to assemble a database from them with a meaningful hierarchy of files. A video of his performance has been posted on youtubeand you can scroll through PDF.

How I did (not) recover from a data loss (featuring ZFS, LXD and PostgreSQL)

The article is 5 years old and was not written by an IT guru. I was interested in its topic: the problems of the collapse of ZFS and LXD are not discussed very often in articles and comments. Stan's blog accordingly, someone is leading Stanislassupporting some social network mstdn.io on the base mastodon – quite popular, by the way, among postgresists. He was involved in saving the data on his own and chose the following strategy: kill part of the data so as not to lose the most important thing in the database. He seems to have coped with this minimum program.

Actually, not so long ago – a year ago – Evgeniy Brednyawho heads the support team at Postgres Professional, gave a talk at PGConf.Russia 2023:

Recovering damaged data

In this page there is a video you can scroll through PDF. In the preface, Evgeniy warns:

● There are not and cannot be ready-made recipes for restoring damaged data, otherwise they would have long ago been implemented in the form of utilities.

● Requires high qualifications: you need to know how postgres works, how MVCC works, where and in what form data is stored on disk, deep and broad knowledge in the subject area.

● Long and painstaking work, which does not always lead to success.

● Motivation: share experiences that can be used.

● Any wrong movement will lead to partial or complete data loss.

● If you interfere with the operation of the DBMS, then all responsibility for possible data loss lies solely with you!

● You should not even try to do anything without fully understanding the consequences of your actions; it is better to call for help.

The presentation has nice pictures:

Your own DBaaS

How we make our DBaaS

The author introduced himself as @Loxmatiymamont, well, Mammoth is Mammoth, almost an Elephant. An important part of the article: Mammoth immediately agrees on terms, on the hierarchy of entities in DBaaS (from bottom to top):

Database -> Instance -> Cluster -> Project.

The latter distinguishes the developed DBaaS from the hierarchies from textbooks. It is created to manage groups of clusters. The project has an owner who creates internal users and gives them access rights.

The service must be flexible. In this service: you can dynamically change the capacity allocated to instances, including the size of the disks. Theoretically, you can even change the number of processors on the fly, but this will lead to a reboot of the node. This is a feature of OpenStack. And, what’s convenient, you can configure both all members of the cluster at once, as well as individual instances. The focus on OpenStack is related to the situation in the Russian cloud market.

Replication is both physical and logical. All these copies are sent to S3.

There is a 15 minute demo video six months ago. Mammoth immediately says that the product is under development, so I want to know the readers’ opinions and listen to their wishes.

Fifteens parallelized

At Postgresso we regularly write about new Fridays Ryan Booze (Ryan Booz), often mentioning that at first post-greeting spots appeared not on him, but on Sean Thomas (Shaun M. Thomas, now at EDB) – for historical justice. PG Phridays (unlike Buzovsky PGSQL Fridays) invariably appeared in the past tense. And the time is 2017. Suddenly it turned out that Sean (under the influence of Ryan's success or on his own) decided resume them on your website BonesMoses.org. And here they are – for March 15, 22 and 29:

PG Phriday: Redefining Postgres High Availability (March 15th)

This is an interesting article. In many ways, his judgments are non-standard. The directions for redefining high availability are:

In point 2, Sean states the problem this way: it’s no coincidence that the storage engine is incredibly robust and completely unprepared for anything else (incredibly robust and entirely unequipped for anything else). In general, he writes in a rather vivid language. What do you need? Eat Postgres pluggable storage API. But this path does not seem promising to Sean.

He talks about the palpable shock in the community when Aurora for Postgres appeared (and recommends watching the video Deep Dive on Amazon Aurora). Then they arrived in time NeonDB, AlloyDB, which solve the storage problem for the cluster in their own way. And he recommends reading conceptual article Heikki Linnakangasa (Heikki Linnakangas, founded Neon with Stas Kelvich), which describes Neon's storage architecture solutions. From which, by the way, you can learn about new serverless databases – DBaaS Socrates And PolarDB.

In point 3, Sean offers his own concept, telling how to combine the advantages of sharding and the model MapReduce.

PG Phriday: Why Postgres is the Best Database Engine (March 22)

I won’t dwell on this article.

PG Phriday: A Dirty Postgres RAG (March 29)

The name is a play on words: like dirty post-Greek rags. But RAG is also Retrieval Augmented Generation – way of working with LLM. But this is not just another article about pg_vector. And it won’t even be played with GPT. A certain Mixtral 7B, moderate in her appetites. And since RAG is a multi-stage process, the article contains many pieces of code of very different properties.

Well, then we move on to the section itself

AI:

Distributed queries for pgvector

Article Jonathan Katz (Jonathan Katz, Amazon RDS) who directly developed this extension.

At all pgvector adapted to partitioning, HNSW indexes are created on all sections with one command. But there are nuances when executing queries and important index settings. And for distributed transactions Jonathan uses postgres_fdw.

In the cloud, he organizes 3 nodes, configures them, generates them using the PL/pgSQL function on two working nodes of 2.5 million rows each, then runs SELECT count

and watches what happens. It makes sure of asynchronous execution and then searches for the 10 nearest neighbors of a certain vector. And it finds it, and on different nodes. But it also launches a benchmarkANN Benchmarks

. Links to his paper on pgvector scalability with a record-breakingly long title:Accelerate HNSW indexing and searching with pgvector on Amazon Aurora PostgreSQL-compatible edition and Amazon RDS for PostgreSQL

. That's all for now. Get ready for PGConf.Russia 2024

or follow the news from her.

Conferences and commitfests