Postgresso #5 (66)

PostgreSQL: PostgreSQL 17 Beta 1 Released!

The beta has been released with 188 new features. As a reminder, Bruce Momjian recently stressed the importance of this release because of its tilt toward optimization: the large number of optimization improvements, he said, came as a pleasant surprise to him.

The release announcement also opens with optimization. First come the changes in vacuum: a new internal memory structure reduces memory consumption by up to 20 times and also shortens the cleanup itself. The last item touches on a rarely discussed topic: PostgreSQL 17 has improved support for SIMD instructions.

Another interesting and important item is failover control for logical replication, which matters for fault-tolerant configurations.

The most important SQL/JSON feature has arrived: JSON_TABLE, which takes working with this format to a new level. New constructors and other functions have also appeared.

MERGE is now more flexible: a new clause, WHEN NOT MATCHED BY SOURCE, handles rows that exist in the target table but are not found in the source. For example, you can DELETE them.
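To make the three MERGE branches concrete, here is a minimal sketch in Python rather than SQL. It only classifies rows by join key, the way the WHEN clauses do; the function name `merge_actions` and the sample data are hypothetical, and a real MERGE supports extra conditions and actions that this toy ignores.

```python
def merge_actions(target: dict, source: dict) -> list:
    """Classify every join key the way the three MERGE branches would.

    WHEN MATCHED               -> UPDATE (key is in both tables)
    WHEN NOT MATCHED BY TARGET -> INSERT (key is only in the source)
    WHEN NOT MATCHED BY SOURCE -> DELETE (key is only in the target)
    """
    actions = []
    for key in sorted(target.keys() | source.keys()):
        if key in target and key in source:
            actions.append(("UPDATE", key))
        elif key in source:
            actions.append(("INSERT", key))
        else:
            # The case that WHEN NOT MATCHED BY SOURCE makes reachable.
            actions.append(("DELETE", key))
    return actions


# Hypothetical tables keyed by id.
target_rows = {1: "a", 2: "b", 3: "c"}
source_rows = {2: "B", 3: "C", 4: "d"}
planned = merge_actions(target_rows, source_rows)
```

Row 1 exists only in the target, so it is the one that falls into the new branch.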

A new predefined role, pg_maintain, has appeared; it allows running VACUUM, ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW, REINDEX, and LOCK TABLE.

The final release of Postgres 17 is scheduled for September/October this year.

We always write in advance about what a release will contain, so in addition to the official release notes we recommend reading Pavel Luzanov's commitfest reviews. Moreover, the English translation of part 4 has just appeared: PostgreSQL 17: part 4 or CommitFest 2024-01, translated by Alexander Meleshko. So the collection is now complete:

  • 2023-07 ru/en,

  • 2023-09 ru/en,

  • 2023-11 ru/en,

  • 2024-01 ru/en.

The Russian versions can also be read on Habr, as can the English ones from our list.

As for the beta, this article just came out:

Exploring PostgreSQL 17: A Developer's Guide to New Features – Part 1 – PL/pgSQL

In it, Deepak Mahto, whom we often cite in the Migration section, describes one specific feature: the ability to declare array types using %TYPE and %ROWTYPE. This simplifies the work and adds flexibility.

Conferences, webinars, meetups and their consequences

PGConf.dev 2024

The conference took place May 28-31 in Vancouver. Two specialized sessions were held on the first day:

  • the Advanced Patch Feedback Session on patches, where participants gathered to hear Heikki Linnakangas, Michael Paquier, and Robert Haas, without outsiders: by invitation only;

  • the Extension Ecosystem Summit with David Christensen, Devrim Gündüz, Jeremy Schneider, Keith Fiske and, of course, David E. Wheeler (the same one who founded PGXN and is now at Tembo).

(By the way, Robert Haas lifted the veil on the Advanced Patch Feedback Session. One of the topics was the transition from a multi-process model to a multi-threaded one. This topic is again being discussed intensively on the hackers mailing list under the subject [multithreading] “extension compatibility”.)

(By the way, part 2: Tembo has already posted a playlist of the six extension mini-summits. David Wheeler will give a talk on this topic at POSETTE 2024: State of the Postgres Extension Ecosystem.)

And here is what Peter Eisentraut writes about the conference in his humorous article, How engaging was PGConf.dev really?

“It’s time to stop these conferences. Because of them, all development stops,” one joker, Peter, quotes another joker, Tomas Vondra.

Peter counts (using Postgres, of course) the gaps between commits during conferences and concludes that PGConf.dev coincided with the longest gap in 20 years! There was a gap in 2015 before pgconf.eu; the gaps in 2014 and at the end of 2008 do not correlate with conferences, but the pause in May 2008 overlapped with that year's PGCon.
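Peter does his counting in SQL; the same idea can be sketched in a few lines of Python. This is not his query: the function name `longest_gap` is made up, and the commit timestamps are invented purely for illustration.

```python
from datetime import datetime


def longest_gap(timestamps):
    """Return (gap, start, end) for the longest pause between consecutive commits."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        raise ValueError("need at least two commits")
    # Pair every commit with the next one and keep the widest pair.
    start, end = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return end - start, start, end


# Invented timestamps with one long pause in the middle.
commits = [
    datetime(2024, 5, 24, 10, 0),
    datetime(2024, 5, 25, 9, 30),
    datetime(2024, 6, 1, 12, 0),
    datetime(2024, 6, 1, 15, 45),
]
gap, gap_start, gap_end = longest_gap(commits)
```

In SQL the natural equivalent would be a window function over the commit log, comparing each commit time with the previous one.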

And Robert the Veil Lifter writes enthusiastically on his blog about the conference: the word “great” appears three times in the first two sentences. But then Robert does more than lift the veil: he throws the curtains wide open by linking to this document.

The following article belongs to the same topic (Robert Haas is the hero of this issue): Hacking on PostgreSQL is Really Hard.

“Many people will agree with this statement, but for different reasons. Some may complain about the character of discourse on the mailing list, others that patch reviewers are almost impossible to find, and still others that they feel like hostages of capricious committers. But that is not what I am talking about today: I want to focus exclusively on the technical side.”

So as not to embarrass anyone, Robert takes an example from his own experience: the incremental backup patches. In the end there were about 20 of them, and he (and not only he) worked on them for almost a year. From there he returns to the same theme: it is very hard to break out of the narrow circle of committers, since these are people with vast experience who are difficult to replace. A vicious circle.

But we forgot about the other participants and the rest of the program. Very briefly:

On the second day, the speakers included Alena Rybakina, who prepared the talk Adaptive Query Optimization In PostgreSQL together with Andrey Lepikhov (both from Postgres Professional). A PDF of the talk is available.

Andrey Borodin from Yandex presented the talk Tricks From In-Memory Databases. Here are the slides.

New genre: precommitfest

Echoes of the same theme: if not a crisis, then at least some anxiety. A workshop called Postgres Pre-Commitfest Party is planned at Saint HighLoad++ 2024. It is an initiative of Andrey Borodin (Yandex Cloud), meant as a way to address the problems with commitfests that we described in some detail in the previous issue. Andrey suggested discussing upcoming patches first, outside the commitfest infrastructure. The Highload++ organizers liked the idea, and Postgres Professional supported it: Oleg Bartunov (Major Contributor) and Fedor Sigaev (Major Contributor and FreeBSD Contributor), both experienced people, will help with advice.

The main problem that the existing commitfest infrastructure does not always solve effectively is finding a reviewer who will review the patch and help move it forward in the community. The precommitfest will help with this; you just need to submit an application. We are talking about patches for version 18 (!) of Postgres. The organizers write:

An impressive number of patches have already been carried over from version 17, and more will be added before July. On the eve of the commitfest, patch authors are invited to meet and agree on working together: prioritize some patches, promote your patch among potential reviewers, and hear quick feedback from the community gathered at StHL. If you want to try to find a reviewer, prepare a short (2-3 minute) talk about your patch.

The main conference program will include, for example, the following talks:

Do-it-yourself load balancing of sharded PostgreSQL – with the help of SPQR, a PostgreSQL sharding tool. Denis Volkov, Yandex Cloud.

Caching user data in storage systems: implementation of the cache synchronization protocol in an Active-Active cluster – a multimaster, but a “hardware” one. Mikhail Motylenok, YADRO.

Less code, more results: using SQLC to work with the database – working in Go with PostgreSQL. Evgeny Konechny, Uzum Tezkor (an Uzbek startup).

PGConf.SPb 2024

It will take place on October 1. Registration is open, and so is the call for talks. Speakers are reimbursed for travel and accommodation in the city during the conference.

In honor of the upcoming event, the organizers are sharing recordings of talks from PGConf.SPb 2023, previously available only through the personal account. We started with this one:

Fencing in the clouds and more… – a talk by Igor Kosenkov (Postgres Professional). Playlists are now available: day 1 and day 2.

Olympiads, competitions

XV International Olympiad in the field of information technology “IT⁠-⁠Planet 2024”

For some reason, that is the title of a long video that opens with a talk by Egor Rogov, Director of Educational Development at Postgres Professional. Egor's talk itself is titled:

Back to basics. Should we understand the technologies we work with?

You will be surprised, but yes, you should 🙂 Don’t be alarmed by the stated duration of the video (two hours fifty minutes): Egor speaks for about 25 minutes, followed by a talk by Igor Sychev from Sbertech, then a talk on 1C, and at the end of this huge video a talk on the Rudiron educational hardware and software kit from Aquarius.

The title of the talk is borrowed from the article Back to Basics by Joel Spolsky, known as the creator of Trello and co-founder of Fog Creek and Stack Overflow. But Egor relies even more on another of his articles, The Law of Leaky Abstractions, also from his blog Joel on Software. All non-trivial abstractions are leaky, Egor agrees with Joel. You can fall through such a leak because of performance problems, for example. There are many other examples.

Egor is directly connected to the theme of the whole video. Here is his recent article:

“IT-Planet 2024”: tasks of the second stage on PostgreSQL

Stage 1 was a remote theoretical test on PostgreSQL; about 200 people were selected out of almost 3000. The questions for this stage were prepared by Evgeniy Davydov.

Stage 2 was also remote. Here participants were asked to work through five problems set by Egor Rogov, which he analyzes in this article.

The final (stage 3) of the Olympiad took place in person in Arkhangelsk. Here are the winners; there are many categories. In the PostgreSQL category they are:

  1. Alexandra Sukhotina (Peter the Great St. Petersburg Polytechnic University),

  2. Ian Senin (Belarusian State University) and

  3. Vu Nam Hoai (SPb Polytech again).

Analysis of the 3rd stage will appear soon.

UPD: Third stage tasks for PostgreSQL

Not tasks but a task, Egor corrects himself. There was only one, but it was a big one: write an SQL query that plays “five in a row” tic-tac-toe.

Not quite the game we all once played: not on an “infinite” board, that is, a sheet of graph paper. In the problem the board is 19×19. The solution must consist of a single SQL query.

The organizers provided contestants with a web application, built by Ilya Bashtanov, in which queries had to be registered. The application simplified debugging by letting contestants try out and visualize any number of variants. The basic algorithm for such games is called minimax; it is easy to find online, and participants had full access to the Internet.

Next comes an analysis of the algorithm that Egor himself outlined. The algorithm of the winner, Alexandra Sukhotina, almost coincided with it.
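For readers who have not met minimax, here is a minimal Python sketch of the idea. It is not Egor's or the winner's SQL solution: it uses a 3×3 board with three in a row so that exhaustive search stays feasible, whereas a 19×19 board would need a depth limit and a heuristic evaluation. The board encoding (a string of "X", "O" and "." cells) and all function names are assumptions made for this example.

```python
from functools import lru_cache

N, K = 3, 3  # toy board size and run length; the contest used 19x19 and five in a row


def winner(board):
    """Return 'X' or 'O' if that player has K in a row, else None."""
    directions = ((0, 1), (1, 0), (1, 1), (1, -1))
    for r in range(N):
        for c in range(N):
            player = board[r * N + c]
            if player == ".":
                continue
            for dr, dc in directions:
                end_r, end_c = r + (K - 1) * dr, c + (K - 1) * dc
                if not (0 <= end_r < N and 0 <= end_c < N):
                    continue  # the run would leave the board
                if all(board[(r + i * dr) * N + (c + i * dc)] == player for i in range(K)):
                    return player
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Score the position for 'X' (+1 win, 0 draw, -1 loss) with `player` to move."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if "." not in board:
        return 0  # board is full: draw
    other = "O" if player == "X" else "X"
    scores = [
        minimax(board[:i] + player + board[i + 1:], other)
        for i, cell in enumerate(board)
        if cell == "."
    ]
    # X maximizes the score, O minimizes it.
    return max(scores) if player == "X" else min(scores)


def best_move(board, player):
    """Return the index of the empty cell with the best minimax score for `player`."""
    other = "O" if player == "X" else "X"

    def score(i):
        return minimax(board[:i] + player + board[i + 1:], other)

    moves = [i for i, cell in enumerate(board) if cell == "."]
    return max(moves, key=score) if player == "X" else min(moves, key=score)
```

For example, `best_move("XX.OO....", "X")` picks cell 2, which completes the top row.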

For comparison, here is an analysis of last year’s problems of the second stage of the IT-Planet Olympiad on PostgreSQL and the problems of the third stage.

And running

Festival of Sports and IT

The festival, which is organized by RUNIT, has been taking place since 2018: “For us, running and IT are inseparable things; at the heart of both is movement towards a goal. Sports teach us to make quick and correct decisions and play for the long haul.”

Medals from Postgres Professional Education Department employee Pavel Tolmachev


Education

Guide to Databases

Published by DMK Press. The author, Vladimir Komarov, is an IT generalist: programmer, database administrator, data and infrastructure architect, teacher, and a bit of an evangelist.

The Postgres Professional Education Department was involved in preparing the book for printing. So we highly recommend it.

The book talks about the architectural principles on which all modern database management systems are based, as well as the algorithms and data structures that they use. Particular attention is paid to comparing implementations of the same approaches in platforms with similar functionality.

This book should be read by anyone who is not satisfied with the level of training offered by three-month “get into IT” courses. It gives practical knowledge a solid foundation in the form of an understanding of general principles. The book is written for information systems architects and lead developers. In other words, for the elite and for those who want to join it.

Book table of contents:

  • Part I. Classification of databases

  • Part II. Data access

  • Part III. DBMS architecture

  • Part IV. Distributed Databases

  • Part V. Disaster Recovery

  • Part VI. Database Operation

  • Part VII. Database Security

The book can be downloaded in PDF format.

SQL antipatterns. How to avoid pitfalls when working with databases

This book from the Piter publishing house is about SQL, not Postgres. Most examples use MySQL 8.0, though other popular RDBMSs are also mentioned in the text. The code examples use Python 3.9+ or Ruby 2.7+. It is a translation of the book by Bill Karwin. We are only letting you know about it; we have not read it yet.

A short episode from the life of educators

My colleagues in the Postgres Professional Education Department regularly receive questions. Something in the course materials is unclear, or there are typos, errors, or inaccuracies: we are all human. For example, this question came in about the DBA2 course:

If there is very little free space left on the table page (for example, 2 bytes), will such information also be displayed in the free space map? Or is there some kind of limit and the free space map only shows free space larger than a certain size?

My colleagues usually answer in detail, sometimes after clarifying the question. This time Egor Rogov recommended reading an article, and not even one of his own, although you might think that Egor has an article (or a book chapter) for every occasion in Postgres life 🙂 This article may be useful to you as well:

PostgreSQL. The internals of the free space map

This is an article by Mikhail Gilev @o4ina, a 24-minute read according to Habr. It covers the following questions:

  • What are FSM page categories?

  • How the category tree is arranged on a leaf FSM page.

  • How non-leaf FSM pages and the FSM page tree are structured.

  • Where does the search for free space in the FSM tree structure begin?

  • How does the search algorithm for the desired category work?

  • How the algorithm for updating categories works.

  • What are the features of the FSM layer for indexes?

  • Under what circumstances are FSM pages locked.

  • Under what circumstances is it necessary to restore FSM pages.

  • When FSM is used when inserting new row versions.
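The question about 2 bytes of free space comes down to a little arithmetic, which the sketch below models in Python. It is a simplified illustration, not PostgreSQL's code: it assumes the default 8 kB page and 256 categories (so one category covers 32 bytes), ignores the special handling of the largest categories, requires a power-of-two number of leaves, and always searches from the root, whereas the real search starts from a remembered slot. All function names are made up for this example.

```python
BLCKSZ = 8192                            # default PostgreSQL page size, bytes
FSM_CATEGORIES = 256                     # the FSM keeps one byte per heap page
FSM_CAT_STEP = BLCKSZ // FSM_CATEGORIES  # 32 bytes of free space per category


def space_to_category(avail):
    """Category recorded for a page with `avail` free bytes (rounded down)."""
    if not 0 <= avail < BLCKSZ:
        raise ValueError("free space must fit within one page")
    return avail // FSM_CAT_STEP


def needed_to_category(needed):
    """Category to search for when `needed` bytes are required (rounded up)."""
    return (needed + FSM_CAT_STEP - 1) // FSM_CAT_STEP


def build_tree(leaf_cats):
    """Flat binary tree in which every parent holds the max of its two children."""
    n = len(leaf_cats)  # assumed to be a power of two in this sketch
    tree = [0] * (n - 1) + list(leaf_cats)
    for i in range(n - 2, -1, -1):
        tree[i] = max(tree[2 * i + 1], tree[2 * i + 2])
    return tree


def find_page(tree, needed_cat):
    """Descend from the root to some leaf with category >= needed_cat, or None."""
    if tree[0] < needed_cat:
        return None  # the root says no page has enough room
    n_leaves = (len(tree) + 1) // 2
    i = 0
    while i < n_leaves - 1:
        left = 2 * i + 1
        i = left if tree[left] >= needed_cat else left + 1
    return i - (n_leaves - 1)


# Hypothetical free space, in bytes, on four heap pages.
free_bytes = [2, 40, 700, 31]
tree = build_tree([space_to_category(b) for b in free_bytes])
page_for_100_bytes = find_page(tree, needed_to_category(100))
```

Under these assumptions `space_to_category(2)` is 0, so a page with 2 free bytes is recorded exactly like a full one: free space below one category step is not visible in the map.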

Security

Protecting Personally Identifiable Information in PostgreSQL: A Critical Requirement for Financial Organizations

An article by Umair Shahid (Stormatics). He divides security concerns into protecting data in transit, usually with TLS (Transport Layer Security), and protecting data at rest, where there are several options to choose from.

He then explains how to do this with existing Postgres tools, for example the pgcrypto module, which makes it possible to encrypt selected columns. For TDE, he recommends reading the EDB brochure.

There is also a guidebook from Crunchy Data:

Data Encryption in Postgres: A Guidebook

This is not a book, of course, but an overview in the same genre of basic general principles, though with a slightly different breadth of coverage. For each point, Greg Nokes gives the pros and cons, likewise distinguishing encryption of data at rest from encryption of data in transit. In this article he writes about data at rest and starts with the OS.

Pros:

  • Transparent to applications and database.

  • Makes it easier to maintain because it applies to the entire storage layer.

  • Delegates encryption/decryption to the OS.

  • Minimal impact on performance.

  • Widely known and understood technologies.

Cons:

  • Less granular control for specific databases and tables.

  • By default, backups are not encrypted.

  • Additional overhead for monitoring the correct use of encryption keys.

This is one of four methods; here are all four:

  1. At the OS level.

  2. Encryption at the storage device level.

  3. TDE (for some reason Greg expands it as Transparent Disk Encryption, apparently a typo).

  4. Application-level encryption.

Greg then makes some general recommendations.

Crunchy Data and Center for Internet Security Announce Benchmark Update for PostgreSQL 16

Crunchy Data has always put a lot of effort into security. The Center for Internet Security (CIS) is a non-profit organization that deals with security in various areas of IT. CIS Benchmarks are an important part of its work: freely available guidelines for assessing systems from a security standpoint. There are a couple dozen such benchmarks, including, of course, ones for PostgreSQL (Oracle products have 5, IBM products 6). The new benchmark is the result of collaboration between Crunchy Data and CIS.

PGMeetup: Postgres Pro developments to improve security and data protection

Andrey Gusakov, senior technical consultant at Postgres Professional, spoke at the meetup. You can navigate the recording by timecodes.

HR as Person of the Week

Unexpectedly (I do not remember anything like this before), the Postgres Person of the Week is not a developer but a recruiter, Philip Marks.

Migration

Code Conversion Chronicles – Trigger Order of processing in Oracle to PostgreSQL Migration

Deepak Mahto has added another article to his collection of Oracle migration chronicles, extending the list from the previous issue.

The Ultimate Multi-Database Data Comparison Tool

pgCompare is an open source utility developed by Crunchy Data to simplify and improve data comparison across PostgreSQL, Oracle, MySQL, and MS SQL databases. Brian Pace writes about it.

An assistant to copy data from a remote server

Florent Jardin writes on his blog: during a recent PG Session organized by Dalibo, I led a workshop in French on how to migrate to PostgreSQL using FDW. It was an opportunity to present the db_migrator extension to the public (the idea was taken from Dalibo's data2pg), which I had already written about in the article On the road to freedom with db_migrator.

But during the workshop, I discovered that copying by the migrator is not fully supported. Using a low-level function, you can transfer table by table, spreading the transfer across different processes, but there are many situations where you will have to write a lot of SQL queries. After working on this for several months, I wrote an assistant in PL/pgSQL that simplifies these steps.

DBMate 2.16

This migration utility was written by Adrian Macneil. It runs from the command line and can be used with different languages and frameworks. The author provides a comparison table with other tools.

pgRoll 0.6 (and not only)

This migration utility promises that both the old schema and the new schema remain available in parallel, even if fatal problems are discovered during the migration. During the migration, views are created; the tables themselves are not touched for the time being. Changes are also claimed to be reversible. Some explanations are available here. Version 0.6 adds half a dozen improvements. The utility is open source.

It is developed and maintained by the company Xata, which has big ambitions: for example, serverless Postgres. It even asserts that it is the only serverless Postgres platform (one feels a bit sorry for Neon). Built on top of PostgreSQL, Xata says it provides full-text and vector search. The search is indeed quite flexible: you can search not only in individual tables but also in specified columns, to which you can assign weights, and there is fuzzy search. Files can be attached to individual records, aggregation capabilities are extended, and the schema can easily be branched. There is integration with ChatGPT via an API. Inside, it appears to be mostly REST, with support for TypeScript and Python and JSON-based interaction at the user level.

Here is a comparison of Xata with Neon. Xata is far ahead in terms of supported platforms. Big plans, considerable investment. There are about 30 employees in the photo.

Multimaster – true and more

Myths and realities of the multimaster in the PostgreSQL DBMS architecture

On the True Tech channel: a talk from MTS True Tech Day 2024 by Pavel Konotopov and Mikhail Zhilin (both from Postgres Professional) about what an “honest multimaster” is, whether it really exists, what implementations it has, and how it can be used. At the end, the speakers say, the focus shifts to multimaster performance. After all, if a DBMS is slower than a turtle, it is no longer a database but a turtle 🙂

The topic is important for the company and simply interesting. At PGConf.Russia 2023 there was a talk by Andrey Rudometov, a Postgres Professional developer: Hello, built-in multimaster? Comparison of bidirectional replication in vanilla and Postgres Pro Multimaster. Andrey addresses the same question there: in early reviews, because the resulting cluster outwardly resembles a multimaster, the feature was dubbed a “built-in multimaster”, so let us figure out how similar they really are and look at what is inside.

The topic is also covered in text form: on Habr there is a three-part article, a whole study by Konotopov and Zhilin based on their talk at HighLoad++ 2023: Myths and realities of “Multimaster” in the PostgreSQL DBMS architecture. Parts 1, 2, and 3.

Miscellaneous

Unleashing PostgreSQL Performance: Exploring the Power of pg_profile

The Indian startup with the nice name OpenSourceDB recalled on its blog a powerful diagnostic tool: pg_profile by Andrey Zubkov. It is a short note, nothing sensational, but the very fact of turning to this “game-changing” tool, as the author Venkat Akhil put it, is notable. He installed the newest version, 4.6, which comes with Grafana. In general, the blog covers various Postgres-related topics: performance, testing, multimaster (active-active).

Administering a Patroni Managed PostgreSQL Cluster

Robert Bernier (Percona) shows how to run and configure Spilo, that is, an HA PostgreSQL cluster on Patroni in a Docker container, offered by Zalando. The setup: HAProxy on one node and etcd on three; of the three database nodes, one is the primary and two are replicas, all running PostgreSQL. He changes parameters, restarts, and checks that the replicas track the changes.

Optimizing Performance in PostgreSQL: Join Column and ANY Filters

Deepak described an interesting case of the planner working inefficiently with ANY and IN. Lukas Fittl from pganalyze became interested and looked into the topic himself in Postgres Planner Quirks: JOIN Equivalence Classes and IN/ANY filters (video version), the latest episode of the huge 5mins of Postgres series.

When joining tables, a filter with the expression ANY (ARRAY[1,2,3]) was not propagated properly to both tables, and if you write IN (1,2,3), it is converted to the same ANY (ARRAY[1,2,3]) with the same result. In the end, Deepak outsmarted the optimizer by adding an “extra” line to the query.

Lukas digs deeper under the hood, recalling parameterized index scans, equivalence classes, and the corresponding filters, and draws on the experience of Tom Lane and Tomas Vondra, who have dealt with similar questions. Constructs with USING and ORDER BY also make an appearance. Using a technique similar to Deepak's trick, Lukas achieves a 2000-fold speedup of the query.

Review of open free tools for creating backup copies of PostgreSQL DBMS

This is an article by a well-known practitioner, Mikhail Shurutov @shurutov. He has spoken at PGConf.Russia and taught master classes. But, oddly enough, this is his first publication on Habr. And it is an interesting one.

Mikhail provides a table where he checks the boxes (yes/no) for 5 such tools:

  • barman – this product is a front end to two ways of creating a backup using the backup facilities built into (shipped with) the DBMS:

    1. old (before version 9.0, this was the only way to create hot physical backups): pg_backup_start ( label text [, fast boolean] ) && … && pg_backup_stop (up to and including version 14: pg_start_backup, pg_stop_backup)

    2. new (from version 9.0): pg_basebackup

  • pg_probackup – ATTENTION! The review covers only the free open version available on GitHub! Postgres Pro Backup Enterprise is not included in the review.

  • pgbackrest

  • wal-g

Mikhail is not shy about admitting and correcting inaccuracies; there are already 4 updates in the article. At the very least, this article is a good starting point for further discussion.

AI

The 150x pgvector speedup: a year-in-review

Jonathan Katz, probably the leading pgvector expert, writes that he had wanted to do a year-in-review, but was struck by the publication of no fewer than three performance benchmarks of vector databases (Postgres counts as one there, thanks to the pgvector extension). However, he did not like the test conditions and found inaccuracies, so he decided to run the tests himself, reminding readers that the most important parameter is recall.
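Since recall is singled out as the key metric, here is a minimal Python sketch of how recall at k is usually computed for approximate nearest neighbor search. It is a generic illustration, not code from the article or from the benchmark framework; the function name and the sample ids are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true k nearest neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    if k <= 0 or not truth:
        raise ValueError("k must be positive and exact_ids must not be empty")
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


# Hypothetical result ids: exact (brute-force) search vs. an approximate index.
exact = [7, 3, 9, 1, 4]
approx = [7, 9, 2, 3, 8]
recall = recall_at_k(approx, exact, k=5)
```

Here the index found three of the five true neighbors, so the recall is 0.6; a benchmark that reports only speed without this number says little about result quality.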

For testing he used the ANN Benchmark framework, but to do so he had to dig into the pgvector code. The changes are given in the article. He ran versions from 0.4.1 (no longer even on GitHub) to 0.7.0 (0.7.1 has been released since) with IVFFlat and HNSW. Where possible he used parallelism and SIMD instructions.

And at the end come those 150 times. It turns out, however, that this is a 150x speedup of index building in version 0.7.0 compared to 0.5.0. The article contains a lot of technical detail. At the very end he thanks Andrew Kane for his huge contribution, and also (for smaller but still real contributions) Heikki Linnakangas, Nathan Bossart, Pavel Borisov, and Arda Aytekin.

Well, then, over to Pavel Borisov (now at Supabase):

What's new in pgvector v0.7.0

There are no changes as radical as the arrival of a new index type (HNSW in 0.5.0), but these are still worth attention:

  • float16 vectors (previously only 32-bit ones were supported);

  • sparse vectors (saving space for vectors with many zeros);

  • binary quantization (bit vectors).
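Two of the items above are easy to picture with a small Python sketch. It is an illustration of the ideas only, not pgvector's implementation: the "positive component becomes 1" rule is my understanding of binary quantization and should be treated as an assumption, the sparse form here is a 0-based dict rather than pgvector's own text format, and all function names are made up.

```python
def binary_quantize(vector):
    """Keep one bit per component: 1 if the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def hamming_distance(a, b):
    """Number of positions in which two bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def to_sparse(vector):
    """Store only the non-zero components as {index: value} (0-based here)."""
    return {i: x for i, x in enumerate(vector) if x != 0}


# Hypothetical embeddings.
first = binary_quantize([0.3, -1.2, 0.0, 2.5])
second = binary_quantize([0.7, 0.4, -0.1, -3.0])
distance = hamming_distance(first, second)
sparse = to_sparse([0.0, 0.0, 1.5, 0.0, -2.0])
```

Bit vectors are far smaller than float vectors and cheap to compare, which is why they are often used for a fast first pass followed by re-ranking with the original vectors.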

There are also new distance algorithms and distance functions.

Helping PostgreSQL professionals with AI-assisted performance recommendations

Francesco Tisiot (Aiven) volunteers to help, both Postgres professionals and his own company: he recommends Aiven AI Database Optimizer. This software gives recommendations on indexes and on rewriting SQL queries. For now, it is free for early birds.

We write a search for semantically similar texts (or products) in half an hour in Go and Postgres (pgVector)

Anton Okolelov @varanio, who introduces himself as a Go team lead and host of the crossjoin channel, offers a short practical guide on the Karuna blog. He uses a library by Sasha Baranov.

pgvector-remote

This thing was invented at Georgia Tech. The article first compares the pros and cons of pgvector in Postgres and the vector DBMS Pinecone, and then proposes a new paradigm: remote indexing with pgvector-remote. You can store metadata in Postgres and the embeddings themselves in Pinecone (in a pine cone). So far only the Postgres-Pinecone combination works, but other vector databases are also planned: Milvus and its kin.

Operationalizing Vector Databases on Postgres

Tembo offers to experiment with its stack for free on Tembo Cloud or to run the vector database locally in Docker. This short article is about the pg_vectorize extension, but the guide lists the entire VectorDB stack, which also includes pgvector. Here is the stack:

  • pg_stat_statements;

  • pg_vectorize – a simple interface for generating embeddings from text, storing them in Postgres, and then searching for nearby vectors using pgvector;

  • pgvector – search for similar vectors, you can store embeddings, create indexes;

  • pgmq – used by the pg_vectorize extension for work queues and for decoupling embedding calculations from the original data;

  • pg_cron – used by pg_vectorize for regular embedding updates;

  • You can install additional extensions from Trunk.

What's going on in neighboring databases?

Experimenting with Vector Databases: Chromadb, Pinecone, Weaviate and Pgvector

Just for context: how are colleagues in the DBMS trade doing with this? The article was written by Vishnu Sivan last fall, but it describes step by step how to install these databases, connect to them, and create indexes. It uses simple Python examples.

Large Language Models Meet Teradata Vantage

Chetan Hirapara from Teradata writes about how to build a recommendation service using the in-database function TD_VectorDistance for cosine similarity in Teradata Vantage and, for embeddings, FlagEmbedding from Hugging Face. There are snippets of Python code. The output is a set of nice charts.
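Cosine similarity, the measure behind such a recommendation service, fits in a few lines of plain Python. This sketch is not Teradata's TD_VectorDistance and does not reproduce the article's code; the function names and the tiny two-dimensional "embeddings" are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def recommend(query, items, top_n):
    """Return the names of the top_n items most similar to the query embedding."""
    ranked = sorted(items, key=lambda name: cosine_similarity(query, items[name]), reverse=True)
    return ranked[:top_n]


# Hypothetical item embeddings; real ones would come from an embedding model.
catalog = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
top_two = recommend([1.0, 0.0], catalog, top_n=2)
```

Doing the same computation inside the database avoids moving the embeddings out to the application, which is the point of an in-database distance function.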

And here is what is going on in the world of Snowflake, one of the fastest growing DBMSs, with an ideology that is… uh… increasingly blurred.

In 2020, Snowflake's CEO Frank Slootman admitted: we had wanted, he said, to create a cloud RDBMS, a data warehouse with columnar storage so that data could be processed well in an MPP architecture, but it turned out to be something more. And now you will not find any mention of SQL in their releases and blogs: it is all data, data, data.

And now Frank is no longer at the helm: since February 2024 the company has been headed by Sridhar Ramaswamy, who previously ran Neeva, an AI search startup acquired by Snowflake. Accordingly, the company has steered not only its marketing but also its development in this direction. For example, Universal Search has appeared: with AI, of course, but it can also search across all of a company's resources, databases and files, including tables in the Apache Iceberg format. It would be surprising if Snowflake did not support Iceberg. But support for it appeared quite recently, in the form of Polaris Catalog, which was announced just now at Snowflake Data Cloud Summit 2024. And right on its heels, the company Immuta announced Granular Data Policy Support for Snowflake Iceberg Tables. Industry experts rejoiced: hype is hype, but the data itself is not forgotten.

Mistral AI Unveils Codestral, Its First GenAI Model For Developers

Models from the French company Mistral, which is only a year old, are popular primarily because they are open source. They are not record holders, but Mixtral 8x7B successfully competed with GPT-3.5.

And on May 29, Codestral 22B appeared, which is focused on coding and is said to be smarter than competitors such as Meta's Llama3 70B. It supports 80 programming languages and many frameworks. It can also be used for teaching languages. Commercial use is prohibited; even employees of a company cannot use it if the company is commercial.


That's all for today.
