pitfalls and tools for transition

PL/SQL is a declarative language that first appeared in Oracle version 6 in 1988. By modern standards, this language is not entirely familiar, because most of the programming languages ​​in which we now write (not counting Python) belong to the C language branch. PL/SQL was created based on the Ada language, which was developed for the needs of the US Department of Defense. Now it is gradually becoming a thing of the past, but for its time it was very thoughtful and powerful – with exceptions and a modular organization.

PL/SQL came about because programming in pure SQL query language is not always easy or simple. The new language added the ability to store data in variables (conditions, branches, loops, exceptions), as well as modularity – the ability to group written procedures and functions into packages, already with their own variables, constants, types, initialization and separation of the specification and body.

You need to understand that the PL/SQL code itself is a completely separate language, it is not like SQL, and it has its own engine. In other words, if the code is executed in SQL and some function is called, then a context switch occurs internally, the PL/SQL executor is launched, and then returns back.

PL/SQL code looks like text, is stored inside the database and is interpreted there when executed. This means that you can write a procedure in the form of a “black box” for the user, thereby effectively protecting the database. That is why it has long been customary to build secure systems using such server-side procedural code.

Because PL/SQL is a programming language and not a query language, it has special data types such as BINARY_INTEGER, %ROWTYPE, TABLE INDEX OF and others. Other advantages of PL/SQL include the presence of special mechanisms such as cursors within the language and the ability to connect external code in C and Java.

PostgreSQL PL/pgSQL

Almost ten years after PL/SQL, in 1997, PL/pgSQL was born. Its creators tried to ease the transition from Oracle, so the syntax of the languages ​​is very similar. The only significant difference is that in PL/pgSQL modularity is organized through schemas, and packages – and even then limited – appeared only in version 15 of PostgreSQL, in 2022.

Otherwise, the picture is already familiar to us:

  • storing code in a database;

  • interpretation;

  • special data types (though with different names – RECORD, SETOF, TABLE and others);

  • cursors;

  • the ability to connect external code in C;

  • a whole family of related languages: pl/Python, pl/Perl, pl/Tcl.

Data Type Discrepancies

As can be seen from the history of the issue, the languages ​​are as similar as possible, so I will not describe in detail all the similarities, but rather highlight the most unpleasant moments that may arise when migrating from Oracle to PostgreSQL. Some of these problems are related to SQL, that is, to the structure of tables, and some are related to the migration of server code from PL/SQL.

Strings

Historical feature of languages: an empty string in PostgreSQL is not equal NULL, these are two completely different values, but in Oracle it is equal. A simple way to solve this problem when migrating a DBMS is to add a comparison to NULL in all places where Oracle had a comparison with an empty string (IS NULL).

Another historical feature: Oracle traditionally uses the data type for strings VARCHAR2which is actually identical to the standard type VARCHAR. PostgreSQL only supports VARCHAR, so the code will have to be corrected. In addition, PostgreSQL has its own data type for texts – TEXT. In general, as they say, the correspondence between plug and socket is 99%.

Whole numbers

It's even more fun with numeric data types. A popular type in Oracle is called NUMBERand in the standard and in PostgreSQL – NUMERIC. However, Oracle allows you to process large integers with 38 decimal places. When switching to PostgreSQL you can, of course, use NUMERICbut this will be inefficient since summing such numbers will take hundreds of times longer than summing simple integer types such as bigint, smallint, integer. Here it makes sense to think about what data restrictions we have. If, for example, the number of products is no more than a million, a simple integer will definitely be enough.

In addition, PL/SQL itself has different data types: BINARY_INTEGER, PLS_INTEGER, BINARY_FLOAT And BINARY_DOUBLE. In fact, they are more efficient and correspond to what PostgreSQL has: integer, float, double precision.

I won’t even touch on a huge block of problems with data types for dates and times in this article – these data types are made differently in all systems. There's a guy somewhere datesomewhere – datetimesomewhere – timestamp.

Large objects

The next feature of Oracle is storing large objects separately from the rest of the fields of the record. Binary data is stored in types BLOBand text ones – in CLOB. To access them, special functions and procedures are used. The way Oracle stores large objects has changed over the years, with version 11 adding automatic deduplication, compression, and encryption capabilities.

PostgreSQL has similar mechanisms, but with limitations. In PostgreSQL, each database has its own table pg_largeobject. The maximum table size is 32 TB, one record is 4 TB, and the number of records is about 4 billion (integer). It is enough for everyday tasks, but for a large business it may be a problem.

Temporary tables, autonomous transactions, and other discrepancies

Temporary tables

In practice, it can be convenient to create some kind of table on the fly: take data from somewhere, group it, clean it, and so on. However, in some cases – for example, when the request is written by different people – it may be better to do all this in parts and break the task into stages. Then those temporary tables that are filled with data during the execution of a huge query will subsequently be unnecessary.

Oracle has global and, more recently, session temporary tables. In PostgreSQL there are no global temporary tables at all – there are only session tables with different modes:

  • deleting a table after a commit (invalidating the data dictionary cache);

  • deleting records from the table after a commit;

  • saving records after a commit.

In general, in order to move a global temporary table from Oracle to PostgreSQL, you will have to do a little fiddling.

Autonomous Transactions

Autonomous transactions were invented in PL/SQL so that some of the operations – INSERT, UPDATE and the like – was executed separately, and not in the context of the main transaction. This is convenient, for example, to ensure that entries in the audit log are guaranteed to be preserved even if it is rolled back.

This has always been a problem in PL/pgSQL. A good workaround is an extension to run queries against other DBMSs or in a separate session:

PERFORM dblink_exec('dbname=" || current_database(), ‘INSERT …’);

This is a salvation, but rather slow, since each time the command is executed, a connection is established, access rights are checked, and other operations, which can take up to a few extra seconds. As a result, you have to dodge again – for example, create a connection in advance via dblink_connect and use it many times.

For a long time, autonomous transactions in PL/pgSQL were performed this way, until an extension package was invented pg_variables, creating the possibility of global variables within a single connection session. By default they do not take transactions into account, but transactionality can be enabled:

SELECT pgv_set("vars', 'int1', 101);
BEGIN;
SELECT pgv_set('vars', 'int2', 102);
ROLLBACK;

SELECT * FROM pgv_list() order by package, name;
 package | name | is_transactional
---------+------+------------------
 vars    | int1 | f
 vars    | int2 | f

Through this mechanism, you can store both simply scalar variables and collections.

Finally, autonomous transactions were added in the paid version of PostgresPro. They look like this:

CREATE OR REPLACE FUNCTION myaudit() RETURNS boolean AS $$

BEGIN AUTONOMOUS

        INSERT INTO audit_schedule VALUES ('new audit',now());...

        RETURN true;

END;

$$ LANGUAGE plpgsql;

Fault Tolerance Tools

Let's move on to the next DBMS difference. Enterprise versions of Oracle have a built-in Oracle Data Guard mechanism, which allows you to create a hot backup database instance and automatically switch to it in the event of a system failure.

PostgreSQL has an alternative streaming replication mechanism, as well as additional external systems that handle backups and automatic failover: Patroni, Stolon, and Pg_auto_failover.

List of other incompatibilities

What other troubles arise when moving from PL/SQL to PL/pgSQL and require manual configuration:

  • hierarchical queries (in Oracle they are usually implemented with a special syntax unique to this DBMS, and PostgreSQL follows the standard);

  • parallel data processing (in Oracle this is more complicated, and in PostgreSQL there are more restrictions, which affects performance);

  • filtering conditions calculated at runtime;

  • use of queues;

  • using PL/SQL collections;

  • user constants and exceptions;

  • package-level global data structures;

  • operators MERGE, INSERT FIRST And INSERT ALL;

  • function DECODE;

  • scheduler.

Migratz tools

We've listed the problems, now let's talk about the good stuff – how to solve them. Let me make a reservation right away that today there is no “turnkey” solution from the “click – get a result” series. If they try to convince you otherwise, don’t believe it. Some vendors advertise their developments with the slogan “100% transition”, but put an interesting footnote in small print on the landing page: “*With our specialists.” That is, some of the processes will indeed be automated, but then someone will still “finish” the result manually.

Migration tools from PL/SQL to PL/pgSQL can be compared to a conditional Google Translate: yes, translation from one language to another will be carried out automatically, but before sending the text to a native speaker, it is advisable to have a professional translator edit it manually. Otherwise, you can be stupid and disgrace yourself. And in our case, it’s to make critical mistakes in a productive environment.

Nevertheless, there are a number of tools that, with varying degrees of success, mitigate the above-mentioned migration problems.

ora2pg

The international community of developers has been leading the project for a long time ora2pg, which deals specifically with data transfer from Oracle to PostgreSQL. At the same time, it connects to an existing Oracle database and only generates scripts for PostgreSQL. The project itself is written in Perl, so you will need to install it first.

The tool carries very well:

  • tables, views, indexes, sequences, restrictions RK/FK/unique/check;

  • access rights;

  • data, including BLOB (as INSERT or file for COPY);

  • code of procedures/functions/packages/triggers (requires review and manual modification).

In addition, ora2pg can:

  • create external data wrappers (fdw) for tables;

  • export view and materialized view data;

  • assess the complexity and cost of migration.

However, ora2pg has significant limitations:

  • in general, the PL/pgSQL code requires manual modification;

  • the tool does not support: local functions, compound triggers, variables in packages, standard Oracle packages (DBMS_*, UTL_*) and recursive queries with CONNECT BY.

Thus, ora2pg covers on average about 80% of the task of translating code from PL/SQL to PL/pgSQL. The solution has many useful features and custom settings. However, you need to understand that “every unhappy family is unhappy in its own way”: migration projects differ in complexity and scale, and in some places ora2pg will be of practically no use, while in others it will do plus or minus everything.

orafce

Another open source project for migration is orafce. It implements various packages and non-standard Oracle features in PostgreSQL, in particular:

  • pre-table DUAL;

  • data type date and functions for it;

  • packages DBMS_OUTPUT, UTL_FILE and some others;

  • triggers to fix comparisons with empty string/NULL.

By including the library, you won’t have to rewrite thousands of large queries—orafce will add missing types and functions and smooth out key differences in SQL query language dialects. In simple cases, the tool can also automate a large share of moving tasks.

ora2pgpro

ora2pgpro — an extended paid version of ora2pg from PostgreSQL. The tool migrates data not to PostgreSQL, but to Postgres Pro, using its additional capabilities, in particular the mentioned autonomous transactions.

Domestic developers have done a tremendous job and tried to take into account all the pitfalls. This version of the program is already very close to a full-fledged automated data transfer and probably covers more than 95% of the tasks. The project continues to develop rapidly. Nevertheless, risks remain, so you still have to check everything with your eyes and run tests.

Choosing a moving tool

There is no silver bullet here, so the choice depends mainly on the target platform, available resources and the source code arrangement in your project. If in your case there is no server code, then using orafce you can transfer only requests with little effort. If there is server code, then you will need to choose between two versions of ora2pg. The pro version is designed to port the DBMS to the paid PostgresPro platform with paid support, which not every business uses. If you use ora2pgpro in conjunction with standard PostgreSQL, the tool will have to be configured additionally, excluding certain paid platform features from processing.

Of course, professional DBMS migration requires deep expertise, but I hope that with this article I have a little “outlined the clearing” for further study of the issue.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *