SQL query optimization in Oracle

Hi all! Mikhail Potapov, Chief System Analyst of Rostelecom Information Technologies, is in touch. In the company I am engaged in the development of reporting for the B2B segment. The performance of each report directly depends on correctly built SQL queries to the Oracle database, since when working with large amounts of data, the speed of query execution can be significantly reduced. Reducing the speed greatly complicates the work with reports for the end user, and in some cases even makes it impossible.

In this article, we will look at the basic principles of query optimization in Oracle SQL, which will help speed up the work with the database and increase work efficiency. We will talk about various query optimization techniques, such as using indexes, proper query writing, and other techniques that will allow you to achieve maximum performance when working with Oracle SQL.

Basic aspects of query optimization

1. Minimize the use of DISTINCT.

Keyword DISTINCT can be used to remove duplicates from the resulting data set. However, using DISTINCT can slow down the query because the database must perform additional operations to remove duplicates. Therefore, it is recommended to use DISTINCT only when it is really necessary.

2. Minimize the use of ORDER BY.
Keyword ORDER BY is used to order the result dataset by one or more columns. However, using ORDER BY can slow down query execution, especially if you are sorting on a large amount of data. If sorting is not necessary, then it is better to avoid using ORDER BY, or use it in the program interface:

  • by column filter in DBMS – Oracle SQL Developer;

  • by exporting to excel and then using the built-in filters;

  • by sorting directly in Oracle BI.

3. Using partitions.
Partitioning (partitioning) tables is the process of dividing a table into smaller ones, called partitions, to improve performance and facilitate data management. Each partition contains a separate set of data that can be processed and stored independently from other partitions.

When accessing a table that has partitions, it is necessary to put a filter on the partitioned field (in the WHERE clause or the JOIN filter). In this case, do not use date conversion to another format, since in this case the entire table will be scanned instead of a specific partition.

For example, when accessing the SALES table partitioned by the SALE_DT field, the query will look like this:

SELECT * FROM SALES WHERE SALE_DT >= TRUNC(SYSDATE, 'MM');
SELECT * FROM SALES WHERE SALE_DT >= '01.07.2023 00:00:00';

4. Selection of necessary fields in SELECT.

When referring to the table, enter in the selection only those fields that are really needed. Do not use the “*” character to call all fields from the table, this increases the query / subquery execution time.

SELECT * FROM SALES;
SELECT SALE_ID, SALE_DT FROM SALES;

5. Connection of tables (JOIN’s) by keys.

5.1. Don’t use a connection by field inequality.

For example, if you want to select only those rows from the SALES table for which there is no SALE_ID in the ORDERS table, you can use the MINUS function.

SELECT s.SALE_ID FROM SALES s
    LEFT JOIN ORDERS o ON o.SALE_ID != s.SALE_ID;
SELECT SALE_ID FROM SALES
    MINUS
SELECT SALE_ID FROM ORDERS;

5.2. Do not use key data transformation table joins.

SELECT * FROM SALES s
    LEFT JOIN ORDERS o ON TO_NUMBER(o.SALE_ID) = s.SALE_ID;
SELECT * FROM SALES s
    LEFT JOIN ORDERS o ON o.SALE_ID = s.SALE_ID;

5.3. Minimize use as keys attributes whose relationships are performed on a many-to-many basis.

5.4. Don’t use a connection by text fields (with type VARCHAR2 / CLOB / LONG).

5.5. Do not use operator OR in the table join condition. If you need to link simultaneously by condition from different tables, it is better to make 2 separate joins, and then link their result in select.

SELECT o.order_name FROM SALES s
    LEFT JOIN JOBS j on j.job_id = s.job_id
    LEFT JOIN ORDERS o ON (o.SALE_ID = s.SALE_ID OR o.job_id = j.job_id);
SELECT nvl(o1.order_name, o2.order_name) as order_name FROM SALES s
    LEFT JOIN JOBS j on j.job_id = s.job_id
    LEFT JOIN ORDERS o1 ON o1.SALE_ID = s.SALE_ID
    LEFT JOIN ORDERS o2 ON o2.job_id = j.job_id;

6. The right choice of operators.

6.1. Instead of using operator IN to check if a value is in a list, you can use operator EXISTS. The EXISTS operator stops at the first match, while the “IN” operator performs a full scan of the list of values.

SELECT * FROM employees WHERE department_id IN
    (SELECT department_id FROM departments);
SELECT * FROM employees e WHERE EXISTS
    (SELECT 1 FROM departments d WHERE e.department_id = d.department_id);

6.2. Do not use operator LIKE where you can use the exact definition of a value through equality, or use a reference to determine by identifiers.

SELECT * FROM employees where department_name like '%Тестир%';
SELECT * FROM departments where department_name="Тестирование";
 
--ИЛИ--
 
SELECT * FROM departments where department_id = 1;

6.3. Do not use operator UNION, if you do not want to remove duplicates when joining tables. Enough to use UNION ALL.

SELECT sale_num FROM sales
UNION
SELECT sale_num FROM sale_services;
SELECT sale_num FROM sales
UNION ALL
SELECT sale_num FROM sale_services;

7. Using CTEs.

General expression table, or Common Table Expression (CTE) is a temporary named subquery table that can be used inside the main query or another CTE. It provides a more readable and modular way to write complex queries.

CTE can be used to optimize queries in Oracle for several reasons:

  • Improved readability: CTE allows you to break a complex query into smaller logical blocks, making it clearer and easier to maintain and debug. In addition, the CTE can be named so that it can be reused in other parts of the query.

  • Reducing code repetition: If the same logic is used in several parts of the request, then it can be moved to the CTE and used in different parts of the request. This avoids code duplication and makes it easier to maintain.

  • Performance improvement: Using CTEs can help the query optimizer make better decisions about the query execution plan. The optimizer can use the CTE information to make decisions about the order in which operations are performed and the choice of optimal indexes.

  • Recursive Queries: The CTE can also be used to write recursive queries that require a query to be executed based on its own results. This is especially useful for working with hierarchical data such as trees or graphs.

However, be aware that the use of CTEs may have some limitations and may require additional resources. For example, if a CTE is used multiple times within a request, then a complete CTE request will be executed each time. Therefore, it is necessary to carefully evaluate the benefits of using CTE and make sure that it exceeds the costs of its implementation.

Usage example and syntax:

WITH sales_summary AS (
    SELECT product_id, SUM(quantity) AS total_quantity
    FROM sales
    WHERE sale_date >= TRUNC(SYSDATE, 'MM') -- начало текущего месяца
    GROUP BY product_id
    )
    SELECT p.product_name, s.total_quantity
    FROM sales_summary s
    JOIN products p ON p.product_id = s.product_id
    ORDER BY s.total_quantity DESC;


-- ИЛИ ТАК (ТОЖЕ ПРАВИЛЬНО):
SELECT p.product_name, s.total_quantity FROM (
    SELECT product_id, SUM(quantity) AS total_quantity
    FROM sales
    WHERE sale_date >= TRUNC(SYSDATE, 'MM') -- начало текущего месяца
    GROUP BY product_id) s
    JOIN products p ON p.product_id = s.product_id
ORDER BY s.total_quantity DESC;

8. Using indexed views.

Indexed Views (Materialized Views or MVs) are views that are stored as physical tables and are automatically updated when the data changes. Using indexed views can greatly improve query performance, especially for queries with large amounts of data.

CREATE MATERIALIZED VIEW mv_employee_names
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS SELECT first_name, last_name FROM employees;
 
SELECT * FROM mv_employee_names WHERE first_name="John";

9. Minimize the use of multiple nested subqueries.

Multiple nested subqueries can be bad practice due to the following reasons:

  • Poor performance: Each subquery will be executed separately, which can result in increased load on the database server and increased query execution time.

  • Difficulty in reading and understanding: The more nested subqueries, the more difficult it becomes to read and understand the query. This can make it difficult to maintain and maintain the code in the future.

  • Limited Optimization Options: When using multiple nested subqueries, the database optimizer may have limited options for optimizing the query and choosing the optimal execution plan.

Alternatively, nested subqueries can easily be replaced with CTEs.

SELECT sale_num FROM (
    SELECT DISTINCT sale_num, sale_dt FROM (
        SELECT * FROM sales where sale_num is not null));
WITH stg1 as (
    SELECT * FROM sales where sale_num is not null)
,stg2 as (
    SELECT DISTINCT sale_num, sale_dt FROM stg1)
SELECT sale_num FROM stg2;

10. Using hints HINT.

Hints hint allow you to specify specific methods for executing queries to the database optimizer. For example, you can tell the optimizer to use a specific JOIN type, or choose a specific index.

10.1. If the database and server allow, you can use parallel query execution to distribute the load and speed up query execution.

SELECT /*+ PARALLEL(employees, 4) */ * FROM employees;

10.2. You can tell the optimizer to use specific index or union.

SELECT /*+ INDEX(employees idx_employees_name) */ * FROM employees
WHERE first_name="John" AND last_name="Doe";

10.3. There are many other tips you can use, including:

  • /*+ FULL(table_name) */ – tells the optimizer to use a full table scan instead of an index scan.

  • /*+ ORDERED */ – tells the optimizer to keep the table join order as specified in the query.

  • /*+ USE_HASH (table_name) */ – tells the optimizer to use a hash join to execute the query.

  • /*+ LEADING (table_name) */ – tells the optimizer to start joining tables from the specified table.

  • /*+ NO_MERGE */ – tells the optimizer not to combine multiple operations into one.

  • /*+ NO_REWRITE */ – tells the optimizer not to use views to execute the query.

  • /*+ NO_EXPAND */ – tells the optimizer not to use view extension to execute the query.

  • /*+ OPT_PARAM (parameter value) */ – allows you to set the value of the query optimization parameter.

However, you should be careful when using HINT hints, as misuse can lead to undesirable results.

11. Using indexed attributes.

Indices in Oracle, these are data structures created on the basis of table columns that allow you to speed up the search and sorting of data. Indexes allow you to quickly find rows in a table that contain specific values ​​in indexed columns. Indexes in Oracle can be created on one or more columns of a table.

There are several types of indexes in Oracle:

  • B-Tree (Binary Tree) Indexes: This is the most common type of index in Oracle. They use a tree structure to organize data and provide a quick search through column values. B-Tree indexes are suitable for equal and range queries.

  • Bitmap indexes: this type of index is used to optimize the execution of queries containing conditions for comparing the presence or absence of values ​​in columns. Bitmap indexes create a bitmap where each bit corresponds to a value in the indexed column.

  • Functional indexes: These are created based on expressions or functions applied to the columns of a table. Functional indexes allow you to speed up the execution of queries containing search terms based on values ​​obtained from expressions or functions.

  • Clustered indexes: These are used to physically organize the data in a table. Clustered indexes define the order in which rows are stored in a table based on the values ​​of the indexed column.

  • Unique Indexes: They guarantee uniqueness of values ​​in an indexed column or combination of columns. Unique indexes allow you to quickly check for duplicates and ensure data integrity.

However, keep in mind that creating and maintaining indexes requires additional resources and can slow down the execution of data modification operations (insert, update, delete). Therefore, you need to carefully evaluate the need to create indexes and choose the appropriate index types for specific queries and tables.

Suppose we have a table “employees” with columns “employee_id”, “first_name” and “last_name”. To find an employee by first and last name, we can create an index on the “first_name” and “last_name” columns. This will allow the optimizer to quickly find matching records.

CREATE INDEX idx_employees_name ON employees (first_name, last_name);
 
SELECT * FROM employees WHERE first_name="John" AND last_name="Doe";

12. Choice of JOIN type when joining tables.

12.1. Do not use CROSS JOINbecause it returns the cartesian product of the rows from both tables.

SELECT s.SALE_NUM, o.ORDER_ID FROM sales, orders
WHERE sales.sale_id = orders.sale_id
SELECT s.SALE_NUM, o.ORDER_ID FROM sales
JOIN orders ON sales.sale_id = orders.sale_id

12.2. Do not use LEFT JOIN in cases where the use of INNER is sufficient (for example, in subqueries or CTEs).

12.3. Do not use FULL JOINunless absolutely necessary. Unlike LEFT, INNER and RIGHT, where the hash table is built for one table, with FULL JOIN, the hash table is built for two at once, which increases the query execution time.

What’s the output?

The execution time of a complex query in the database in the worst case should not exceed a couple of minutes.

Important! The query execution time is evaluated strictly when unloading the entire amount of data, and not the first 50 rows, as it works, for example, in Oracle SQL Developer. To unload the entire volume, just click in any cell of the table and press the combination CTRL + A (“select all”).

If following the basic rules above did not lead to the desired result, you need to contact query plan.

Query Plan Analysis

Query Plan in Oracle SQL is information about how Oracle plans to execute a query and what operations will be used. Reading the query plan helps you understand what type of JOIN and what indexes are being used, as well as evaluate query performance.

Building a query plan (let’s consider the example of Oracle SQL Developer software) is called through F10 or by selecting the appropriate button on the panel (the cursor should be on the query, without highlighting its individual parts):

The general view of the query plan looks like this:

1. operation (Operation Type) – Indicates the type of operation to be performed on the query plan, such as TABLE ACCESS, INDEX SCAN, JOIN, and so on.

2. Object_name (Object name) – points to the tables/indexes/views that are used in the query plan. You can see which tables are being scanned in full (FULL), which are being scanned using indexes (INDEX), and which tables are being joined (JOIN).

3. Options (Parameters) – indicates a specific action when executing a request for each type of operation.

4. cardinality (cardinality, or number of rows) – refers to an estimate of the number of rows that will be processed or returned as a result of query execution. The higher the value of this indicator, the less efficient the query. Cardinality can be evaluated for each operation in the query plan and is used by the optimizer to select the most efficient query execution plan. Incorrect cardinality estimation can lead to incorrect execution plan selection and, consequently, poor query performance. The Oracle optimizer uses table and index statistics as well as data distribution information to estimate cardinality.

5. cost (cost estimate)indicates the cost estimate for each operation in the query plan. Lower cost usually means more efficient operation.

How to read?

When parsing, the plan is viewed from the bottom up and from the nested process itself (i.e. from right to left).

1. In the process of viewing, first of all, attention is drawn to lines with a large value COST And CARDINALITY.

2. In addition to finding big COST and CARDINALITY in the rows of the plan, you should look at the column OPERATION plan for the presence of HASH JOIN. Connection by HASH JOIN results in in-memory table joins and seemingly more efficient than nested joins NESTED LOOP. The disadvantage of this connection is that if there is not enough memory for the table (tables), disks will be used, which will significantly slow down the query.

3. Particular attention in the plan should also be given to the lines in the plan with operations full table and index scans into a column OPTIONS: FULL – for tables and FULL SCAN, FAST FULL SCAN, SKIP SCAN – for indexes. The reasons for a full scan can be problems with indexes: missing indexes, index inefficiencies, blocking (for example, when applying a string function on an indexed field). With a small number of rows in a table, a FULL table scan can be normal and more efficient than using indexes.

4. The presence of a parameter in the OPERATION and OPTIONS columns MERGE JOIN CARTESIAN says that there is no complete link between some tables. This operation occurs when using CROSS JOINwhich is highly discouraged.

Physical JOIN’s

Various types of JOINs are available in Oracle SQL. Each of them has its own characteristics and can be effective in different scenarios.

1. MERGE JOIN – is used to join two tables based on sorted columns. It is based on the principle of “divide and conquer”, where each table is divided into sorted blocks of data, and then these blocks are combined. MERGE JOIN is efficient when performing JOIN operations on large datasets, especially if both tables are sorted by their respective columns. It can also be used in JOIN operations on non-key columns.

However, MERGE JOIN requires data to be sorted first, which can take time and resources. Therefore, it may be less efficient if the data is not sorted or if a large amount of data needs to be sorted.

2. HASH JOIN – used to join two tables based on the hash values ​​of the columns. It creates a hash table where each element contains a pair of values ​​from two tables that have the same hash value. HASH JOIN is effective when performing JOIN operations on large datasets, especially if tables are not sorted or if JOIN is required on non-key columns.

However, HASH JOIN requires a large amount of memory to create a hash table, so it may be less efficient if available memory is limited.

3. NESTED LOOP JOIN – uses nested loops to perform a JOIN operation. It sequentially processes each row from one table and, for each row, looks up the corresponding row in the other table. NESTED LOOP JOIN is effective when performing JOIN operations on small datasets or when one of the tables has few rows. It can also be efficient if indexes on the appropriate columns are available for the JOIN operation.

However, a NESTED LOOP JOIN can be slow if one of the tables has a large number of rows, or if there are no matching indexes for the JOIN operation.

Working without an index

Working without an index

Working with the index

Working with the index

Additional information

Beautiful animations taken from the site https://bertwagner.com/there are also links to a video (in English), which explains in detail about the physical connections of the tables.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *