Where there is no alternative to the while loop

I’m sure many working programmers know this without me, but I decided to publish a couple of implementations of while loops, which I actively use as an automator, tester and ETL developer.

while not all(check_list)

When transferring data from one database to another, it is often necessary to transfer not one table, but two or more. To transfer means to delete the data in the table on the receiver and then insert the data taken from the source:

-- SQL

delete from my_shema.my_table 
      where my_column = 'my_variable'; 
commit;

insert into my_shema.my_table 
     select * 
       from my_shema.my_table@src_dblink 
      where my_column = 'my_variable';
commit;

Additionally, we need to crawl through the schema and find out dependencies between tables that can cause exceptions like ORA-02291 and ORA-02292 (this is when we cannot either delete data in a table without first deleting data from another table, or insert without first inserting data to another table).

Roughly speaking, we need to provide cascade deletion of data, and then cascade insertion.

To avoid this, you can send a DML operation to the end of the queue when a database exception is triggered, and you can use a loop for this while not all(check_list):

# python3.8
import cx_Oracle


# составляем чек-лист переноса данных, 
# если значение в списке 0, значит данные не перенесены, обратно, если 1
etl_objects = [('MY_SCHEMA', 'MY_TABLE1'), ('MY_SCHEMA', 'MY_TABLE2'), ...]
check_list = [0 for _ in range(len(etl_objects))]
i = 0
while not all(check_list):
  i = i % len(check_list)  # нужно для второго и последующих циклов
  try:
    if not check_list[i]:
      with cx_Oracle(user=user,
                     password=password,
                     tns=tns,
                     encoding='utf-8') as db_conn:
        with db_conn() as cursor:
          sql_text = (f"delete from {etl_objects[i][0]}.{etl_objects[i][1]} "
                      "where my_column = 'my_variable'")
          cursor.execute(sql_text)
          cursor.execute('commit')
          sql_text = (f"insert into {etl_objects[i][0]}.{etl_objects[i][1]} "
                      "select * "
                      f"from {etl_objects[i][0]}.{etl_objects[i][1]}@src_dblink "
                      "where my_column = 'my_variable';")
          cursor.execute(sql_text)
          cursor.execute('commit')
      check_list[i] = 1
  except Exception as e:
    print(f"Сработало исключение {str(e)}: объект загрузки отправлен в конец очереди")
  i += 1
  

It is clear that formally the download object is not sent to the end of the queue, but the loop while will be repeated until there are no unloaded objects left. It is also understandable that if exceptions other than ORA-02291 and ORA-02292 are triggered, the loop risks becoming infinite, but this example assumes that source and destination restrictions will otherwise not interfere with loading.

while i < len(table_trg) and j < len(table_src)

When testing DDL tables in different databases, you may need to compare their sets of fields and data types, which may differ. That is, the semantics of the tables are the same, but the implementation is different. For example, both tables store the same set of basic data, but the field names may differ, and fields with the same name and purpose may have different data types or data type restrictions.

In Oracle, you can display a set of fields and data types from two sources using queries:

-- SQL

select column_name, data_type, data_length ,..., 
  from all_tab_columns
 where owner="MY_SHEMA" and table_name="MY_TABLE";

select column_name, data_type, data_length ,..., 
  from all_tab_columns@src_dblink
 where owner="MY_SHEMA" and table_name="MY_TABLE";

Ideally, sets and field values ​​should be identical, but this is not always the case, especially when databases live their own lives and one life lags behind the other in development. So, to find out where it lags behind or where they differ from each other, you can use the code:

# python3.8
import cx_Oracle


with cx_Oracle(user=user,
               password=password,
               tns=tns,
               encoding='utf-8') as db_conn:
  with db_conn() as cursor:
    sql_text = ("select column_name, data_type, data_length "
                "from all_tab_columns "
                "where owner="MY_SHEMA" and table_name="MY_TABLE" "
                "order by column_name")
    cursor.execute(sql_text)
    table_trg = cursor.fetchall()
    sql_text = ("select column_name, data_type, data_length "
                "from all_tab_columns@src_dblink "
                "where owner="MY_SHEMA" and table_name="MY_TABLE" "
                "order by column_name")
    cursor.execute(sql_text)
    table_src = cursor.fetchall()

i, j = 0, 0
tmp_table_trg, tmp_table_src = [], []
while i < len(table_trg) and j < len(table_src):
  if table_trg[i][0] == table_src[j][0]:  # сравниваем названия полей
    tmp_table_trg.append(table_trg[i])
    tmp_table_src.append(table_src[j])
    i += 1
    j += 1
  elif table_trg[i][0] < table_src[j][0]:
  # если название поля меньше, то в правой таблице поля с таким же названием нет
    tmp_table_trg.append(table_trg[i])
    tmp_table_src.append((None, None, None))  # добавляем в правую таблицу "пустую" строку
    i += 1
  else:
    tmp_table_trg.append((None, None, None))  # добавляем в левую таблицу "пустую" строку
    tmp_table_src.append(table_src[j])
    j += 1

diffs = list(filter(lambda x: x[0] != x[1], zip(tmp_table_trg, tmp_table_src)))

Here are two implementations of loops through while I use it in my work.

First example ( while not all() ) helps to run cycles according to checklists, and the second ( while i < len(table_trg) and j < len(table_src) ), helps align lists of tuples.

So that my code above can be checked, if you do not have the ability to make queries to the database, you can rewrite my code to work with tables csv.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *