How we learned to manage data growth effectively with the transition to BW/4HANA
Hi all! My name is Sergey Vyaltsev. I am an analytics systems architect in the Finance area of the Data and Analytics domain. I would like to share how, thanks to the migration to BW/4HANA, we managed to offload the SAP BW warehouse database server without buying expensive new hardware. I will focus in particular on the new NSE technology and the results of applying it in our system.
Prerequisites for revising the approach to data storage
Speaking about the HANA database server architecture of our corporate SAP BW data warehouse, it is worth noting right away that we use a Scale-Up approach. This means we have one physical server, which in our case can hold up to 16 TB of RAM.
Within the scope of this article I will not dwell on the reasons for choosing this particular architecture. However, given the large number of implemented projects and the organic growth of the company, by 2020 the HANA database server was already degrading in performance at peak loads. This led to delays in data loads from source systems, long response times of dialog processes, and frozen reports, and, as a result, to negative feedback from business users.
Schematically, the process of increasing infrastructure capacity, starting in 2016, looked like this:
Thus, already in 2020, the issue of changing the approach to data lifecycle management became urgent.
Possible options at that time:
Reduce the data storage horizon (after agreeing it with the business) by archiving data older than, say, the “current + 1 year” horizon, and, if needed, retrieve it from the archives on user request. By 2020 this concept had already exhausted itself: the storage horizons had been cut back to their limit, and the business was not enthusiastic about proposals to shrink its requirements even further.
Take advantage of NSE (Native Storage Extension) technology, which lets you move part of the data to “warm” storage on disks, whose cost is several times lower than HANA memory. To use this technology at the application level, the system had to be migrated to BW/4HANA.
Purchase a new HANA server (with 32 TB of memory) and migrate the data while keeping the Scale-Up architecture.
Purchase a smaller HANA server (16 TB) and move from a Scale-Up to a Scale-Out architecture.
After analyzing the options above, my team and I settled on the second one. The disk space available for NSE was 32 TB, which made it possible to move part of the data to “warm” storage. In addition, the cost of the migration project to BW/4HANA was noticeably lower than the cost of acquiring new infrastructure capacity.
In the end, it took us about 8 months to migrate the system from BW on HANA 7.5 to BW/4HANA 3.0. After that, we gained access to data management using the new technology, which at that time was only available for the Scale-Up architecture.
Managing data volumes with NSE
NSE technology allows you to control the size of data loaded into “hot” memory, with virtually no loss in performance when working with data from “warm” partitions.
Graphically it looks like this:
HANA defines the maximum percentage of RAM (main memory) for using NSE data – the so-called buffer cache (BC). The default is 10%.
This parameter can be changed in the database configuration using the max_size and max_size_rel parameters, but we did not change them and kept the default value.
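For reference, here is a minimal sketch of how these limits could be adjusted, assuming the standard buffer_cache_cs section of indexserver.ini (verify the section and parameter names against your HANA revision before applying anything):

-- Sketch only: we kept the defaults and did not run this in our system.
-- max_size_rel sets the buffer cache as a percentage of the memory limit;
-- max_size would set an absolute value instead.
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('buffer_cache_cs', 'max_size_rel') = '10'
  WITH RECONFIGURE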
Conceptually, NSE is intended for data that changes rarely or not at all and is accessed less frequently than data in “hot” memory. If data in NSE does change, the changes go to the delta area (which is not part of the BC), just as for data in “hot” memory. Therefore, it also makes sense to keep an eye on the memory size of the delta area for NSE partitions.
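A simple way to monitor the delta area is the same M_CS_TABLES view; the sketch below assumes the standard MEMORY_SIZE_IN_DELTA and PART_ID columns and simply lists the largest delta areas among active ADSO tables:

SELECT table_name,
       part_id,
       ROUND( memory_size_in_delta/1024/1024/1024, 2 ) AS delta_gb -- delta size in GB
  FROM m_cs_tables
 WHERE table_name LIKE '/BIC/A%2' -- active tables of ADSOs
 ORDER BY memory_size_in_delta DESC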
In the HANA system view M_CS_TABLES, the MEMORY_SIZE_IN_TOTAL field also accounts for NSE data. Therefore, when table partitions transferred to NSE are fully placed in NSE, M_CS_TABLES.MEMORY_SIZE_IN_TOTAL for “warm” partitions may differ only slightly from the values for “hot” partitions.
The BC cannot exceed the specified percentage, and when “warm” partitions no longer fit into it, pages are evicted to disk. An empirical 8-to-1 ratio can be used to judge whether the BC is large enough: 8 TB of warm-partition data on disk corresponds to 1 TB in the BC.
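To see how full the buffer cache actually is, a query along the following lines can be used; it assumes the standard M_BUFFER_CACHE_STATISTICS monitoring view with the MAX_SIZE, USED_SIZE and HIT_RATIO columns, so check the view against your HANA revision:

SELECT host,
       ROUND( max_size /1024/1024/1024, 2 ) AS max_gb,  -- configured BC limit
       ROUND( used_size/1024/1024/1024, 2 ) AS used_gb, -- currently used
       hit_ratio                                        -- page hit ratio
  FROM m_buffer_cache_statistics
 WHERE cache_name = 'CS'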
So, having gained access to the new technology, the first thing we needed to do was create a list of ADSOs with the highest memory consumption.
This can be done in the HANA SQL console using the script:
SELECT SUBSTRING( LEFT( table_name, LENGTH( table_name )-1 ), 7 ) AS adso, -- technical name of the ADSO
       mem_tot_gb, -- size in GB
       rec_cnt     -- number of records
  FROM ( SELECT table_name,
                ROUND( SUM( (memory_size_in_total)/1024/1024/1024 ), 2 ) AS mem_tot_gb,
                SUM( record_count ) AS rec_cnt
           FROM m_cs_tables
          WHERE table_name LIKE '/BIC/A%2'
          GROUP BY table_name
       )
 WHERE mem_tot_gb > <memory_consumption_limit_per_table>
 ORDER BY mem_tot_gb DESC
Since the amount of memory occupied directly by data after migration to BW/4HANA was about 11.5 TB, at the initial stage it was necessary to reduce this amount as much as possible.
The memory consumption limit per table in the script was therefore chosen accordingly.
The field by which an ADSO is partitioned must meet the following criteria:
be part of the key and always be populated;
be a calendar characteristic;
the calendar characteristic must be used when accessing the ADSO (when loading from it further along the data flow, as well as when filtering in reports);
the field should allow you to create uniform partitions of roughly 200-250 million records each (see the sketch after this list).
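To check the last criterion, it is convenient to look at how records are distributed over the candidate calendar field before cutting the partitions. A minimal sketch for a hypothetical ADSO ZSALES (active table /BIC/AZSALES2) partitioned by a calendar month field, assumed here to be stored in the CALMONTH column:

SELECT calmonth,            -- candidate partitioning field (assumed name)
       COUNT(*) AS rec_cnt  -- records per month
  FROM "/BIC/AZSALES2"
 GROUP BY calmonth
 ORDER BY calmonth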
Of the ADSOs on the list we had compiled, about 85% met the partitioning criteria above and were successfully partitioned in all systems of the SAP BW landscape. Some of the partitions were then moved to “warm” storage.
The “warm” storage horizon was chosen based on the size of the object, how often the data is used, and the granularity of the partitioning field. As a rule, these are partitions older than the current year plus half of the previous one.
It is worth noting that there are two ADSO partitioning options: static and dynamic. We settled on the more reliable and proven one, static partitioning. This allowed us to avoid the risks of dynamic partitioning, including the loss of critical data and falling outside vendor support.
Admittedly, static partitioning requires maintaining the current state of the partitions, monitoring data volumes so that the periods stored in the partitions can be shortened (they must stay balanced in size), and having the time and budget to carry out this work.
We resolved this with an approved regulation: once a year, work is carried out to identify new candidate objects for transfer to NSE, add partitions (a year in advance) to existing objects, and move the periods that have aged out over the past year to “warm” storage.
Difficulties of transferring objects to NSE
We encountered a number of difficulties, despite the technical simplicity of transferring an object to NSE, which consists of the following steps:
creating the necessary partitions;
enabling the Warm checkbox and selecting the Temperature Maintenance mode on the Data Tiering Properties tab:
Firstly, creating the partitions and enabling the Warm checkbox require remodeling the object in the RSMONITOR transaction, which for large data volumes can take more than 12 hours (in our case, an object of about 180 GB and roughly 6 billion records). This means there is a risk of not starting the regular data loads into the object in time and, if errors occur during remodeling, of losing access to important reports (it is worth noting that while remodeling is running, reporting built on the object remains available). Thus, transferring memory-heavy, reporting-critical ADSOs to NSE leaves virtually no margin for error.
An alternative to remodeling on live data is:
creating a copy of the ADSO (of type Staging DataStore Object) that is already transferred to NSE. For the copy, keep in mind that a partition cannot contain more than 2 billion records, so if the original object was partitioned at the database level using an SQL expression, for example by hash, similar partitioning must be applied to the copy;
transferring the data to the copy;
completely deleting the data from the main ADSO;
transporting the requests with the created partitions and the Warm property enabled;
transferring the data back to the main ADSO;
distributing the partitions across the temperature storage areas.
This approach also has its disadvantages:
completely deleting a large number of records is a rather lengthy process that can take more than an hour, especially if the deletion is performed at the application level. A faster way is the TRUNCATE TABLE command at the database level (see the sketch after this list);
if the source object was previously partitioned manually at the database level by hash, for example into 8 partitions, then transporting the requests with the created static partitions activates the ADSO, as a result of which the number of hash partitions drops to 1. Whether this matters is up to the person responsible for the object's transfer to NSE to decide;
the reverse transfer of data to the source ADSO is best performed (based on personal experience) in requests of no more than 150 million records each, so that their subsequent activation/compression completes in a reasonable time.
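For the deletion step above, a minimal sketch for the same hypothetical ADSO ZSALES looks like this; note that TRUNCATE at the database level bypasses the application layer, and the other tables of the ADSO (inbound, changelog) may need the same treatment:

-- Sketch: fast full deletion of the active data of a hypothetical ADSO ZSALES.
TRUNCATE TABLE "/BIC/AZSALES2"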
Secondly, when remodeling data, errors of the following type may occur:
Such errors occurred mainly in the production system. Running the RSDDB_LOGINDEX_CREATE report helped correct the error, after which the remodeling run completed successfully.
An important point when transferring an object to NSE is to follow the sequence of actions with ADSO:
creating the partitions (a separate transport request) → Remodeling;
enabling the Warm checkbox and selecting the On Partition Level option (a separate transport request) → Remodeling.
The transport to target systems must also be carried out in this sequence, because in step 1 ALTER TABLE statements are executed for the ADSO tables *1, *2, *3 and *5 (for non-cumulative objects), in which the HASH partitions remain as first-level partitions and the RANGE partitions are created at the second level.
In step 2 the reverse repartitioning takes place: the RANGE partitions become first-level and the HASH partitions become second-level partitions.
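After importing each of the two requests, it is worth checking that the partition levels look as expected. A simple sketch for the same hypothetical ADSO ZSALES, relying on the standard TABLE_PARTITIONS catalog view:

-- Sketch: inspect first- and second-level partitioning of the ADSO tables.
SELECT *
  FROM table_partitions
 WHERE table_name LIKE '/BIC/AZSALES%'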
This is why the order in which the transport requests are imported matters. In one case we combined the two steps into a single request and, as a result, the remodeling crashed; we had to roll back to the original state and re-import each of the steps in its own request.
Procedure for working with objects that are not subject to transfer to NSE
As I mentioned earlier, out of the entire list of potential objects for transfer to NSE, about 85% of ADSOs met the necessary criteria.
In cases where the calendar characteristic is in the key but is not used when accessing the ADSO, or where the characteristic is always populated but is not part of the key, we agreed with users on the storage horizon most critical to them. Historical data is now offloaded to a copy of the ADSO (hereinafter, the archive ADSO), named according to our data archiving and object naming conventions.
Next, we unload the archived data from memory to disk using the UNLOAD command, called from a Z-program.
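At the database level this boils down to the UNLOAD statement; a minimal sketch for a hypothetical archive ADSO ZSALES_A:

-- Sketch: force the archive ADSO's active data out of memory to disk.
UNLOAD "/BIC/AZSALES_A2"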
The advantages of this approach:
only BW consultants have access to the archive ADSOs, since no reporting is built on them;
fast data recovery;
the data is not physically kept in “hot” memory: even if a consultant brings some of it into memory, the procedures we have set up to control data usage will push it back to disk.
Unfortunately, there are large ADSOs that contain no calendar characteristics at all, so it is not possible to split the offloading of historical data into equal intervals.
We put such objects on hold and periodically review their volumes. EWA reports help us with this, as well as a number of our own developments for monitoring the state of the system.
As an alternative, when the partition sizes in such ADSOs approach 2 billion records, we consider hash partitioning at the database level.
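A minimal sketch of such database-level hash partitioning, for a hypothetical ADSO ZBIGOBJ (active table /BIC/AZBIGOBJ2) hashed by an assumed key field /BIC/ZDOCNUM:

-- Sketch: split a large active table into 8 hash partitions at the DB level.
ALTER TABLE "/BIC/AZBIGOBJ2" PARTITION BY HASH ("/BIC/ZDOCNUM") PARTITIONS 8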
Brief summary
Thanks to the new NSE technology, which became available after the migration to BW/4HANA, as well as the mechanism for forcing historical data out to disk, we were able to reduce the volume of data stored in “hot” memory to 6 TB, i.e. to nearly halve it:
And the most important effect achieved is the ability to manage “hot” data, which, as load grows during peak sales seasons, helps the system cope with the sharp increase in data volumes, run its calculations, and have data ready for reporting within an acceptable time that meets the established SLAs.
Plans
Of course, given the organic growth of the company, M&A activity, and increasing sales volumes, we cannot count on being able to offload “hot” memory indefinitely. Preliminary measurements of the growth of memory occupied by data confirm this.
That is why we plan to move to open-source solutions that are integral components of a lakehouse architecture and provide a level of performance not inferior to reading data from “warm” storage into HANA memory. Historical data will be moved there first, and later, depending on the performance achieved, a complete move away from SAP BW is possible.
Time will tell!