Methods for integrating 1C and corporate data storage

Transferring information from 1C to a corporate data warehouse often becomes a headache for integrators. 1C information systems generally interact with each other easily thanks to platform mechanisms and exchange rules. To integrate 1C with a corporate data warehouse (CDW), however, you have to resort to third-party methods, which we consider in this article.

Preparing for integration – determining the composition of the transferred data

The integration of a source system with an enterprise data warehouse differs in nature from exchanges within transactional systems. First, the transfer is unidirectional: sources push information to a single receiving system.

Second, the integration flow between source and receiver systems changes much more often than in exchanges between transactional systems. As business demands change, the structure of the CDW changes, which affects the integration. New aspects of accounting in the transactional systems inevitably find their way into the analytical systems.

Frequent changes in requirements for the data collected are the main distinguishing feature of integration with analytical warehouses.

The very nature of data requests to an analytical warehouse dictates that its structure must be able to change. Employees who design integration flows must first answer a number of questions. Each answer spawns a layer of iterative analytical work: building a stable process requires constant interaction between the data analyst and the source system:

  1. “Do the systems really have the data we need?”

  2. “Are the accounting sections in all sources sufficiently filled in to build end-to-end analytics on them?”

  3. “Can the data received be trusted?” etc.

Therefore, a single center for obtaining and transforming data is the data analyst's main tool when building a CDW. This is one of the reasons ETL systems take the place of honor among the solutions involved in creating analytical warehouses.

Main methods of integrating 1C with a CDW

Whatever the transfer method, the rules for composing the data must be replicated across a large number of sources. To simplify collection, it is important that data analysts or data engineers can change the composition of the data received by the CDW with minimal or no involvement of programmers.

Let's look at the most popular ways to obtain data from 1C.

Upload via file

This is perhaps the most common method of integrating 1C with a corporate data warehouse. A 1C programmer writes an export data processor and sets up a schedule for writing files to a directory, from which the files are then loaded into the CDW. With such an exchange, the 1C and CDW circuits remain isolated from each other.

As a rule, two teams of specialists are responsible for such a solution: one handles the export, the other the load.

The advantage of this approach is its relative technological simplicity. But it also gives rise to a number of disadvantages:

  1. No feedback about whether the data was actually loaded. The exchange is one-way: the source is "not interested" in what happened to the file after it was saved.

  2. You cannot export only newly changed data, because the source does not know which data was successfully loaded (updated) into the CDW and which was not.

  3. To guarantee that the warehouse contains correct data, exports must be redundant. This increases the load on both circuits and the data generation time.

  4. The export structure and the CDW fall out of sync when the structure of the receiver or source changes, an effect that often halts the warehouse enrichment process. Because different teams are responsible for export and load, debugging the integration takes longer.

  5. The development and maintenance of export processors must be controlled. Before a family of export processors is created, a corporate processor template must be approved, including a description of the software interface and the methods for composing the data extracted from 1C. Otherwise, maintaining processors that have been changed several times becomes harder and more expensive.

  6. Decentralized support. To collect information from several sources, the export processors must be updated synchronously, and after every failure someone has to answer the question: "Is the currently installed processor up to date?" This process requires automation and careful design before an integration project starts.

Exchange via files is a simple and quick way to integrate 1C with corporate storage. In practice, however, it hides many pitfalls, and their number only grows with the volume of integration.

Integration via OData using REST interface

OData is an open web protocol for querying and updating data. It lets you manipulate data with plain HTTP requests.

The method assumes that 1C acts as a service that returns data in response to incoming REST requests. Data arrives in the receiver immediately, which speeds up the integration and simplifies its maintenance compared to the file-based option.

The benefits of this solution include:

  1. 1C does not store exchange processors in each of its instances, so there is no need to keep them up to date.

  2. The connection speed of integration streams is higher compared to file exchange.

The disadvantages of this method are:

  1. The difficulty of writing queries to an interface that has a rather complex structure.

  2. Difficulties with large volumes of data. When transmitting them, the web server may truncate some of the responses. To prevent this, mechanisms for dividing the data into "portions" must be provided, which complicates the maintenance of the queries.

  3. Difficulty in error analysis. Errors that occur when receiving data are described rather sparingly by the interface. Debugging and correcting them takes considerable time.

  4. As with file exchange, export redundancy must be maintained to keep the 1C and CDW data consistent.

The REST interface is a clear improvement over file exchange for transferring small volumes of data whose elements do not require writing complex queries (provided you are willing to move part of the calculation logic into the CDW).
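The 1C platform publishes its standard OData interface under the `/odata/standard.odata/` path. The "portioning" mentioned above can be sketched with standard OData `$top`/`$skip` paging; the entity name `Catalog_Goods`, the field names and the base URL below are illustrative assumptions:

```python
def odata_page_urls(base_url, entity, total, page_size, select=None):
    """Build paged OData query URLs so the web server never has to
    return the whole dataset in a single response."""
    common = ["$format=json", f"$top={page_size}"]
    if select:
        common.append("$select=" + ",".join(select))
    urls = []
    for skip in range(0, total, page_size):
        params = common + [f"$skip={skip}"]
        urls.append(f"{base_url}/odata/standard.odata/{entity}?" + "&".join(params))
    return urls

# Three portions of up to 1000 records each for a 2500-record catalog.
urls = odata_page_urls("http://srv/base", "Catalog_Goods",
                       total=2500, page_size=1000,
                       select=["Ref_Key", "Description"])
```

In a real flow the total record count would itself be fetched first (e.g. via `$count`), and each URL executed with an authorized HTTP client.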

Connecting to the DBMS directly

You can connect the ETL tool not to 1C itself, but to the DBMS on which 1C operates. There are techniques and even commercial products that allow you to write such queries. The advantage of this solution is that standard ETL tools can be used to obtain the data via SQL.

There are two disadvantages:

  1. The database structure is controlled by the 1C platform: table and field names are internal and can change when the configuration is updated.

  2. The method is unacceptable from the point of view of the 1C licensing policy.

In addition, this method of integrating 1C with a CDW does not allow you to track changes in the data.
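For illustration only: 1C stores data in internal tables with opaque names (such as `_Reference45`), and the real mapping has to be read from the platform (the `GetDBStorageStructureInfo()` global method). The sketch below uses an entirely hypothetical mapping to show why such queries break when the platform restructures the database:

```python
# Hypothetical metadata-to-table mapping. Real names must be obtained
# from the platform and can change after a configuration update, which
# is exactly what makes direct DBMS access fragile.
TABLE_MAP = {
    "Catalog.Goods": ("_Reference45", {"Code": "_Code", "Name": "_Description"}),
}

def build_select(metadata_name, columns):
    """Translate a logical 1C metadata query into physical SQL text."""
    table, fields = TABLE_MAP[metadata_name]
    cols = ", ".join(fields[c] for c in columns)
    return f"SELECT {cols} FROM {table}"
```

Every configuration update that renumbers the internal tables silently invalidates `TABLE_MAP`, on top of the licensing concerns noted above.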

Integration via HTTP and WS services

The essence of the solution is to create separate services for data transfer, built into the source configuration. Such services can be used on the source side, on the receiver side, or on both.

By creating independent HTTP and WS services, specialists develop the data formats and exchange logic from scratch and spend effort writing an API for interaction between sources and receivers. But almost all of the problems noted in the previous methods of integrating 1C with a CDW are solved.

The main advantage of integration via HTTP and WS services is the possibility of guaranteed data delivery. The disadvantage is the need to develop integration services (with the implementation of the logic of operation and interaction, taking into account the customization of the solution and with an eye to further expansion).
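Guaranteed delivery in such custom services usually rests on explicit acknowledgments. A minimal sketch (the names and the protocol are assumptions, not the actual service API): the sender keeps every portion as pending until the receiver confirms persisting it, and re-sends whatever remains unconfirmed.

```python
class DeliveryJournal:
    """Sender-side journal for a guaranteed-delivery exchange."""

    def __init__(self):
        self._pending = {}     # portion_id -> payload awaiting confirmation
        self._confirmed = set()

    def register(self, portion_id, payload):
        """Record a portion handed to the transport."""
        self._pending[portion_id] = payload

    def acknowledge(self, portion_id):
        """Called when the receiver confirms it persisted the portion."""
        if portion_id in self._pending:
            self._confirmed.add(portion_id)
            del self._pending[portion_id]

    def to_resend(self):
        """Portions the receiver never confirmed; candidates for retry."""
        return sorted(self._pending)
```

This bookkeeping is precisely the "implementation of the logic of operation and interaction" the paragraph above counts as the method's development cost.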

The Modus team chose this method as its main solution for integrating 1C with a CDW. An HTTP service (adapter) performs two-way data exchange between the ETL and the 1C sources. It contains interfaces for receiving data, delivering it, and tracking changes.

ESB systems and message brokers

The enterprise service bus (ESB), as a class of systems in service-oriented architecture, is the main tool for guaranteed delivery of information. Using ESB systems and message brokers (Apache Kafka, RabbitMQ), generated messages are delivered to the data warehouse at high speed.

Recently the 1C ecosystem gained a dedicated product, 1C:Bus. True to its name, it is used to interconnect systems via a service bus.

Integrations with Apache Kafka and RabbitMQ fall into two types:

  1. The Apache Kafka or RabbitMQ client is initialized in separate microservices, which expose HTTP services for exchanging data with the 1C system.

  2. The message broker client is initialized as a COM object (access to the object's data goes exclusively through one or more sets of related functions).
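For the first (microservice) variant, the bridging service typically wraps each 1C change into a keyed message before handing it to the broker client. A sketch of such an envelope (the field names are assumptions; the actual producer call, e.g. a Kafka client's `produce()`, is left out so the snippet stays broker-independent):

```python
import json

def change_envelope(source_id, object_type, ref, payload):
    """Build a (key, value) pair for a broker message describing one
    1C change event.

    Keying by source and object type keeps changes from a given 1C
    instance ordered within a single partition."""
    key = f"{source_id}:{object_type}"
    value = json.dumps(
        {"source": source_id, "type": object_type, "ref": ref, "data": payload},
        ensure_ascii=False,
    )
    return key, value
```

The microservice would call this for every change received over its HTTP interface from 1C and pass the result to the broker producer.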

The advantage of exchanging via a bus or message brokers is guaranteed and high-speed delivery of information to recipients.

The disadvantage of this approach shows up when filling the analytical warehouse: programmers must build their own tools for working with the data streams (customizing their logic, defining rules for recording changes and generating messages).

And while the service bus can solve efficient and secure message transport, message brokers by themselves offer no tools for shaping the transmitted data.

In addition, when using message brokers without out-of-the-box 1C exchange support (that is, all brokers except 1C:Bus), you will have to bring a number of microservices into the administration perimeter to establish interaction with each 1C instance.

How Modus ETL works with 1C

Modus ETL was developed as a product capable of collecting data daily from 800 institutions, each with its own application instance, into a consolidated database. Integration flows were set up by 1C analysts in whatever quantities business users needed.

The first requirement for data acquisition is centralized management of data flows, with the ability to create rules that are managed in a single system.

We combined data sources into sets with a common configuration and the same purpose. Each set has its own rules for obtaining data. In the case of 1C, these rules allow analysts and programmers in the ETL to:

  • write queries to be executed on the 1C source side;

  • receive the results of data composition schemas (DCS) from configuration reports;

  • receive changes on exchange nodes;

  • run external processing and execute arbitrary code on the source side.

To obtain a large volume of data, analysts can set up batch loading, dividing the flows of collected information. Portions can be transmitted either in parallel or sequentially, one after another.
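The batch loading described above can be sketched as plain chunking plus a choice of sequential or parallel transmission (the portion size, the worker count and the `send` callable are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_portions(items, portion_size):
    """Divide the collected rows into portions of at most portion_size."""
    return [items[i:i + portion_size] for i in range(0, len(items), portion_size)]

def transmit_all(portions, send, parallel=False, workers=4):
    """Send portions one after another, or in parallel threads.

    Results come back in portion order either way: pool.map preserves
    the input ordering."""
    if not parallel:
        return [send(p) for p in portions]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, portions))
```

Sequential mode keeps the load on the source predictable; parallel mode shortens the collection window at the cost of heavier concurrent load.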

Integration mechanism

When receiving data from 1C, the ETL performs a two-way exchange with the HTTP service.

The request body, together with its authorization parameters, is sent to the source database. After the query executes, the result is returned to the ETL and written to the analytical data warehouse. If no response arrives, or the response carries error information, the ETL repeats the request several times; the number of retries is configured on the ETL side. If the data is still not received, the transfer is considered failed.

At this point the process can be interrupted or continue; this too depends on the ETL settings.
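The retry behaviour just described can be sketched like this (the response shape and the `max_retries` default are assumptions; the real values live in the ETL settings):

```python
def request_with_retries(do_request, max_retries=3):
    """Repeat the request until a good response arrives or the retry
    budget is spent; then report the transfer as failed."""
    last_error = None
    for _ in range(max_retries):
        try:
            response = do_request()
        except ConnectionError as exc:      # no response received at all
            last_error = str(exc)
            continue
        if response.get("status") == "ok":
            return response["data"]
        last_error = response.get("error")  # response carried an error
    raise RuntimeError(f"transfer failed after {max_retries} attempts: {last_error}")
```

Whether the surrounding flow stops on the raised error or moves on to the next portion is the "interrupted or continue" decision mentioned above.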

Access control, authorization and administration

When installing the adapter on the source system, the administrator must decide how the ETL is authorized in the source database.

If we want to give the ETL the right to collect all data, it is assigned a role that executes queries in privileged mode. In other words, the availability of data for the ETL is defined at the level of roles and group profiles, and a secure data-retrieval role is established.

Authorization with the source can use the basic authentication scheme (login and password) or operating-system authentication. Administering 1C infobases includes publishing the adapter service on the source side, a routine task for 1C administrators.
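The basic authentication mentioned above is just a base64-encoded "login:password" pair in the `Authorization` header; HTTP libraries such as requests build it for you (`requests.auth.HTTPBasicAuth`), but the header itself is simple:

```python
import base64

def basic_auth_header(login, password):
    """Compose the HTTP Basic authentication header for the adapter call."""
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Since the credentials are only encoded, not encrypted, the adapter should be published over HTTPS.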

Let's sum it up

When choosing a method for integrating 1C with a CDW, keep in mind that the composition of the received information changes often. Accordingly, to meet business needs, the integration flows must be quick to change.

The preferred solutions for creating integration between 1C and a corporate data warehouse are buses and message brokers, as well as special integration services with ETL systems.

Regardless of the chosen method of integrating 1C with a CDW, a single data management center is needed to collect consolidated information from the source systems. Then data analysts will be able to change the composition of the collected information either independently or with minimal help from programmers, which in turn significantly shortens the development of integration processes.
