Pros and cons of different DWHs as data sources for BI

By 2028, the cloud storage market is expected to reach 7.69 billion dollars. At the same time, the global market for cloud data storage in 2026 should reach 10.42 billion dollars – this is almost 3 times more than the same figure in 2021.

The enterprise data warehouse has long been the most popular source of data for business intelligence, and this is unlikely to change radically anytime soon. But along with it there are also hybrid systems Hybrid Transaction / Analytical Processing, which combine data analytics of transactional (accounting) systems and data analysis systems. It is also very popular to use data lakes directly linked to the layer. B.I.cloud data storage technologies are actively developing.

In this article, we will briefly look at several options for organizing analytical warehouses. So, let's go!

Single Source of Truth

Trust in data is the main and main requirement for analytical systems. Analytics will not produce the expected results if there are doubts about the numbers it shows. In addition, it must be taken into account that the volume of data in the analytical warehouse will grow over time, and, unlike accounting (transactional systems), QCD are not designed for anything to be removed from them.

Often customers want a single corporate repository so that managers can receive only “the truth and nothing but the truth.” Being a single source of truth is a common challenge for enterprise repositories. Knowledge and information about all business processes and business objects should be collected and stored in one place. All businesses must be described in one place and once. A centralized storage unites information from all parts of the enterprise.

Enterprise relational data warehouse

A centralized relational repository is a repository that combines knowledge and information about all business processes and business objects in the form of interconnected tabular structures. Data in the warehouse must arrive in a pre-prepared structured form suitable for placement inside tables. Normalization of data in the warehouse is completely determined by the approach to organizing the analytical warehouse. On the one hand, the normalized structure used in transactional stores does not provide adequate performance. On the other hand, denormalization, which provides cheaper and faster analysis, leads to data redundancy and increased update time. There are many standard and hybrid methodologies for organizing data storage in analytical systems. The following are the pros and cons of relational storage.

Pros:

  • High quality of incoming data. The data that will go into storage already has a tabular structure, which greatly simplifies further manipulation with it. And data providers – owners of source systems – are responsible for their preparation.

  • The repository core can act as a single “source of truth” for analysis. Based on certain amounts of data, you can build showcases that will be a reliable source of information.

  • Ability to analyze data over a long period. This opportunity is achieved by storing historical data in a single structure.

  • High scalability and support for large data volumes (storage, processing and fast delivery).

  • Controlled access to a single repository.

Cons:

  • Inevitable, gradual increase in costs for administration and support of the storage structure.

  • An increase in the volume of data leads to growing doubts about its reliability. With the growth of data volumes and the number of their consumers, contradictions about the “truth” of this data increase, as it becomes increasingly difficult to agree on terms and methodology.

  • The cost of making changes. Risk of breaking existing ETL-processes: reusing display cases as sources for subsequent analysis leads to extreme complexity of the storage structure.

Cloud Data Warehouse

In this case, storage is deployed in a cloud infrastructure, providing flexible scaling by increasing the number or quality of servers and high availability without the need to create and manage physical infrastructure.

Pros:

  • High availability and reliability of data, remote administration and maintenance. In particular, the cloud infrastructure allows you to quickly expand the fleet of analytical storage components. For example, organize hybrid structures (data lake/ DW-storage) in the cloud is much simpler.

  • Fast implementation and integration with cloud tools for data processing, access control, DevOps and other services.

Cons:

  • Dependency on cloud service provider.

  • Possible data security issues due to the transmission of information over the Internet.

Operational Data Store (ODS)

This storage is essentially a layer DW-storage, which in turn can be used as an independent source. The main feature of the ODS storage is its high speed of updating the analyzed data, which in some cases makes it possible to solve operational analytical problems on highly relevant data without the load of transactional systems.

Pros:

  • Data in O.D.S.-layer relevant, which allows analysis without loading the transactional system with “expensive”, from the point of view of relational databases, analytical queries.

  • Serves as a data source for operational showcases – analytical systems that require a high degree of data relevance, for example, situational centers where current information from many company systems is collected.

Cons:

  • Not intended for long-term data storage and complex analyticsbecause O.D.S. often overwrites or updates old data with new ones.

  • There are no capabilities to analyze historical data.

Data lakes

When working with big data, it is difficult to bring all of its types into a form suitable for storage. Such work will require complex ETL-processes that will take too long to complete when working with large amounts of data. Therefore, the data lake concept follows the ELT approach. Data is recorded in the “lake” in the form in which it comes from primary sources with minimal costs for their preparation. This greatly simplifies the issue of storing arbitrary data. It is assumed that when working with them, data engineers will structure the data in such a way as to answer specific business questions. In general, a data lake allows data analysts to process unstructured data and load the results into B.I.-layer.

Pros:

  • Cheap storage of large amounts of data

  • Data is not pre-clearedAccordingly, they are less susceptible to transformation, which provides almost unlimited possibilities for their analysis according to arbitrary criteria.

Flaws:

  • Due to the lack of structure, information from the data can be interpreted differentlywhich ultimately leads to discrepancies between the same concepts and meanings of the same business indicators.

  • Unsystematic collection of “everything that is” can lead to an increase in the entropy factor in the data lake. As a result, the company does not understand what data is being analyzed, and the “lake” turns into a “swamp” of data, which leads to an even greater decrease in trust in the information.

What to look for when choosing an enterprise storage model

Choice DWH may impact the effectiveness of business intelligence and data management. Here you need to take into account many factors, since each storage has its own characteristics and capabilities that solve specific types of problems. For example:

Volume and growth of data

Scalability is a key aspect for choosing storage. If the company's data is growing rapidly, then DWH must cope with increased volumes without significantly reducing productivity. For example, cloud storage can easily scale, whereas on-premise DWs may require significant hardware investment.

Performance and processing speed

For B.I. fast results are critical. Choice DWH must consider what data and how quickly the system must present it, who the recipient of the data is, how current the data must be and how often it must be updated.

Implementation speed

Fast Deployment DWH very important for organizations that need operational data analysis – retail chains, banks, streaming services, etc. For example, cloud DWH typically implemented faster than traditional on-premises storage, which requires the deployment of physical infrastructure. But the issue of speed is inextricably linked with the structure of the collection organization. For example, choosing relational DW-storage, organizing a normalized core will take much more time than organizing thematic showcases. But in the future, the situation with storage maintenance as it grows may change radically.

Price

Companies must evaluate not only current costs, but also forecast long-term costs. For example, implementing and maintaining an on-premises DW may require significant investment in company infrastructure and IT personnel. On the other hand, cloud solutions are paid as you use them – this can reduce initial costs, but if the company’s data is actively growing, the price for cloud DWH may change dramatically.

Flexibility and adaptability

Business strategies and goals can change – this requires DWH Flexibility to support new queries, data sources, and analytics models. All this helps reduce adaptation time and reduce the costs of change.

Support for ETL/ELT processes

ETL/ELT-processes are an important part of data management, which allows you to quickly and correctly load and transform information. That's why DWH must either support integration with ETL-tools, or have built-in mechanisms for their implementation.

Conclusion

Data warehouses play a key role in business intelligence. They help scale your business, conduct historical and multi-level data analysis, and monitor information security.

In this case, choosing a suitable DWH depends on the goals and capabilities of the business. Corporate storage should be considered as a separate product in a company's IT infrastructure. Accordingly, when determining the amount of investment, it is necessary to understand the benefits obtained from the product and the risks that may occur.

As for technology and component architecture, it is also necessary to formulate business needs. There is hardly any need to design a complex multi-level QCDif the business is limited in time and funds. In this case, you can get by with thematic data marts in cloud storage.

At the same time, this approach will not suit a large branch structure that will use the storage for many years without radically restructuring it. Such a company, already at the start of work, needs to consider hybrid storage schemes that involve the use of data lakes and relational storage. This approach also requires ensuring sufficient documentation of ETL processes and the formation of a unified dictionary of terms and definitions that will accompany the repository.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *