An elegant data stack for embedded analytics

Context

At work, I've dealt with analytics stacks of every configuration and size. We've learned from experience that the cost of an embedded analytics stack sitting behind your frontend can balloon so quickly that there is no return on investment to speak of. That risk is real unless you carefully consider 1) the pricing models and unit costs of the technologies involved, 2) the value actually realized, and 3) developer productivity.

There's a whole wave of tools out there built specifically for embedded analytics, so I thought I'd put together this post to show how some of them fit together and why they work so well.

This article explores the cost/value trade-offs and benefits of one such data-centric stack: MotherDuck / Cube / React (MDCuRe).

Pricing models for different technologies and unit costs

Let's start with the data…

Let's approach this from the practical side. Most queries coming from third-party users aren't that big. Consider a classic data monetization strategy: benchmarking. The slice of granular data each user can access is relatively small compared to the whole pie, and the benchmark metrics you provide are pre-aggregated, so they are far smaller still than the detailed datasets behind them.

If the vast majority of queries are small, there is no need to equip your embedded analytics solution with large-scale compute. In fact, a much more lightweight setup can dramatically reduce your unit cost of serving data to external users.
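
To make the size argument concrete, here is a toy sketch using DuckDB's Node bindings (the table and column names are hypothetical): ten million granular events collapse into roughly a thousand benchmark rows once pre-aggregated.

```ts
import duckdb from 'duckdb';

const db = new duckdb.Database(':memory:');

// Build 10M granular rows, then a pre-aggregated benchmark table
// with one row per account: ~1,000 summary rows instead of 10M.
db.exec(
  `CREATE TABLE events AS
     SELECT (random() * 1000)::INT AS account_id,
            random() * 100         AS amount
     FROM range(10000000);
   CREATE TABLE benchmark AS
     SELECT account_id, avg(amount) AS avg_amount, count(*) AS n
     FROM events GROUP BY account_id;`,
  (err) => {
    if (err) throw err;
    db.all(`SELECT count(*) AS summary_rows FROM benchmark`, (e, rows) => {
      if (e) throw e;
      console.log(rows); // tiny result set summarizing 10M events
    });
  }
);
```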

A deep understanding of the pricing models in your embedded analytics stack makes it easy to predict the ROI you can expect from your product. Ideally, the technology you choose should either have costs that grow sublinearly (logarithmically, ideally) as usage scales, or a unit cost low enough that the initial procurement and early development phases are an easy investment.

Let's go a little deeper…

If we look closely at the pricing models of common business intelligence tools and semantic layers, we find that all too often licenses are sold per user. Typically there is a base rate for platform access that covers a small group of users, and every additional user is billed separately (monthly or annually). For embedded analytics this is one of the worst possible pricing schemes: as your customer base grows, your cost scales linearly with it. There is almost nothing to save, apart from the platform fee itself shrinking slightly as a share of your unit cost.
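
To make that concrete with purely hypothetical numbers: at $10 per user per month, an embedded product that reaches 5,000 end users carries $50,000 a month in license fees before any infrastructure cost, and doubling the audience doubles the bill. Under usage-based pricing, those same 5,000 users each running a handful of small, pre-aggregated queries a month consume only a trivial amount of compute, so the bill grows with the actual work done rather than with headcount.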

For comparison, consider the pricing model used by Cube. There, you are charged for actual usage of the system, and the model is deliberately designed with nuances that make embedded analytics products not only economically viable but considerably more attractive as they scale.

Value for customers (and the business)

To charge for access to your data, the data must be valuable. With the stack described here, you can significantly increase the value of your existing data without much extra effort. Here are some of the ways to do that:

Speed

MotherDuck lets you shard compute per user (yes, per user), producing an environment that scales extremely well for clients running analytical queries. This is a distinctive feature of MotherDuck, and in some ways a breakthrough for embedded analytics, since such products often struggle with resource contention under load. Even with the massively parallel processing (MPP) data warehouses that are mainstream today, hundreds or even thousands of users may hit the same warehouse, contending for compute.

But perhaps the most important aspect of MotherDuck is that it can run inside a web application as a local WASM cache. With such a cache, you don't need a continuous connection to cloud storage when a user repeatedly accesses their slice of the data, and analytical results can come back as fast as transactional data on your site.
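
As a rough illustration, here is a minimal sketch of an in-browser cache built on DuckDB-WASM, the engine MotherDuck extends (MotherDuck also ships its own WASM client; the table and query below are hypothetical and assume the user's slice was loaded into `local_orders` beforehand, e.g. from a Parquet file fetched once at session start).

```ts
import * as duckdb from '@duckdb/duckdb-wasm';

// Spin up DuckDB inside the browser using the jsDelivr-hosted bundles.
async function initLocalCache(): Promise<duckdb.AsyncDuckDB> {
  const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
  const workerUrl = URL.createObjectURL(
    new Blob([`importScripts("${bundle.mainWorker!}");`], { type: 'text/javascript' })
  );
  const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl));
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
  return db;
}

// Repeat lookups hit the local cache instead of the cloud warehouse.
export async function medianOrderValue(db: duckdb.AsyncDuckDB): Promise<number> {
  const conn = await db.connect();
  const result = await conn.query(`SELECT median(order_total) AS m FROM local_orders`);
  await conn.close();
  return Number(result.toArray()[0].m);
}
```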

Data curation and reliability

Cube lets us embed business logic into the data we provide to clients, which makes that data more valuable. Consider a simple extract of raw data handed to a client (or their web developer): on the client's side it has to be ingested, transformed, and only then turned into business metrics. If you do that work for the user, you save them time and effort, and the product itself becomes more valuable.
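
For instance, here is a minimal sketch of a data model in Cube's JavaScript syntax (the table, measure, and dimension names are illustrative, not from the post):

```js
cube(`orders`, {
  sql_table: `analytics.orders`,

  measures: {
    count: { type: `count` },
    // The business definition of "average order value" lives here once,
    // instead of being re-derived by every client from raw extracts.
    average_order_value: { sql: `order_total`, type: `avg` },
  },

  dimensions: {
    status: { sql: `status`, type: `string` },
    created_at: { sql: `created_at`, type: `time` },
  },

  // Pre-aggregated rollups keep benchmark-style metrics small and fast.
  pre_aggregations: {
    monthly: {
      measures: [CUBE.count, CUBE.average_order_value],
      time_dimension: CUBE.created_at,
      granularity: `month`,
    },
  },
});
```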

A semantic layer also opens up additional ways to monetize. Since Cube has built-in user profiles and authentication, you can offer clients different tiers of data access and exploration capabilities depending on which plan they are on. And because Cube is entirely API-driven, we can easily upsert a user's profile data from the React client app and grant new permissions the moment they upgrade their subscription.
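
As a sketch of what that can look like (the securityContext fields and plan names here are hypothetical), Cube's configuration file supports a queryRewrite hook:

```js
// cube.js: plan-based access control applied to every incoming query.
module.exports = {
  queryRewrite: (query, { securityContext }) => {
    // Limit every query to the tenant encoded in the caller's JWT.
    query.filters = query.filters || [];
    query.filters.push({
      member: 'orders.tenant_id',
      operator: 'equals',
      values: [securityContext.tenant_id],
    });
    // Hypothetical tiering: basic-plan users only see monthly rollups.
    if (securityContext.plan === 'basic') {
      query.timeDimensions?.forEach((td) => (td.granularity = 'month'));
    }
    return query;
  },
};
```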

Together, these two tools can serve the bulk of your embedded analytics from the backend, with reasonable unit costs and very low platform fees.

Client-side ecosystem and performance

React continues to dominate front-end application frameworks, standing out for its rich ecosystem, ease of use, and strong performance. There are many other high-quality client-side frameworks, but React remains the preferred choice for many front-end projects. There are also higher-level web frameworks built on top of React and backed by platforms such as Vercel and Netlify, whose strength is speed. Also noteworthy is the compiler planned for React 19: it re-renders components only when their state actually changes, rather than on every render pass, which eliminates much of the memoization boilerplate that previously cluttered React visualization components and dragged down application performance.
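
To illustrate the kind of boilerplate involved (the component and props below are hypothetical), today a chart wrapper often needs manual memoization like the useMemo call here; the React Compiler aims to apply this sort of caching automatically, so the wrapper code can simply be deleted.

```tsx
import React, { useMemo } from 'react';

type Point = { x: string; y: number };

function RevenueChart({ points }: { points: Point[] }) {
  // Manual memoization keeps an expensive transform from re-running on
  // every render. The React Compiler is intended to insert equivalent
  // caching automatically, making this useMemo (and similar useCallback
  // boilerplate) unnecessary.
  const sorted = useMemo(
    () => [...points].sort((a, b) => a.y - b.y),
    [points]
  );
  return (
    <ul>
      {sorted.map((p) => (
        <li key={p.x}>{p.x}: {p.y}</li>
      ))}
    </ul>
  );
}

export default RevenueChart;
```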

Developer Productivity

Amazon popularized the concept of the "two-pizza team": the smallest group capable of fully supporting a product or delivering a business feature end to end. How many people can two pizzas feed? Roughly five to eight, maybe a few more if it's deep dish. When a self-sufficient team of this size owns the outcome, communication stays high-quality, decisions get made quickly, and the job gets done. Hand-offs are minimized and few dependencies arise, so the team can pay more attention to the customer's needs than to coordinating itself. Ideally, a single web developer on the team should be able to ship a feature end to end, with minimal help from people outside the team.

Tool selection in a product team can make the difference between success and failure. Just as the team's structure is geared toward effective communication, cohesion, and seamless delivery, the toolset should align with the same priorities. We want a stack that reduces, rather than adds, friction for a team whose job is to continuously deliver working software.

The MotherDuck / Cube / React (MDCuRe) stack should interest teams building embedded analytics products with continuous delivery. With each of these tools, the team can manage changes through source control and build a smooth path from the local machine to integration environments and on to production. In fact, this stack lets a developer stand up a completely clean development sandbox on their own workstation.

It's easy to imagine a developer building and deploying a feature from start to finish:

  • Writes warehouse schema migrations (using a tool such as Atlas) and applies them to a local DuckDB instance
  • Generates and loads seed data into DuckDB (using tools such as Mimesis or Faker)
  • Defines a data model in the Cube IDE and validates it in the sandbox
  • Defines data access controls in code
  • Implements React client code that consumes data from Cube via its REST or GraphQL API (see the sketch after this list)
  • Commits the warehouse migrations, the Cube model, and the client-side changes to source control
  • Promotes all changes to higher environments such as staging and production, orchestrated by a CI/CD tool like GitHub Actions that can target hosted Cube Cloud and MotherDuck instances
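
As a sketch of the client step above (the deployment URL, token handling, and member names are placeholders), Cube's JavaScript client makes the REST call declarative:

```ts
import cubejs from '@cubejs-client/core';

const cubeApi = cubejs(process.env.CUBE_API_TOKEN!, {
  apiUrl: 'https://example.cubecloud.dev/cubejs-api/v1', // hypothetical deployment
});

export async function loadMonthlyOrderCounts() {
  // Cube translates this declarative query against the semantic model
  // into SQL against the underlying warehouse (MotherDuck/DuckDB here).
  const resultSet = await cubeApi.load({
    measures: ['orders.count'],
    timeDimensions: [
      { dimension: 'orders.created_at', granularity: 'month' },
    ],
  });
  return resultSet.tablePivot(); // rows ready to feed a React chart component
}
```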

This approach to team organization and tooling supports rapid iteration, and therefore more frequent releases, which creates more opportunities to gather customer feedback and adjust the product's direction.

In summary

To recap the advantages of the MDCuRe stack:

Database: MotherDuck

  • Because compute can be partitioned down to the individual user, contention between the clients of your embedded analytics solution disappears, making per-user query performance faster and more consistent.
  • Thanks to DuckDB's lightning-fast in-memory processing, MotherDuck is well suited to both storing data and computing on it for embedded analytics.
  • Along with DuckDB's technical strengths, MotherDuck has fair, transparent pricing. In most practical embedded analytics cases, clients run relatively small queries against narrow slices of the dataset. If you do a good job with the data in your broader enterprise architecture, you can optimize to the point where the unit cost of storage and compute is pennies.

Semantic layer: Cube.dev

  • Because Cube's semantic layer supports JavaScript, YAML, and Python, developers can quickly master semantic data modeling and the full product development cycle.
  • Used skillfully, the API-driven semantic layer lets you roll out valuable new features to users almost immediately; all a user needs to do is upgrade their subscription.
  • Cube provides granular security controls, so business leaders can rest assured that all access to content is secured.

Client side: React

  • Supports custom analytical workflows and fine-grained control of branding
  • Provides maximum flexibility in information architecture and interaction patterns, so you can build interfaces and flows that fully meet each client's needs
  • Supports performance optimizations, including server-side rendering and WebGL-based rendering libraries
  • Widespread industry support and a giant ecosystem of free libraries
  • Integrates easily with an activation layer, so a user can ask a question, get an answer, and take action, all within a single screen

Needless to say, the components outlined in this post are the foundations of a complete data stack; a finished implementation may also include many other tools and processes for orchestration, data ingestion and transformation, data quality and observability, and error handling. Even so, this stack makes a great foundation for a platform with embedded analytics.
