How to use data for decision making in FinTech
The data-driven concept is an approach in which data and analytics serve as the basis for decision making at all stages of product development. The main idea is that stakeholders can analyze data to better understand how users interact with the product: which features are popular, where customers abandon a purchase, and which interface elements are essential. The information obtained allows you to improve the service by adding, changing, or removing specific functions.
The data-driven approach helps identify new opportunities for product development and supports informed decisions on the strategic direction of the business: in which region to expand, which product to invest in promoting, and which to discontinue.
The data-driven approach is based on four key principles:
Iterative data collection and analysis.
It is necessary to constantly examine how the product functions as it “lives” and adapts to business requirements and customer needs, as well as the market situation. This requires continuous monitoring.
Research of user behavior and preferences.
When creating a product, you need to take into account the needs and desired functions of users in order to develop a service that will satisfy them. For example, the decision to choose a platform for a product may depend on an analysis of customer behavior: whether to invest in a mobile application or focus on developing a website.
Experiments and tests.
To determine which changes will have a positive impact on performance, it is important to conduct experiments and test hypotheses. For example, you can compare different ad banner designs before rolling out a new feature in your application (see the sketch after this list).
Collaboration.
The data-driven approach involves active collaboration between different teams: developers, designers, marketers and analysts. Every team must use data to inform its product change decisions.
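To make the experimentation principle concrete, here is a minimal sketch of how two banner variants might be compared with a two-proportion z-test; the impression and click counts are purely illustrative.

```python
from math import sqrt
from scipy.stats import norm

# Illustrative numbers: impressions and clicks for two banner variants (hypothetical).
clicks_a, impressions_a = 310, 10_000   # current banner
clicks_b, impressions_b = 365, 10_000   # new banner design

p_a = clicks_a / impressions_a
p_b = clicks_b / impressions_b

# Pooled two-proportion z-test: is the difference in click-through rate significant?
p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))  # two-sided p-value

print(f"CTR A = {p_a:.2%}, CTR B = {p_b:.2%}, z = {z:.2f}, p = {p_value:.4f}")
```

The new design would be rolled out only if the observed lift is statistically significant at the chosen level (commonly 0.05).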
The data-driven product development cycle consists of six stages:
Ideas. At this initial stage, assumptions are made about possible directions for product development.
Creation. When analytics confirm the viability of the hypothesis, changes are made to the product.
Product. The updated product is launched, and data is collected to evaluate the effectiveness of the changes.
Measurement. Analysts monitor changes occurring with the product.
Data. The collected data is interpreted and used to analyze the impact of the changes.
Learning. Based on the results obtained, new hypotheses are formed for further development of the product.
To assess the success of a product within the data-driven approach, metrics such as the following are used:
Conversion — the percentage of users who completed a target action, for example, purchasing a product or registering.
Retention — the proportion of users who continue to use the product after the first interaction. This indicator is critical for online services, as users can quickly switch between applications.
Churn — the number of users who stopped using the product over a certain period. Retention and churn are interconnected: to improve retention, you need to analyze the reasons for churn.
Engagement — user activity when interacting with the product. Includes the number of sessions, time spent in the service, and actions performed.
Net Promoter Score (NPS) — the likelihood that users will recommend the product, which reflects the level of customer satisfaction.
Metrics can be used both together and separately, depending on the company's goals.
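As a rough illustration of how these metrics can be derived from raw data, below is a sketch that computes conversion, retention, and churn from a hypothetical event log; the column names and the 7-day retention window are assumptions.

```python
import pandas as pd

# Hypothetical event log: one row per user action (column names are illustrative).
events = pd.DataFrame({
    "user_id":   [1, 1, 2, 3, 3, 3, 4],
    "event":     ["signup", "purchase", "signup", "signup", "purchase", "purchase", "signup"],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-03", "2024-01-02",
        "2024-01-05", "2024-01-06", "2024-02-01", "2024-01-10",
    ]),
})

total_users = events["user_id"].nunique()

# Conversion: share of users who completed the target action (here, "purchase").
converted = events.loc[events["event"] == "purchase", "user_id"].nunique()
conversion = converted / total_users

# Retention: share of users still active more than 7 days after their first event.
first_seen = events.groupby("user_id")["timestamp"].min()
last_seen = events.groupby("user_id")["timestamp"].max()
retention = (last_seen - first_seen > pd.Timedelta(days=7)).sum() / total_users
churn = 1 - retention  # churn is the complement of retention for this window

print(f"conversion={conversion:.0%}, retention={retention:.0%}, churn={churn:.0%}")
```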
Disadvantages of the data-driven approach
Although the data-driven approach provides useful tools for making business decisions, it is important to consider its disadvantages:
Resource costs. Collecting and analyzing data can be costly, both financially and in terms of time. Companies may face resource constraints at every stage of the cycle, especially if employees are just beginning to adopt this approach.
Risk of incorrect conclusions. Incorrectly collected or analyzed data can lead to erroneous decisions.
The importance of experience. A data-driven approach cannot completely replace experience and intuition, which can be critical when faced with new or unexpected situations.
Difficulty in reconciling data between different departments. For example, if the marketing department and the risk management department have different priorities, then the analysis may produce conflicting results.
Benefits of using the data-driven approach in FinTech projects
Increased efficiency: Using data allows you to optimize processes. For example, banks can reduce the time it takes to process loan applications by automating the analysis of borrower data.
Risk management: Financial institutions can predict and mitigate risks using historical data. Machine learning models help in identifying potential defaults and fraudulent activities.
Risk prediction models help companies identify potential threats and take action to minimize them; both statistical and machine learning models can be used to predict such risks.
Personalization of services: Customer data analytics helps create personalized offers, which increases customer satisfaction and strengthens customer relationships.
Improved decision making: Data serves as a reliable source of information, which reduces the likelihood of errors in strategic decision making.
How to implement a data-driven approach in FinTech
1. Define business goals and metrics
First and foremost, companies need to clearly define their business goals and the metrics to achieve them. This could be an increase in the customer base, a decrease in loan default rates, or an improvement in the application approval process. Setting clear goals will allow you to choose the right data for analysis.
2. Data collection and storage
Quality data is the basis of any data-driven approach. FinTech companies must organize a process for collecting data from various sources, such as transactional data, customer behavior on the website, social networks and others. Data storage must allow for rapid access and scalability in the future.
In FinTech projects, data collection is a multi-step process that begins with the integration of various sources of information. This could be transaction data, credit history, user behavior, and even external factors such as economic indicators or news. The main stages of data collection include:
Identifying Data Sources: Selection of relevant sources, for example, banking systems, payment platforms, social networks, etc.
Data collection: Data can be collected either in real time or in batches, in compliance with personal data protection legislation.
Data cleaning: Removing duplicates, correcting errors and filling in gaps are important steps without which the analysis will be incorrect.
Data analysis and transformation: Applying various methods to normalize and standardize the data, as well as creating new features that can improve prediction models.
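The cleaning and transformation steps above can be sketched in a few lines of pandas; the table and column names here are hypothetical and serve only to show the shape of the process.

```python
import pandas as pd

# Hypothetical raw transaction extract; column names are illustrative.
raw = pd.DataFrame({
    "txn_id":   [101, 101, 102, 103],
    "amount":   [250.0, 250.0, None, 1200.0],
    "currency": ["USD", "USD", "usd", "EUR"],
})

clean = (
    raw.drop_duplicates(subset="txn_id")                         # remove duplicates
       .assign(currency=lambda d: d["currency"].str.upper())     # correct inconsistent values
)
clean["amount"] = clean["amount"].fillna(clean["amount"].median())  # fill gaps

# Transformation: normalize amounts and derive a new feature for downstream models.
clean["amount_zscore"] = (clean["amount"] - clean["amount"].mean()) / clean["amount"].std()
clean["is_large_txn"] = clean["amount"] > 1_000
print(clean)
```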
3. Data analysis
At this stage, statistical methods and machine learning algorithms are applied to analyze the collected data. The use of modern analytics tools allows for in-depth analysis and visualization of results.
One of the key applications of ML in FinTech is default forecasting. Typically, this task uses classification models that are trained on historical data about borrowers, including factors such as income, credit history, and behavioral characteristics.
Such models are often built with gradient boosting algorithms (e.g., XGBoost, LightGBM). These algorithms are well suited to large volumes of data and achieve high accuracy thanks to their ability to capture complex dependencies.
Let's take a closer look at the applicability of gradient boosting for risk classification. For example, XYZ Company is developing a credit scoring system. As they work, they collect data about borrowers and train a model based on historical default data.
Data collection: At this stage, data about borrowers is collected: Age, Income, Previous Defaults, Credit Score and other important factors.
Formation of a training sample: A dataset is created containing information about borrowers who have either repaid their loans or failed to do so.
Model training: Gradient boosting is trained on this set. The model optimizes an objective (loss) function that reflects how well it classifies borrowers by their probability of default.
Testing and Validation: The model goes through cross-validation and back-testing steps to avoid overfitting and ensure its applicability in the real world.
Implementation and monitoring: After successful testing, the model is introduced into the decision-making process for issuing loans. It is also important to monitor it and periodically retrain it to maintain its relevance.
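Below is a condensed sketch of this pipeline using XGBoost's scikit-learn interface; the borrower features and labels are synthetic placeholders, so the numbers themselves mean nothing, only the sequence of steps matters.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from xgboost import XGBClassifier

# Synthetic borrower features: age, income, previous defaults, credit score (illustrative).
rng = np.random.default_rng(42)
n = 5_000
X = np.column_stack([
    rng.integers(21, 70, n),           # age
    rng.normal(50_000, 15_000, n),     # income
    rng.integers(0, 3, n),             # previous defaults
    rng.integers(300, 850, n),         # credit score
])
y = (rng.random(n) < 0.1 + 0.1 * (X[:, 2] > 0)).astype(int)  # synthetic default labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)

# Cross-validation guards against overfitting before the model reaches production.
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
print(f"CV ROC-AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")

model.fit(X_train, y_train)
default_probability = model.predict_proba(X_test)[:, 1]  # probability of default per borrower
```

In production, the same evaluation would be repeated periodically on fresh data, and the model retrained when its quality degrades.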
Data quality management is also critical. Data quality plays a key role in business decision making, analytics, and building machine learning models. Poor data quality can lead to erroneous conclusions, ineffective strategies, and ultimately financial loss. Data quality management includes the following key aspects:
Accuracy: Data must accurately reflect actual events or entities.
Completeness: Availability of all necessary data to perform the analysis.
Reliability: Data must be collected from reliable sources.
Relevance: Information must be fresh and updated.
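Some of these aspects can be checked automatically. The sketch below covers completeness and freshness for a hypothetical table; accuracy and reliability usually require comparison against a trusted reference source, so they are only noted in the comments.

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame, timestamp_col: str, max_age_days: int = 1) -> dict:
    """Basic checks for completeness and relevance (freshness).
    Accuracy and reliability need a trusted reference and are out of scope here."""
    completeness = 1 - df.isna().mean()               # share of non-missing values per column
    latest = pd.to_datetime(df[timestamp_col]).max()
    is_fresh = (pd.Timestamp.now() - latest) <= pd.Timedelta(days=max_age_days)
    return {
        "completeness_per_column": completeness.round(3).to_dict(),
        "latest_record": latest,
        "is_fresh": bool(is_fresh),
    }

# Usage with a hypothetical transactions table:
# report = data_quality_report(transactions, timestamp_col="created_at")
```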
One of the main steps in data quality management is identifying problems such as missing values and outliers. Let's consider several approaches and methods for this task.
Gaps in data can occur for various reasons: human errors, system failures, insufficient integration of sources, etc. The following methods can be used to identify gaps:
Descriptive statistics: Simply computing counts, means, or standard deviations can reveal which variables contain missing values.
Graphical methods: Visualizations such as histograms and scatter plots allow you to quickly identify variables with empty values.
Missing-value tables: Creating tables that show the number of gaps in each column can greatly simplify the analysis process.
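A minimal pandas version of these checks might look as follows; the dataset and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical loan application data with gaps.
df = pd.DataFrame({
    "income":       [52_000, None, 61_000, None, 48_000],
    "credit_score": [710, 640, None, 580, 695],
    "age":          [34, 45, 29, 51, 38],
})

# Descriptive statistics: non-null counts immediately show which columns have gaps.
print(df.describe())

# Missing-value table: number and share of gaps per column.
missing_table = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_share": df.isna().mean().round(2),
}).sort_values("missing_count", ascending=False)
print(missing_table)
```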
Apart from this, there are various approaches for anomaly detection:
Statistical methods: Using methods such as z-score and interquartile range (IQR) can identify outliers that deviate from the statistical norm.
Machine learning: Modern algorithms such as Isolation Forest and Local Outlier Factor (LOF) can process large data sets and detect anomalies effectively.
Data visualization: Graphical tools such as box plots and scatter plots make it easier to detect outliers in data.
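As a sketch, the IQR rule and Isolation Forest can be applied to a single column of transaction amounts as shown below; the data is synthetic with a few injected outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic transaction amounts with a few injected outliers.
rng = np.random.default_rng(0)
amounts = np.concatenate([rng.normal(100, 20, 1_000), [950, 1_200, -400]])

# Statistical method: interquartile range (IQR) rule.
q1, q3 = np.percentile(amounts, [25, 75])
iqr = q3 - q1
iqr_outliers = (amounts < q1 - 1.5 * iqr) | (amounts > q3 + 1.5 * iqr)

# Machine learning: Isolation Forest flags points that are easy to isolate.
iso = IsolationForest(contamination=0.01, random_state=0)
iso_outliers = iso.fit_predict(amounts.reshape(-1, 1)) == -1  # -1 marks anomalies

print(f"IQR flagged {iqr_outliers.sum()} points, Isolation Forest flagged {iso_outliers.sum()}")
```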
Once missing values and anomalous values are identified, an important step is to process them. Depending on the situation and data characteristics, the following approaches can be applied:
Handling missing values
Replacing values: Gaps can be filled with means, medians, or modal values to preserve data volume.
Deleting entries: If there are too many gaps, you can remove records with insufficient data, especially if they are not critical for analysis.
Imputation: Using more sophisticated techniques such as regression imputation can help predict missing values.
Handling anomalous values
Correction: For some cases, it is possible to replace outlier values with more appropriate ones, for example, using median values.
Removal: In cases where anomalies are the result of errors, they can be removed from the data to avoid distortion.
Identification and Analysis: Sometimes anomalous values can be significant data that is worth highlighting and analyzing separately.
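A compact sketch of these treatments, using median imputation and quantile-based correction on an invented borrower table:

```python
import numpy as np
import pandas as pd

# Hypothetical borrower data with gaps and an obvious outlier in income.
df = pd.DataFrame({
    "income":       [52_000, np.nan, 61_000, 9_900_000, 48_000],
    "credit_score": [710, 640, np.nan, 580, 695],
})

# Replacing values: fill gaps with the median to preserve data volume.
df["credit_score"] = df["credit_score"].fillna(df["credit_score"].median())

# Correction: cap anomalous incomes at a high quantile instead of dropping the row.
upper = df["income"].quantile(0.95)
df["income"] = df["income"].clip(upper=upper)
df["income"] = df["income"].fillna(df["income"].median())

# Removal would be appropriate if the outlier were a confirmed data-entry error:
# df = df[df["income"] < upper]
print(df)
```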
4. Applying findings to decision making
Data analysis should directly influence decision making in the company. For example, predictive analytics can be used to tailor credit offers that best match customer needs, while aggregated customer data can help update marketing strategies.
5. Continuous improvement
Monitoring your data and models is a critical part of this work and an indispensable tool for ensuring their reliability and efficiency. With specialized tools like Prometheus and Grafana, organizations can track model performance and data quality and spot issues such as overfitting or model degradation early.
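As a rough sketch, model-quality and data-quality signals can be exposed to Prometheus with the prometheus_client library and then plotted in Grafana; the metric names and the evaluation stub below are assumptions, not a ready-made setup.

```python
import random
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical metric names; Grafana can chart them from the Prometheus endpoint.
MODEL_AUC = Gauge("credit_model_auc", "Rolling ROC-AUC of the credit scoring model")
MISSING_SHARE = Gauge("feature_missing_share", "Share of missing values in incoming features")

def evaluate_model() -> float:
    """Placeholder: in production this would recompute AUC on recent labeled data."""
    return 0.80 + random.uniform(-0.03, 0.03)

if __name__ == "__main__":
    start_http_server(8000)  # metrics become available at http://localhost:8000/metrics
    while True:
        MODEL_AUC.set(evaluate_model())
        MISSING_SHARE.set(random.uniform(0.0, 0.05))  # placeholder data-quality signal
        time.sleep(60)       # refresh once a minute
```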
But what about in practice?
The first thing you'll encounter on your journey to data-driven decision making is a lack of metrics. Therefore, it is important to start by creating an infrastructure for collecting and storing data. For backend data, most projects rely on a replica of the production database. Classic tools such as Google Analytics and Yandex.Metrica are suitable for collecting front-end data (page views, interaction with interface elements, scrolling and clicks). The basic functionality of these tools is sufficient for marketing tasks, and you can use the Google Reporting API to analyze product funnels and A/B tests.
Once you start collecting statistics, it is important that product development goes hand in hand with its metrics. When implementing a new feature in a product, you need to answer the following questions:
What key business metrics will this impact?
What changes will be made to the customer journey or backend algorithms? How will this affect existing metrics?
How to break down a new feature into stages/components so that you can collect metrics on each of them and later analyze their performance?
Next, you should make sure that the data collection and storage subsystem is as important to your development team and IT department as the production system. For example, we had a problem with Google Analytics tracking disappearing on different pages until we discussed with the developers the importance of these aspects.
However, the availability of data does not guarantee its effective use. The following problems often arise:
Where can I get a certain metric?
Is it being collected correctly?
How to structure a report to draw conclusions from it?
Is this indicator statistically significant?
Is it possible to collect additional data to gain a deeper understanding of the situation or to validate the collected metrics in other ways?
This turns out to be a rather labor-intensive task, requiring specialized skills and considerable time, which gives rise to the need for a dedicated analytics department.
As the volume of data grows, fragmentation issues also arise: data may be scattered across different stores, and some analysts have access only to certain ones while others work with different ones. Some databases may be completely unfamiliar to the team, making data comparison difficult. The solution can be to implement a data warehouse (DWH).
One of the main tasks that Data Warehouse solves is the integration of data from many disparate sources. This may include system databases, CRM, ERP, spreadsheets, as well as external data such as market information or social media data.
With a DWH, the following happens:
Data collection: The necessary ETL processes (Extract, Transform, Load) are implemented, allowing you to extract data from various sources, convert it into the required format, and load it into central storage.
Eliminate duplication: During the integration process, data is cleared of duplicates and inconsistencies, which ensures uniformity and integrity of information.
Creating a Unified Data Model: DWH provides a structure and a common format for presenting data, making it easier to use.
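A toy version of such an ETL flow, with sqlite standing in for a real warehouse and two invented source extracts, might look like this:

```python
import sqlite3

import pandas as pd

# Extract: two hypothetical source extracts (in practice: CRM, core banking, payments, etc.).
crm = pd.DataFrame({"client_id": [1, 2, 2], "email": ["a@x.com", "b@x.com", "b@x.com"]})
payments = pd.DataFrame({"client_id": [1, 2, 3], "total_paid": [120.0, 80.0, 45.0]})

# Transform: deduplicate and bring the data into a unified model.
crm = crm.drop_duplicates(subset="client_id")
unified = crm.merge(payments, on="client_id", how="outer")

# Load: write into the warehouse (sqlite stands in for the DWH here).
dwh = sqlite3.connect("dwh.db")
unified.to_sql("dim_clients", dwh, if_exists="replace", index=False)

# Analysts can now answer questions with simple SQL against a single store.
print(pd.read_sql("SELECT COUNT(*) AS clients FROM dim_clients", dwh))
```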
The data integration process enabled by DWH makes it much easier for analysts to access the information they need. Previously, analysts had to manually collect data from different sources, which was time-consuming and often error-prone. With the introduction of DWH, access to data has become much easier and faster thanks to the following mechanisms:
Centralized access: All data is stored in one system, making it easy to search and retrieve. Analysts can quickly find the information they need without wasting time sorting through various systems and databases.
Simplified queries: DWH provides the ability to work with data through simple SQL queries or more advanced analytical tools, reducing time for reporting and analysis.
Interactive dashboards: Modern DWHs are often integrated with BI tools, allowing the creation of interactive visualizations and dashboards. This makes information more accessible and understandable to users without deep technical knowledge.
The implementation of a data warehouse also has a positive effect on data quality. Thanks to centralized storage and cleaning processes, information becomes more consistent and reliable. Here are a few aspects that help improve data quality:
Data standardization: The DWH establishes common rules for the presentation and storage of data, which helps eliminate differences between the formats of different sources.
Data quality control: When loading data into DWH, you can set validation rules, which allows you to identify and correct errors at the integration stage.
Audit and traceability: DWH provides capabilities to track data changes, allowing for greater transparency in analysis results and allowing data sources to be easily verified.
All of these benefits ultimately result in a significant reduction in the time spent analyzing data. Analysts can focus on interpreting information rather than searching and preparing it. Fast, up-to-date data enables organizations to make more informed decisions in response to changes in the business environment and customer preferences.
However, as the company grows, it may become obvious that not all employees understand the importance of data and know how to work with it. There are two key issues here: internal promotion and hiring the right talent.
In terms of internal promotion, if the founders of a company champion a data culture, this influences top management, then middle management, and so on. It is also important, when hiring, to check whether candidates can rely on numbers in their work.
In addition, you should pay attention to the financial side of the issue: when it comes to lending, it is important not only to issue money, but also to ensure it is repaid. The volume of repayments affects the amount of funds available for new lending. In this context, the role of predictive models becomes critical, as they help forecast the future P&L. For example, you can use models to predict profit based on overdue debt data, the average ticket broken down by customer segment, or the number of loans issued based on collections data.
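As a hedged illustration of such a model, the sketch below fits a simple regression that projects profit from a few monthly drivers; all figures are invented and the feature set is deliberately minimal.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative monthly history: overdue debt share, average ticket, loans issued -> profit.
X = np.array([
    [0.05, 1_200, 900],
    [0.07, 1_150, 950],
    [0.04, 1_300, 870],
    [0.09, 1_100, 1_020],
    [0.06, 1_250, 940],
])
profit = np.array([310_000, 280_000, 335_000, 240_000, 300_000])

model = LinearRegression().fit(X, profit)

# Forecast next month's profit for a planned scenario (hypothetical inputs).
next_month = np.array([[0.055, 1_230, 960]])
print(f"Projected profit: {model.predict(next_month)[0]:,.0f}")
```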
Conclusion
The introduction of a data-driven approach in FinTech companies opens up wide opportunities for increasing efficiency and competitiveness. The ability to quickly analyze and respond to changes in data is becoming the most important factor for success in the modern financial market. By using data as a strategic resource, companies can not only improve their internal processes, but also create added value for customers, thereby strengthening their position in the industry.