Applying CRISP-DM Methods to Big Sales Data Analysis

Fig.1 CRISP-DM methodology


A method for processing the data collected during sales interactions (calls and meetings) in order to optimize the funnel, shorten the deal cycle, and increase conversion.

Main process

Calls and meetings are conducted with SalesAI connected. Integration with a CRM system makes it possible to collect clean, accurate data that can be analyzed with Big Data tools: Process Mining, Spaghetti Diagram, Reverse Engineering, etc.

If we have a large, high-quality array of clean data, we can perform Reverse Engineering to optimize the sales process: for example, rebuild the sales funnel, determine the shortest process, or collect the most effective tactics and phrases that lead to a conversion with high probability. The inverse problem can also be solved: identifying the critical failures in the process.

Below we describe in detail how this looks step by step and what results it can produce.

Let’s break the process into stages in accordance with the CRISP-DM methodology.


Stages and goals



Business Understanding

The goal of the project is to identify the growth points of sales conversion, broken down by product, region, and other variables, through in-depth analysis of the available internal data. The secondary goal is to determine which external data must be taken into account for a more accurate result.


• Define growth areas

• Define loss zones

• Describe a new effective process

• Define process performance metrics

Risks: the sample size may be too small for unambiguous conclusions; the way the data is stored may make it difficult to obtain; the analysis may yield no interesting insights for the customer.

Data Understanding

Ideally, since all dialogs pass through SalesAI and there is integration with the CRM, all the necessary data is stored in the CRM: a list of events for each transaction, recordings of some conversations and meetings, records of the status of the transaction and related objects, possibly meeting minutes, e-mails, contracts and NDAs, technical specifications (TK) and commercial proposals (KP) in all versions, presale documentation, etc. At this stage we need to understand the strengths and weaknesses of the existing set of documents.

• Understand the total volume and categories of data available for analysis.

• Enrich the data with third-party sources.

• Determine the initial hypotheses for analysis.

• Calculate descriptive statistics for the data (target metrics) and plot them.

• Adjust the business goals based on the state of the data.

Risks: the main data consistency problem is that there may be very few complete data sets united by a single transaction ID. Typical causes: no historical data; CRM data with errors (90%), incomplete or updated late; missing pieces; documents not saved, stored in different places, or unreadable; communication conducted in unauthorized channels that are not tracked; and so on. The main problem after the fact is collecting the full chain of events for each transaction (Data Evidence); without it, the accuracy of the analysis will be too low. This is why retrospective analysis is difficult: it takes a long time to dig through the archives and find everything. The most efficient path is to implement the Data Management Policy and SalesAI first, then collect the data, and then analyze the collected data.

Data Preparation

Suppose the transactions are not complex: they are carried out by phone, with no long correspondence or exchange of documents, and the commercial proposal (CP) is revised at most once. Suppose we need to figure out what the manager should say on the phone, and how, so that conversion grows. Then we need to take the entire volume of available call recordings for the last 3-24 months and do the preparatory work.

Preparing datasets:

  1. Arrange the calls (events) into chronological chains for each transaction ID, because a single transaction can include several stages of negotiations.

  2. Transcribe the recordings (voice-to-text recognition).

  3. Label the data, stage 1: determine the purpose of each call, assign a label to each conversation, and note whether the call was useful. For example, some calls qualify a lead, others close a deal, and so on. The initiator of the call can be either the client or the manager, and this also needs to be reflected in the properties.

  4. Label the data, stage 2 (a deeper level): decompose each call into elements describing what happened inside it. Here we may have to use a hypothesis that defines a set of entities and decompose everything within that set, or a set of hypotheses, each with its own set of entities. Later it will be possible to determine which hypothesis turned out to be the most successful.

  5. Cluster the data with a model and compare the results with the hypotheses used.

Risks:

  • Recognition quality

  • The set of hypotheses can be false and give false positives

  • Selection of the wrong entities, etc.

  • Few unbroken chains

Modeling

Data visualization and creation of a scoring model.

  1. Determine the representativeness of the sample with a Spaghetti diagram, which shows the number of complete chains, the number of successful chains, and the number of broken event chains.

  2. Detail the successful chains: the chains of events on the timeline that increased the lead’s score (with the score indicated), the key success factors (KFU) at each milestone, a breakdown down to key phrases, and the successful negotiation frameworks.

  3. Do the same for the unsuccessful chains.

  4. Describe the general characteristics of the clients with whom deals close successfully: the qualification criteria.

  5. Estimate how many deals could have been closed on the same data array using the new process.

  6. Show what the new process looks like on a time/speed graph.

Risks: difficulty in making the zoom-in/zoom-out view of the process look good.

Evaluation

Assessing the quality of the model: conducting a test simulation (retrospective analysis) or a pilot in the form of A/B testing, for example to estimate how much we could have sold in the past (on the same dataset) had we applied the new process.

Evaluate the performance of the new process with predefined metrics.
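The retrospective simulation mentioned above can be approximated very roughly as follows. This is a minimal sketch, not the method used in the project: the `Deal` record, the `followed_new_process` flag (assumed to come out of the labeling stages), and the projection rule are all hypothetical. The estimate simply applies the conversion rate of deals that already resembled the new process to the whole dataset, so it ignores selection bias.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Deal:
    """One historical deal. All fields are illustrative assumptions."""
    deal_id: str
    followed_new_process: bool  # hypothetical label produced during data markup
    won: bool


def conversion_rate(deals: Iterable[Deal]) -> float:
    """Share of won deals; 0.0 for an empty collection."""
    deals = list(deals)
    if not deals:
        return 0.0
    return sum(d.won for d in deals) / len(deals)


def retrospective_estimate(deals: List[Deal]) -> Dict[str, float]:
    """Naive 'what if every deal had followed the new process' projection."""
    conforming = [d for d in deals if d.followed_new_process]
    others = [d for d in deals if not d.followed_new_process]
    actual_won = sum(d.won for d in deals)
    rate_new = conversion_rate(conforming)
    # Apply the conforming deals' conversion rate to the whole dataset.
    projected_won = rate_new * len(deals)
    return {
        "actual_won": actual_won,
        "conversion_new": rate_new,
        "conversion_other": conversion_rate(others),
        "projected_won": projected_won,
        "uplift": projected_won - actual_won,
    }


if __name__ == "__main__":
    history = (
        [Deal(f"n{i}", True, i < 3) for i in range(4)]     # 4 conforming, 3 won
        + [Deal(f"o{i}", False, i < 1) for i in range(6)]  # 6 others, 1 won
    )
    print(retrospective_estimate(history))
```

Because deals that happened to follow the new process may differ from the rest in other ways, a figure like this is only a hypothesis; the A/B pilot is what actually tests it.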

An experiment in the form of A/B testing is designed:

  • the sizes of the control and target samples are determined

  • the duration of the experiment is set

  • the stop criteria are defined

  • the resources and technologies necessary for the experiment are determined.
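As an illustration of how the sample sizes and the duration might be determined, here is a minimal sketch based on the standard normal-approximation formula for comparing two proportions. The baseline and target conversion rates, the significance level, the power, and the daily deal flow are assumptions chosen for the example, not figures from the project.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_target: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Deals needed in EACH group to detect p_control vs p_target.

    Two-sided test, normal approximation for two proportions.
    """
    if not (0 < p_control < 1 and 0 < p_target < 1):
        raise ValueError("conversion rates must be strictly between 0 and 1")
    if p_control == p_target:
        raise ValueError("the rates must differ to size the experiment")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_control * (1 - p_control) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_target) ** 2
    return math.ceil(n)


def experiment_days(n_per_group: int, deals_per_day_per_group: int) -> int:
    """Rough duration, assuming a steady (hypothetical) daily deal flow."""
    return math.ceil(n_per_group / deals_per_day_per_group)


if __name__ == "__main__":
    n = sample_size_per_group(p_control=0.10, p_target=0.12)
    print("deals per group:", n)
    print("days at 50 deals/day per group:", experiment_days(n, 50))
```

The smaller the expected lift in conversion, the larger the required samples, which is one reason the sample-size risk noted at the Business Understanding stage matters for the pilot as well.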

The pilot is launched.

We make another iteration of changes and repeat the experiment if necessary. Once the result is achieved (the metrics have changed), we proceed to implementation.

Risks: environmental impact, human factor, seasonality.

Deployment

Implementation plan:

  • Implement data collection automation with SalesAI.

  • Present the analysis results to the team.

  • Emphasize the difference: how much was spent and how much could have been earned by working differently.

  • Explain what the new process will look like.

  • Explain which routine tasks the team will no longer have.

  • Provide real-time sales tips.

This yields a clean data collection pipeline and a feedback loop that lets you change elements of the process and tailor the process to each customer segment or to each target customer (ABM in action).

Risks: human factor and self-preservation habits, UI/UX habits, unwillingness to change the environment, environmental changes, seasonality, learning curve.
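Two of the steps described in the stages above, arranging events into chronological chains per transaction ID and counting complete, successful, and broken chains, can be sketched in Python as follows. The `Event` record, the event labels, and the definitions used here are assumptions for illustration: a chain counts as complete if it ends in a terminal event ("won" or "lost"), successful if that event is "won", and broken if it has no terminal event.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One call or meeting tied to a transaction. Fields are illustrative."""
    transaction_id: str
    timestamp: datetime
    kind: str  # hypothetical labels: "qualification", "proposal", "won", "lost"


WON, LOST = "won", "lost"  # assumed terminal labels


def build_chains(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Group events by transaction ID and sort each group chronologically."""
    chains: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        chains[event.transaction_id].append(event)
    for chain in chains.values():
        chain.sort(key=lambda e: e.timestamp)
    return dict(chains)


def summarize(chains: Dict[str, List[Event]]) -> Dict[str, int]:
    """Count complete, successful, and broken chains (assumed definitions)."""
    summary = {"complete": 0, "successful": 0, "broken": 0}
    for chain in chains.values():
        last_kind = chain[-1].kind
        if last_kind == WON:
            summary["complete"] += 1
            summary["successful"] += 1
        elif last_kind == LOST:
            summary["complete"] += 1
        else:
            summary["broken"] += 1  # no terminal event was recorded
    return summary


if __name__ == "__main__":
    events = [
        Event("T1", datetime(2023, 1, 5), "proposal"),
        Event("T1", datetime(2023, 1, 2), "qualification"),
        Event("T1", datetime(2023, 1, 9), WON),
        Event("T2", datetime(2023, 1, 3), "qualification"),
        Event("T2", datetime(2023, 1, 8), LOST),
        Event("T3", datetime(2023, 1, 4), "qualification"),
    ]
    print(summarize(build_chains(events)))
```

A high share of broken chains in such a summary is an early signal that the sample may not be representative enough for the later modeling steps.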

How to become a data-driven sales director: our Telegram channel, VP of sales.


  1. One-time process improvements, as deep as your clustering allows. This can be done manually if clean data is available.

  2. Continuous process improvement, coupled with a constant inflow of clean data, when SalesAI is used.
