Burnout Prediction Using Interpretable Machine Learning Method

Burnout occurs due to a discrepancy between a person's goals and reality, which leads to resource depletion and decreased productivity.

Given data on how employees' expectations deviate from reality, we pose the task of predicting whether an employee is burned out. We propose an interpretable machine learning method, similar to a two-layer perceptron, in which every weight has a clear meaning.


Initial data

The initial data for the analysis are the processed results of a survey of employees from different organizations. The sample contains 219 observations with 29 input features: employees' ratings of the importance of workplace events (numbers from 0 to 1), taken with a plus sign if the event occurs in the organization and with a minus sign if it does not. Thus, the higher a feature's value, the more that event is expected to reduce overall burnout.

It was observed that when the burnout indicator exceeds 75, burnout has little effect on labor productivity. We therefore binarize the output: an indicator above 75 means no burnout (y = 1), and an indicator below 75 means burnout (y = 0).
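The binarization step can be sketched as follows (the scores here are made-up illustrative values; only the 75 cutoff comes from the text):

```python
import numpy as np

# Hypothetical continuous burnout indicators for five employees.
scores = np.array([80, 60, 74, 90, 40])

# y = 1 means no burnout (indicator above 75), y = 0 means burnout.
y = (scores > 75).astype(int)
print(y.tolist())  # [1, 0, 0, 1, 0]
```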

Baseline method

Let's analyze the data with logistic regression. Cross-validation gives a mean ROC-AUC of 0.83; on the final test set the same metric was 0.69.
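A minimal sketch of this baseline with scikit-learn, using synthetic data of the same shape as the survey (219 observations, 29 features) since the real dataset is not published:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(219, 29))  # synthetic stand-in for the survey features
# Synthetic binary target correlated with the first feature, for illustration only.
y = (X[:, 0] + rng.normal(size=219) > 0).astype(int)

model = LogisticRegression(max_iter=1000)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(auc.mean())
```

On the real data this procedure is what yields the reported 0.83 cross-validated ROC-AUC.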

Proposed method

At the first stage, we compute the principal components of the source data. At the second stage, we dichotomize each principal component according to its influence on the target (if the component is above a threshold, the corresponding binary feature is set to 1, otherwise 0). At the third stage, the binarized data are fed into a logistic regression, which predicts the probability that a point belongs to class "1".

Principal components

Nine principal components were retained, with the following proportions of explained variance: 30%, 13%, 7%, 5%, 4%, 4%, 3%, 3%, 3%.
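The decomposition step can be sketched with scikit-learn's PCA (again on synthetic data of the same shape; the explained-variance percentages above come from the real survey):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(219, 29))  # synthetic stand-in for the survey features

# Keep the first nine components, as in the text.
pca = PCA(n_components=9)
Z = pca.fit_transform(X)

print(Z.shape)                        # (219, 9)
print(pca.explained_variance_ratio_)  # sorted in decreasing order
```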

Dichotomization

For each principal component we find two thresholds: one sets the binary feature to 1 when the component's value is above the threshold; the other sets it to 1 when the value is below the threshold.

All thresholds are found from the minimum-entropy condition: we scan the possible threshold values and, for each candidate, compute the confusion matrix and the Kullback-Leibler divergence.
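One common way to realize a minimum-entropy threshold search is to scan candidate splits and pick the one minimizing the weighted entropy of the labels on the two sides; the sketch below follows that idea (the function names and the synthetic component are illustrative, not the authors' code):

```python
import numpy as np

def entropy(y):
    # Shannon entropy of a binary label vector.
    p = y.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def best_threshold(z, y):
    # Scan candidate thresholds for one principal component z and
    # keep the split with minimum weighted label entropy.
    best_t, best_h = None, np.inf
    for t in np.unique(z):
        left, right = y[z <= t], y[z > t]
        if len(left) == 0 or len(right) == 0:
            continue
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if h < best_h:
            best_t, best_h = t, h
    return best_t

rng = np.random.default_rng(2)
z = rng.normal(size=219)                               # synthetic component scores
y = (z + 0.3 * rng.normal(size=219) > 0).astype(int)   # labels driven by z
print(best_threshold(z, y))  # estimated split point for this component
```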

Final prediction

We train a logistic regression model on the binary inputs. Cross-validation gives a mean ROC-AUC of 0.81; on the final test the same metric was 0.70.
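The full pipeline can be sketched end to end. For illustration the two thresholds per component are taken at fixed quantiles rather than the entropy-optimal values, and the data are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(219, 29))
y = (X[:, 0] + rng.normal(size=219) > 0).astype(int)  # synthetic target

# Stage 1: project onto nine principal components.
Z = PCA(n_components=9).fit_transform(X)

# Stage 2: two binary features per component (quantile thresholds here
# stand in for the entropy-optimal ones), giving 9 * 2 = 18 features.
hi = np.quantile(Z, 0.6, axis=0)
lo = np.quantile(Z, 0.4, axis=0)
B = np.hstack([(Z > hi).astype(int), (Z < lo).astype(int)])

# Stage 3: logistic regression on the binarized features.
model = LogisticRegression(max_iter=1000)
auc = cross_val_score(model, B, y, cv=5, scoring="roc_auc").mean()
print(B.shape, auc)
```

Since every input is 0 or 1, each logistic-regression weight directly states how much the corresponding dichotomized component shifts the log-odds of "no burnout", which is the interpretability the method is after.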

Conclusions

Despite the reduction in the number of features (18 instead of 29) and their dichotomization, model quality did not noticeably deteriorate compared to the baseline.

The figure shows the values of the continuous burnout indicator as a function of the integral indicator (the weighting coefficients were found with a multiple linear regression model).
