Detection of abnormal heart sounds from heartbeat audio recordings

Author of the article: Victoria Lyalikova

Hi all! Today we'll look at the problem of detecting heart sound anomalies using audio recordings of heartbeat sounds. To do this, we will use the librosa library for working with audio files, as well as classical machine learning algorithms and deep learning methods.

Let’s take the “Heartbeat Sound” dataset, which contains audio fragments of heart rhythms lasting from 1 to 30 seconds, recorded both from healthy patients and from patients with abnormal heart sounds. The set contains 813 audio files divided into the categories artifact, extrastole, murmur, normal and unlabel. Let's figure out what these categories mean.

Normal – as the name suggests, a normal strong rhythmic heartbeat.

Murmur – recordings of the heart sound that contain some kind of noise, for example whistling, roaring or rumbling. Such a noise can be a symptom of many heart diseases.

Extrastole – recordings of extrasystolic (extra) sounds that may appear from time to time and can be identified by an extra or missed heartbeat. An extrasystole may not be a sign of disease, but in some situations it can be caused by heart disease.

Artifact – essentially not a heartbeat at all. This category contains a wide range of different sounds, including squeals, echoes, speech and music; usually no heart sounds are audible. It is important to identify this category of recordings so that the examination can be repeated.

Now that we are briefly familiar with the categories in our dataset, we can move on to the analysis to better understand the data we will be working with.

Data analysis

Let's load the folder with our data and look at the waveforms of an audio file without anomalies and one with anomalies. Example code for plotting the waveform of a normal recording is shown below.

import os
import librosa
import librosa.display
import matplotlib.pyplot as plt

data_path = "D:/Heartbeat_sound"
os.listdir(data_path)
# ['artifact', 'extrastole', 'murmur', 'normal', 'unlabel']

sr = 22050
signal = os.path.join(data_path, "normal", "normal__103_1305031931979_B.wav")
normal, sr = librosa.load(signal, sr=sr)
librosa.display.waveshow(normal, sr=sr, label="Normal")
plt.legend()
plt.show()

What can we see here? The first graph, the normal heartbeat, shows an even distribution of amplitudes and consistent spacing between the beats of the sound wave. If you compare a murmur graph to the normal one, you will notice that it looks less regular and contains many extra sound waves between heartbeats. Extrasystolic sounds have a higher amplitude than a normal heartbeat, and the spacing between the individual sound waves is more uneven, suggesting that a heartbeat may have been skipped. The artefact plot simply shows the kind of noisy data you can get when a recording is made incorrectly.

Now let's try to combine the waveforms of all the categories into one picture.
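Below is a minimal sketch of how such a combined figure could be produced; it assumes one example file per category, and the murmur, extrastole and artifact file names here are only placeholders, not real file names from the dataset.

import os
import librosa
import librosa.display
import matplotlib.pyplot as plt

# one example file per category; all names except the normal one are placeholders
examples = {
    "normal": "normal/normal__103_1305031931979_B.wav",
    "murmur": "murmur/murmur__example.wav",
    "extrastole": "extrastole/extrastole__example.wav",
    "artifact": "artifact/artifact__example.wav",
}

fig, axes = plt.subplots(len(examples), 1, figsize=(10, 10), sharex=True)
for ax, (label, rel_path) in zip(axes, examples.items()):
    y, sr = librosa.load(os.path.join(data_path, rel_path), sr=22050)
    librosa.display.waveshow(y, sr=sr, ax=ax)
    ax.set_title(label)
plt.tight_layout()
plt.show()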

We can see that extrasystolic heart sounds have a larger amplitude than normal heart sounds, and that all of the abnormal categories show an irregular rhythm.

Next, let's look at the class distribution by counting the number of recordings in each folder and plotting the result as a bar chart.
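A minimal sketch of how this count could be done, assuming the folder layout shown earlier (every file in a folder is counted as one recording):

import os
import matplotlib.pyplot as plt

categories = ['artifact', 'extrastole', 'murmur', 'normal', 'unlabel']
counts = {c: len(os.listdir(os.path.join(data_path, c))) for c in categories}

plt.bar(list(counts.keys()), list(counts.values()))
plt.ylabel("number of recordings")
plt.show()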

We see that the data is very unbalanced; we will try to take this into account later when building the models. Now let's look at the duration of each recording and how the durations are distributed across the dataset.
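One possible sketch for computing the durations: load each file, ask librosa for its length in seconds, then plot a histogram.

import os
import librosa
import matplotlib.pyplot as plt

durations = []
for category in ['artifact', 'extrastole', 'murmur', 'normal', 'unlabel']:
    folder = os.path.join(data_path, category)
    for file_name in os.listdir(folder):
        y, sr = librosa.load(os.path.join(folder, file_name), sr=None)
        durations.append(librosa.get_duration(y=y, sr=sr))

plt.hist(durations, bins=30)
plt.xlabel("duration, s")
plt.ylabel("number of recordings")
plt.show()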

The vast majority of recordings are about 9 seconds long, while the rest range from 0 to 27 seconds. There are also recordings with zero duration; we will remove them from the dataset.
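A simple way to do that is to skip such files when building the file lists; a sketch (the helper function below is hypothetical and not part of the original code):

def non_empty_files(folder):
    # keep only the files whose audio content has a non-zero duration
    keep = []
    for file_name in os.listdir(folder):
        y, sr = librosa.load(os.path.join(folder, file_name), sr=None)
        if librosa.get_duration(y=y, sr=sr) > 0:
            keep.append(file_name)
    return keep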

Data preparation

We cannot use raw audio as input for our future models. To work with an audio signal, it must be digitized, that is, the sound wave must be converted into a series of numbers. Thanks to this sampling of the sound, a fairly large number of different characteristics can be extracted from the audio, for example mel-frequency cepstral coefficients (MFCCs), the spectrum, the spectral centroid, the spectral rolloff, the zero-crossing rate and so on.

MFCCs, or mel-frequency cepstral coefficients, are a widely used way of extracting features from an audio signal. The librosa library, which we will use to process our audio signals, will help us with this.
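As a quick illustration (a sketch that reuses the normal recording loaded earlier), librosa.feature.mfcc returns a matrix with one row per coefficient and one column per time frame; averaging over time gives a single value per coefficient, which is exactly what we will feed to the models.

import numpy as np
import librosa

mfccs = librosa.feature.mfcc(y=normal, sr=sr, n_mfcc=52)
print(mfccs.shape)       # (52, number_of_frames)
mfccs_mean = np.mean(mfccs.T, axis=0)
print(mfccs_mean.shape)  # (52,)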

We have 563 labeled audio files to train our models. Let's expand this dataset by generating synthetic data: we will apply time stretching and compression to the sound signal. As a result, each audio signal will exist in 3 variants, which will make our classifier more robust and help it classify heart rhythms correctly. As the main characteristics of the audio signal we will compute the mel-frequency cepstral coefficients. Let's take the following steps to process our heart sound recordings.

  1. Convert the audio files to the same duration. For this we will use the librosa.util.fix_length() method.

  2. Augment the dataset by time-stretching the audio signal with rates of 1.2 and 0.8 using the librosa.effects.time_stretch() method.

  3. Extract the mel-frequency cepstral coefficients for each audio file using the librosa.feature.mfcc() method and compute the average of each coefficient over time.

Let's write a function that will process audio files.

import numpy as np
import librosa

def load_file_data(folder, file_names, duration=9, sr=22050):
    input_length = sr * duration
    features = 52
    data = []
    for file_name in file_names:
        try:
            sound_file = os.path.join(folder, file_name)
            X, sr = librosa.load(sound_file, sr=sr, duration=duration)
            dur = librosa.get_duration(y=X, sr=sr)
            # pad/trim the signal to the fixed duration
            if round(dur) < duration:
                print("fixing audio length:", file_name)
                X = librosa.util.fix_length(data=X, size=input_length)

            # extract 52 MFCC features and average them over time
            mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sr, n_mfcc=features).T, axis=0)
            feature_0 = np.array(mfccs).reshape([-1, 1])
            data.append(feature_0)

            # time-stretch the signal with rate 0.8
            squeeze_data = librosa.effects.time_stretch(y=X, rate=0.8)
            mfccs_squeeze = np.mean(librosa.feature.mfcc(y=squeeze_data, sr=sr, n_mfcc=features).T, axis=0)
            feature_1 = np.array(mfccs_squeeze).reshape([-1, 1])
            data.append(feature_1)

            # time-stretch the signal with rate 1.2
            stretch_data = librosa.effects.time_stretch(y=X, rate=1.2)
            mfccs_stretch = np.mean(librosa.feature.mfcc(y=stretch_data, sr=sr, n_mfcc=features).T, axis=0)
            feature_2 = np.array(mfccs_stretch).reshape([-1, 1])
            data.append(feature_2)
        except Exception as e:
            print("Error encountered while parsing file:", file_name)

    return data

Now let's process all the files contained in the normal, murmur, artifact and extrastole folders, creating a label array for each category.

import fnmatch

max_duration = 9

# paths to the category folders inside data_path
artifact_data = os.path.join(data_path, 'artifact')
murmur_data = os.path.join(data_path, 'murmur')
extrastole_data = os.path.join(data_path, 'extrastole')
normal_data = os.path.join(data_path, 'normal')

artifact_files = fnmatch.filter(os.listdir(artifact_data), 'artifact*.wav')
artifact_sounds = load_file_data(folder=artifact_data, file_names=artifact_files, duration=max_duration)
artifact_labels = [0 for items in artifact_sounds]
murmur_files = fnmatch.filter(os.listdir(murmur_data), 'murmur*.wav')
murmur_sounds = load_file_data(folder=murmur_data, file_names=murmur_files, duration=max_duration)
murmur_labels = [1 for items in murmur_sounds]
extrastole_files = fnmatch.filter(os.listdir(extrastole_data), 'extrastole*.wav')
extrastole_sounds = load_file_data(folder=extrastole_data, file_names=extrastole_files, duration=max_duration)
extrastole_labels = [2 for items in extrastole_sounds]
normal_files = fnmatch.filter(os.listdir(normal_data), 'normal*.wav')
normal_sounds = load_file_data(folder=normal_data, file_names=normal_files, duration=max_duration)
normal_labels = [3 for items in normal_sounds]

Next, we define the class labels.

classes = ['artifact', 'murmur', 'extrastole', 'normal']
labels = {k: v for v, k in enumerate(classes)}
# {'artifact': 0, 'murmur': 1, 'extrastole': 2, 'normal': 3}

Now we can collect all the data and split it into training, validation and test sets.

x_data = np.concatenate((artifact_sounds, normal_sounds,murmur_sounds,extrastole_sounds))

y_data = np.concatenate((artifact_labels, normal_labels,murmur_labels,extrastole_labels))

from sklearn.model_selection import train_test_split
# shuffle - whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.

# split data into Train, Validation and Test
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, train_size=0.8, random_state=42, shuffle=True)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, train_size=0.9, random_state=42, shuffle=True)

Now that the data has been processed and split into training, validation and test sets, we can start building models.
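One detail worth noting: after the processing above, each sample has shape (52, 1), which suits the convolutional network later on, while the classical models expect a flat 2-D feature matrix. A minimal sketch of that flattening step (an assumption on my part: the X_train/X_test arrays used in the grid-search code below are these flattened versions):

# flatten (n_samples, 52, 1) -> (n_samples, 52) for the sklearn/XGBoost models
X_train = x_train.reshape(len(x_train), -1)
X_test = x_test.reshape(len(x_test), -1)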

Let's start with classic machine learning models. The Random Forest (RF) algorithm is one of the most popular and reliable methods for classification problems among data scientists. The model has many parameters whose settings can significantly affect the final result, so we will try several parameter combinations and select the one with the best score.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

forest_grid_search = GridSearchCV(RandomForestClassifier(), {
    'n_estimators': [300, 400, 500, 700, 800, 1000],
    'max_features': ['sqrt', 'log2'],
    'max_depth': [3, 5, 7, 9, 11, 13, 15, 17, 19, 21],
    'bootstrap': [True, False]}, cv=3, verbose=1, n_jobs=-1)
forest_grid_search.fit(X_train, y_train)

Let's look at the parameters of the best model.

forest_grid_search.best_params_
{'bootstrap': False,
 'max_depth': 19,
 'max_features': 'sqrt',
 'n_estimators': 800}

Now let's evaluate the accuracy of our model and see how well it performs on the different classes by building a classification report.
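The original article does not show this step, so here is a sketch of how such a report could be produced with the best estimator found by the grid search:

from sklearn.metrics import accuracy_score, classification_report

rf_preds = forest_grid_search.best_estimator_.predict(X_test)
print("accuracy:", accuracy_score(y_test, rf_preds))
print(classification_report(y_test, rf_preds,
                            target_names=['artifact', 'murmur', 'extrastole', 'normal']))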

Having calculated the accuracy of the algorithm, we get 0.89645. The accuracy value turned out to be quite good, but we see that the algorithm has problems with the artifact and extrastole classes. Low recall for the extrastole class indicates that it is difficult for the random forest algorithm to detect this class at all. The algorithm can detect the artifact class, but does not distinguish it well from other classes.

Now let's look at the XGBoost gradient boosting algorithm and also try to configure its parameters.

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

xgb_grid_search = GridSearchCV(XGBClassifier(), {
    'n_estimators': [200, 300, 400, 500, 700, 800, 900, 1000],
    'max_depth': [3, 4, 5, 6, 7, 9, 11, 13, 15, 17, 19, 21]}, verbose=3, n_jobs=-1)
xgb_grid_search.fit(X_train, y_train)

Having calculated the accuracy, we get 0.85207, which is slightly lower than the accuracy of the random forest. This algorithm also struggles with the artifact and extrastole classes. It is clear that both algorithms confuse the abnormal classes with each other, but generally cope well with deciding whether an anomaly is present at all.
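One way to see exactly which classes get confused with each other is a confusion matrix; here is a sketch for the XGBoost model (this code is not part of the original article):

from sklearn.metrics import confusion_matrix

xgb_preds = xgb_grid_search.best_estimator_.predict(X_test)
print(confusion_matrix(y_test, xgb_preds))
# rows are true classes (artifact, murmur, extrastole, normal), columns are predicted classes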

As a neural network, we will take a combination of a convolutional network and an LSTM network. Neural networks are perhaps more interesting here, since we can feed them either features extracted from the audio signal or, for example, spectrograms generated for each signal, thereby turning the audio classification problem into an image classification problem. But since we are comparing the algorithms with each other, we will also feed the neural network the mel-frequency cepstral coefficients extracted from the audio signal.
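As an aside, a mel spectrogram for a recording could be computed like this (a sketch reusing the normal recording loaded earlier; we do not use spectrograms further in this article):

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

S = librosa.feature.melspectrogram(y=normal, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.show()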

Let's try to correct the imbalance in the data set using class weighting. To do this, we determine the weights for each class.

# class weights
train_count = 563
count_0 = 40   # artifact
count_1 = 127  # murmur
count_2 = 46   # extrastole
count_3 = 350  # normal
weight_for_0 = train_count / (4 * count_0)
weight_for_1 = train_count / (4 * count_1)
weight_for_2 = train_count / (4 * count_2)
weight_for_3 = train_count / (4 * count_3)
class_weight = {0: weight_for_0, 1: weight_for_1, 2: weight_for_2, 3: weight_for_3}

{0: 3.5375,
 1: 1.0968992248062015,
 2: 3.0760869565217392,
 3: 0.4031339031339031}

The neural network model has 3 blocks, each consisting of a convolutional layer (Conv1D), a max-pooling layer (MaxPooling1D) and a BatchNormalization layer that normalizes the data before it is passed to the next layer, followed by two LSTM layers, 2 fully connected layers (Dense) with Dropout layers to prevent overfitting, and an output layer with 4 outputs, one per heartbeat class. The output layer uses the softmax activation function, since it returns a probability distribution over the target classes in a multiclass classification problem. All convolutional layers use the relu activation function.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, BatchNormalization,
                                     LSTM, Dense, Dropout)

lstm_model = Sequential()

lstm_model.add(Conv1D(1024, kernel_size=5, strides=1, padding='same', activation='relu', input_shape=(52, 1)))
lstm_model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
lstm_model.add(BatchNormalization())

lstm_model.add(Conv1D(512, kernel_size=5, strides=1, padding='same', activation='relu'))
lstm_model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
lstm_model.add(BatchNormalization())

lstm_model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu'))
lstm_model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
lstm_model.add(BatchNormalization())

lstm_model.add(LSTM(128, return_sequences=True))
lstm_model.add(LSTM(128))

lstm_model.add(Dense(64, activation='relu'))
lstm_model.add(Dropout(0.3))

lstm_model.add(Dense(32, activation='relu'))
lstm_model.add(Dropout(0.3))

lstm_model.add(Dense(4, activation='softmax'))  # 4 classes to classify
lstm_model.summary()

Let's define the Adam optimizer, set the learning rate, choose the loss function and compile the model.

from tensorflow.keras.callbacks import EarlyStopping

optimiser = tf.keras.optimizers.Adam(learning_rate=0.0001)
lstm_model.compile(optimizer=optimiser,
                   loss="categorical_crossentropy",
                   metrics=['accuracy'])
cb = [EarlyStopping(patience=20, monitor="val_accuracy", mode="max", restore_best_weights=True)]
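Note that categorical_crossentropy expects one-hot encoded labels, and the report at the end of training takes np.argmax of y_test, so the integer labels need to be one-hot encoded before training the network. The original article does not show this step; a sketch (the classical models above were trained on the original integer labels, so this should be done only for the network):

from tensorflow.keras.utils import to_categorical

y_train = to_categorical(y_train, num_classes=4)
y_val = to_categorical(y_val, num_classes=4)
y_test = to_categorical(y_test, num_classes=4)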

We start training our network, not forgetting to specify the class weights and the early stopping callback.

history = lstm_model.fit(x_train, y_train,
                         validation_data=(x_val, y_val),
                         batch_size=8, epochs=250,
                         class_weight=class_weight, callbacks=cb)

At the end of the training we will create a report.

import numpy as np
from sklearn.metrics import classification_report

classes = ["artifact", "murmur", "extrastole", "normal"]

preds = lstm_model.predict(x_test)
classpreds = [np.argmax(p) for p in preds]      # predicted classes
y_testclass = [np.argmax(t) for t in y_test]    # true classes
print(classification_report(y_testclass, classpreds, target_names=classes))

Calculating the accuracy of our neural network model on the validation set, we get 0.9438. We see that our neural network coped with the task very well, and it does a much better job of distinguishing the artifact and extrastole classes.

Let's summarize. Most likely, if we work a little more with the sample, for example work with spectrograms, expand the feature set, enlarge the dataset or tune the neural network parameters, we will get an even better result. I believe it is worth continuing to work with the constructed neural network model and considering other methods for handling class imbalance to obtain more accurate results.

By the way, my colleagues from OTUS will soon host a free webinar about a unique experiment that began in 2020 in Moscow, where city clinics are connected to a single database and thousands of radiographs are analyzed daily by artificial intelligence. Register, it will be interesting!
