Geospatial Image Segmentation Using Topographic Data and Transfer Learning
Data Science is not just about customer data. To the start of our flagship course Let’s consider an example of geospatial semantic segmentation, where using data from a digital elevation model, we will display cinder cones in Hawaii.
There are a variety of examples of semantic segmentation of images with pixel-by-pixel identification of objects – from unmanned vehicles before detection ships on satellite images. But what about use cases that don’t use these regular image datasets?
Let’s first understand what “slag cones” are, and how to use QGIS quickly create a dataset with labeled data. Then we will process the data and select / train the model, and then test the trained model on new data.
What are “slag cones”?
“Cinder cones are the simplest type of volcano. They are formed by particles and clots of solidified lava ejected from a single vent. <...> Most cinder cones are crowned with a bowl-shaped crater“. USGS.
Instead of a thousand words:
So, we need cone-shaped objects with steep slopes and a round crater in the middle. Let’s see how they look in the dataset.
Loading and labeling data
For the data set, we use DEM data provided by USGS¹ — topographic data from the largest island of Hawaii, where the morphology of cinder cones is visible. Let’s map the cinder cones of a dormant volcano mauna keato build a display model of the neighboring volcano’s cones Hualalai.
For marking cinder cones, we use QGIS – an excellent application for working with geospatial data. Rendering the cinder cones by loading the .geotiff data and outputting contour lines at 10m intervals (hint: look for small, densely spaced concentric circles):
These are cinder cones based on already defined spatial criteria. Let’s plot them on the map to create labeled masks. For simplicity, we will assume that the slag cones are round. This means that the labels will be circular polygons. Let’s create a shapefile of these shapes and start loading its labels and geotiff DEM data:
#Loading a geotiff shapefile def readimage(): print("reading image…") with rasterio.open(path_image, "r") as ds: arr = ds.read() arr = np.transpose(arr, (1, 2, 0)) #Clip negative values to 0 arr = arr.clip(0) print(arr.shape) return arr #Loading the shapefile mask def readmask(link): print("reading mask…") geo = gpd.read_file(link) with rasterio.open(path_image) as src: raster = src.read() geo = geo.to_crs(src.crs) out_image, out_transform = rasterio.mask.mask(src, geo.geometry, filled=True) masks = out_image[0,:,:] #Set the mask labels to 1 and the rest to 0 masks[np.where(masks<=0)] = 0 masks[np.where(masks>0)] = 1 masks = masks.astype(np.int8) masks = np.expand_dims(masks, axis=2) return masks
In the DEM array, we have the entire island. Let’s remove everything around the marked Mauna Kea mask, and remake this array from 2D into 3D with three channels. What for? For compatibility with a segmentation model that uses a pre-trained encoder. This encoder usually requires a three-channel input image, such as color:
#Crop both the image and masks to the same Mauna Kea area image = image[ymin:ymax, xmin:xmax,:] masks = masks[ymin:ymax, xmin:xmax,:] #Stack the image array and normalise image = np.dstack((image, image, image)) image = (image — image.min())/(image.max() — image.min()) original_size = image.shape
The result is the shape (4000, 6500, 3). Rendering the image and mask:
After duplicating the same image three times, apply filters. To make the cinder cones stand out against the background of the environment, we will increase the contrast. And add a Sobel filter to make their circular shapes clearer:
#Contrast enhancing image_eq = exposure.equalize_adapthist(image, clip_limit=0.05) #Sobel filter and normalising image_sobel = filters.sobel(np.squeeze(image)) image_sobel = (image_sobel — image_sobel.min())/(image_sobel.max() — image_sobel.min()) #concatenate standard image, equalised and sobel together images = np.dstack((image[:,:,0], image_sobel[:,:,0], image_sobel[:,:,0]))
From this large array, we used three channels. We will divide the array into small arrays, which we will send to the model. Here is the split result:
Number of samples, height, width, number of channels:
#Making image tiles size = 224 step = int(size/2) patch_arr = skimage.util.view_as_windows(image, (size, size, layer.shape), step = step) output = patch_arr.reshape((-1,) + (size, size, layer.shape))
The process is repeated for the input image and masks so that they complement each other. The result is the following input data:
What interesting images! The resulting array will look like this: (1938, 224, 224, 3). There are not so many examples for creating a model, and therefore a pre-trained model is needed. The last stage is the division of data into training and control samples:
x_train, x_val, y_train, y_val = train_test_split(images, masks, test_size=0.2, shuffle=True, random_state=123) print(x_train.shape) print(x_val.shape) print(y_train.shape) print(y_val.shape) y_train = tf.cast(y_train, tf.float32) y_val = tf.cast(y_val, tf.float32)
Building and training the model
Let us use, apparently, the best version of the segmentation architecture – the UNET model – with a pretrained encoder from Keras. Our encoder is InceptionResNetV2, it has good performance on the imagenet dataset, although any pre-trained model will do:
input_shape = (size, size, 3) inception = InceptionResNetV2(include_top = False, weights = "imagenet", input_tensor = input_shape) inception.summary() layer_names = [layer.name for layer in model.layers]
You need to add fast access connections between encoder and decoder. But first, let’s create a UNET decoder by loading a pre-trained encoder with input dimensions and getting a list of layer names for it.
A suitable decoder will be created after setting include_top to False. With the help of the decoder, the output of the encoder is returned to the original dimensions of the input image, and we get the corresponding segmentation mask:
x = Conv2DTranspose(num_filters, (2,2), strides=2, padding="same")(inputs) x = Conv2D(size, 2, padding="same", dilation_rate = 1, kernel_initializer = "he_normal")(x) x = BatchNormalization()(x) x = Dropout(0.2)(x) x = Activation("LeakyReLU")(x)
Adding these blocks one on top of the other from the largest to the smallest filter, we get a decoder. Shortcut connections are added to activation layers where the input layer size is 224, 112, 56, and 28. Depending on the model used, the layer size may vary. Then you need to fill in:
skip_connection_2 = inception.get_layer(index = 3).output skip_connection_2 = ZeroPadding2D(( (1, 0), (1, 0)) (skip_connection_2)
Look: the fourth layer (index 3) must be filled to the desired shape. Thanks to fast access connections, the encoder and decoder are connected by concatenating the output of the specified activation levels and the transposed decoder blocks:
x = Concatenate()([x, skip_connection_2])
Having created the decoder, we complete the model with a final output layer using the sigmoid activation function, because our task is binary segmentation, that is, determining whether we are dealing with a cinder cone or not:
outputs = Conv2D(1, (1,1), padding="same", activation="sigmoid")(last_layer)
When compiling the model in the loss function, the imbalance of the classes must be taken into account. The number of cinder cones relative to the background area is small, so we will use Tversky Loss.
There are many different loss functions. To measure the percentage of overlapping pixels between the input mask and the segmentation mask, take the Jaccard similarity coefficient:
model.compile(tf.keras.optimizers.Adam(5e-5), loss= tversky_loss, metrics=[jaccard_coefficient],) reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=15, verbose=1, mode="max", min_lr=1e-6) early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10) history = model.fit(x_train, y_train, epochs=100, batch_size = 32, validation_data = (x_val,y_val), callbacks = [reduce_lr, early])
With a low learning rate of 1e-5 and ReduceLROnPlateau over 100 epochs, the model achieved just over 80% validation IoU (join over intersection). Not so bad.
A more advanced labeling strategy can improve the indicator, without simplifications in the form of circular figures as labels, or another encoder model. You can experiment with freezing different layers of the encoder.
Testing the Model on Another Volcano
Let’s see how well the segmentation model recognizes the cinder cones of the Hualalai volcano:
We go through the same steps of preprocessing the input data to make it compatible with the model. Again we remove all unnecessary, apply filters, divide the array into tiles and get the shape (420, 224, 224, 3).
There is no need to increase the size of the samples here, so we will do without obtaining image tiles. Here are the input tiles:
After using the model to generate forecasts:
y_pred = model.predict(tiled_Hualalai_image)
Look at the segmented tiles before merging the image tiles again to make a segmentation mask with the dimensions of the original image:
Predictions are OK, but the model has problems segmenting small cinder cones. They may be related to the size of the cinder cones in the Mauna Kea training sample, which are usually larger.
Let’s look at the segmentation image superimposed on the original input image. To make it easier to detect cinder cones, let’s overlay the predicted segmentation masks on the image with a Sobel filter.
Cinder cones are depicted as small dots or rings:
The image looks much better:
The location of masks and cinder cones along the volcano generally coincides (northwest and southeast).
Cinder cones of various sizes are segmented, not just large ones.
All cinder cones generally correspond in shape to segmentation masks.
We reviewed an example of using image segmentation with geospatial datasets to identify cinder cones in Hawaii. By converting the segmentation masks from the input DEM data back to the original coordinates, it is possible to make data sets from them for further analysis (for example, analysis of the size distribution of cinder cones or changes in their density). Thank you for your attention!
 USGS, 2013, USGS 13 arcsecond s203156 1 x 1 degree: USGS. Data obtained from the US Geological Survey and the National Geospatial Program. The National Map data is free and taken from open sources.
You can continue learning Data Science or Python in our courses:
Learn more here.
Other professions and courses
Data Science and Machine Learning
Python, web development
Java and C#
From basics to depth
As well as