# Creating a 3D mesh from an image with Python

A few years ago, generating a 3D mesh from a single 2D image was a difficult task. Today, thanks to advances in deep learning, many monocular depth estimation models can produce an accurate depth map from an image. With this map, you can create a mesh by performing surface reconstruction.

## Introduction

*Point cloud. Original file from Open3D*

Starting from an unstructured point cloud, you can obtain a mesh: a 3D representation of an object made up of vertices and polygons. The most common type is the triangle mesh, which consists of 3D triangles connected by shared edges or vertices. The literature describes several methods for obtaining a triangle mesh from a point cloud; the most popular are alpha shapes¹, ball pivoting² and Poisson surface reconstruction³. These methods are called surface reconstruction algorithms.

*Triangle mesh. Original file from Open3D*

The procedure for creating a mesh from an image in this tutorial consists of three steps:

- Depth estimation: a monocular depth estimation model generates a depth map of the input image.
- Point cloud generation: the depth map is converted to a point cloud.
- Mesh generation: a surface reconstruction algorithm generates a mesh from the point cloud.

To complete this procedure, you will need an image. If you don't have one, you can use this one:

*Bedroom. Image from NYU Depth V2*

## 1. Depth estimation

The monocular depth estimation model chosen for this guide is GLPN⁴. It is available on the Hugging Face Model Hub and can be loaded with the Hugging Face Transformers library.

To do this, install the latest version of Transformers from PyPI:

`pip install transformers`

The code below estimates the depth of the input image:

```
import matplotlib
matplotlib.use('TkAgg')
from matplotlib import pyplot as plt
from PIL import Image
import torch
from transformers import GLPNFeatureExtractor, GLPNForDepthEstimation
feature_extractor = GLPNFeatureExtractor.from_pretrained("vinvino02/glpn-nyu")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-nyu")
# load and resize the input image
image = Image.open("image.jpg")
new_height = 480 if image.height > 480 else image.height
new_height -= (new_height % 32)
new_width = int(new_height * image.width / image.height)
diff = new_width % 32
new_width = new_width - diff if diff < 16 else new_width + 32 - diff
new_size = (new_width, new_height)
image = image.resize(new_size)
# prepare image for the model
inputs = feature_extractor(images=image, return_tensors="pt")
# get the prediction from the model
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth
# remove borders
pad = 16
output = predicted_depth.squeeze().cpu().numpy() * 1000.0
output = output[pad:-pad, pad:-pad]
image = image.crop((pad, pad, image.width - pad, image.height - pad))
# visualize the prediction
fig, ax = plt.subplots(1, 2)
ax[0].imshow(image)
ax[0].tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
ax[1].imshow(output, cmap='plasma')
ax[1].tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
plt.tight_layout()
plt.pause(5)
```

To work with GLPN, the Transformers library provides two classes: `GLPNFeatureExtractor`, which preprocesses the input data, and `GLPNForDepthEstimation`, the model class.

Due to the architecture, the output size of the model is:

*Output size. Image generated with codecogs*

For this reason, `image` is resized so that its height and width are multiples of 32; otherwise the model output would be smaller than the input. This matters because the point cloud is colored using the image pixels, which requires the input image and the output depth map to be the same size.
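The resizing arithmetic can be checked in isolation. The sketch below repeats the rounding logic from the listing above as a function and compares it with an assumed output size; the function names are hypothetical, and the assumption that the model rounds each side down to a multiple of 32 is inferred from the text, not taken from the GLPN code.

```python
def glpn_friendly_size(width, height, max_height=480):
    """Return (new_width, new_height), both multiples of 32."""
    # cap the height, then round it down to a multiple of 32
    new_height = min(height, max_height)
    new_height -= new_height % 32
    # scale the width to keep the aspect ratio
    new_width = int(new_height * width / height)
    # round the width to the nearest multiple of 32
    diff = new_width % 32
    new_width = new_width - diff if diff < 16 else new_width + 32 - diff
    return new_width, new_height


def assumed_output_size(width, height):
    # assumption: each side is rounded down to a multiple of 32
    return (width // 32) * 32, (height // 32) * 32


for size in [(640, 480), (500, 375), (1000, 750)]:
    resized = glpn_friendly_size(*size)
    print(size, "->", resized, "| assumed model output:", assumed_output_size(*resized))
```

After resizing, the assumed output size equals the input size in every case, which is the property the point cloud step relies on.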

Monocular depth estimation models struggle to produce high-quality predictions near the image borders, so `output` is center-cropped (the `output[pad:-pad, pad:-pad]` line under the `# remove borders` comment). To keep the dimensions equal, `image` is center-cropped as well (the `image.crop(...)` line).

Here are some predictions:

*Bedroom depth prediction. Input image from NYU Depth V2*

*Game room depth prediction. Input image from NYU Depth V2*

*Office depth prediction. Input image from NYU Depth V2*

## 2. Building a point cloud

The 3D processing part uses Open3D⁵, which is probably the best Python library for this kind of task.

Install the latest version of Open3D from PyPI:

`pip install open3d`

The code below converts the estimated depth map into an Open3D point cloud object:

```
import numpy as np
import open3d as o3d
width, height = image.size
depth_image = (output * 255 / np.max(output)).astype('uint8')
image = np.array(image)
# create rgbd image
depth_o3d = o3d.geometry.Image(depth_image)
image_o3d = o3d.geometry.Image(image)
rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(image_o3d, depth_o3d, convert_rgb_to_intensity=False)
# camera settings
camera_intrinsic = o3d.camera.PinholeCameraIntrinsic()
camera_intrinsic.set_intrinsics(width, height, 500, 500, width/2, height/2)
# create point cloud
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, camera_intrinsic)
```

An RGBD image is simply a combination of an RGB image and its corresponding depth image. The `PinholeCameraIntrinsic` class stores the so-called intrinsic matrix of the camera. With this matrix, Open3D can create a point cloud from an RGBD image with the correct spacing between points. Leave the intrinsic parameters as they are. For more information, see the additional resources at the end of the guide.
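To give an idea of what the intrinsic parameters do, here is a minimal sketch of the standard pinhole back-projection, which maps a pixel and its depth to a 3D point. The focal lengths and principal point mirror the values passed to `set_intrinsics` above; the function name, the 608x448 image size, and the depth value of 2.0 are hypothetical, and this is an illustration of the model, not Open3D's actual code.

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Map pixel (u, v) with depth z to a 3D point in camera coordinates."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


# hypothetical size of the cropped image (640x480 minus a 16-pixel border)
width, height = 608, 448
fx = fy = 500
cx, cy = width / 2, height / 2

# a pixel at the principal point lies on the optical axis
center = backproject(cx, cy, 2.0, fx, fy, cx, cy)
# the top-left pixel is pushed left and up, proportionally to its depth
corner = backproject(0, 0, 2.0, fx, fy, cx, cy)
print(center)
print(corner)
```

Larger focal lengths pull the points closer to the optical axis, which is why the intrinsic parameters control the spacing between points in the cloud.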

To visualize the point cloud, run this line:

`o3d.visualization.draw_geometries([pcd])`

## 3. Mesh generation

Among the various methods for this task that you will find in the literature, this guide uses the Poisson surface reconstruction algorithm³, because it usually gives better and smoother results than the others.

This code generates a mesh from the point cloud obtained in the previous step using the Poisson algorithm:

```
# outliers removal
cl, ind = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=20.0)
pcd = pcd.select_by_index(ind)
# estimate normals
pcd.estimate_normals()
pcd.orient_normals_to_align_with_direction()
# surface reconstruction
mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=10, n_threads=1)[0]
# rotate the mesh
rotation = mesh.get_rotation_matrix_from_xyz((np.pi, 0, 0))
mesh.rotate(rotation, center=(0, 0, 0))
# save the mesh
o3d.io.write_triangle_mesh('./mesh.obj', mesh)
```

First, the code removes outliers from the point cloud. A cloud can contain noise and artifacts for various reasons. In this scenario, the model could predict some depths that differ too much from neighboring depths.
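The idea behind statistical outlier removal can be sketched in plain Python: compute each point's mean distance to its nearest neighbors, then drop points whose mean distance is far above the global average. The function name is hypothetical, the parameters only mirror `nb_neighbors` and `std_ratio`, and this brute-force version is a simplified illustration rather than Open3D's implementation.

```python
import math
import statistics


def statistical_inlier_indices(points, nb_neighbors, std_ratio):
    """Return indices of points kept by a simple statistical filter."""
    mean_dists = []
    for i, p in enumerate(points):
        # distances from p to every other point, nearest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(statistics.fmean(dists[:nb_neighbors]))
    # threshold relative to the global distribution of mean distances
    threshold = statistics.fmean(mean_dists) + std_ratio * statistics.pstdev(mean_dists)
    return [i for i, d in enumerate(mean_dists) if d <= threshold]


# a flat 5x5 grid of points plus one far-away point
points = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
points.append((100.0, 100.0, 100.0))

kept = statistical_inlier_indices(points, nb_neighbors=4, std_ratio=2.0)
print(len(points), "->", len(kept))
```

A lower `std_ratio` makes the filter more aggressive; the value of 20.0 used in the listing above is very permissive and removes only extreme outliers.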

The next step is to estimate the normals. A normal is a vector (so it has magnitude and direction) perpendicular to a surface or object; the Poisson algorithm needs the normals as input, so they must be estimated first. For more information about these vectors, see the additional resources at the end of the guide.
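As a small illustration of what a normal is, the sketch below computes the unit normal of a plane through three points with a cross product and then flips it to agree with a reference direction. Open3D estimates normals from local point neighborhoods rather than from triangles, so this is only a simplified picture; the helper names and the reference direction (0, 0, 1) are chosen for illustration.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(p0, p1, p2):
    """Unit vector perpendicular to the plane through three points."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(dot(n, n))
    return tuple(c / length for c in n)


def orient_towards(normal, direction):
    """Flip the normal if it points away from the reference direction."""
    return normal if dot(normal, direction) >= 0 else tuple(-c for c in normal)


# three points on the plane x + y + z = 1
normal = plane_normal((1, 0, 0), (0, 0, 1), (0, 1, 0))
oriented = orient_towards(normal, (0, 0, 1))
print(normal)
print(oriented)
```

The cross product alone leaves the sign of the normal ambiguous, which is why an orientation step is needed before surface reconstruction.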

Finally, the algorithm is executed. The level of detail of the mesh is determined by the `depth` value. A higher `depth` improves mesh quality but also increases the size of the output.

To visualize the mesh, I recommend downloading MeshLab, because some 3D viewers cannot display colors.

Here is the final result:

*Generated Mesh*

*Mesh from a different angle*

Since the final result varies with the `depth` value, here is a comparison of several values:

*Comparison of different depth values*

The algorithm with `depth=5` produced a 375 KB mesh, `depth=6` a 1.2 MB mesh, `depth=7` a 5 MB mesh, `depth=8` a 19 MB mesh, `depth=9` a 70 MB mesh, and `depth=10` an 86 MB mesh.
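In Open3D, `depth` is the maximum depth of the octree used for reconstruction, so the underlying grid has at most 2^depth cells per axis. The short sketch below computes that resolution and the growth factor between consecutive file sizes reported above; it assumes 1 MB = 1000 KB and uses only the numbers from this guide.

```python
# file sizes reported above, converted to MB (assuming 1 MB = 1000 KB)
sizes_mb = {5: 0.375, 6: 1.2, 7: 5.0, 8: 19.0, 9: 70.0, 10: 86.0}

depths = sorted(sizes_mb)
# growth factor of the file size relative to the previous depth
growth = {d: sizes_mb[d] / sizes_mb[d - 1] for d in depths[1:]}

for d in depths:
    cells_per_axis = 2 ** d
    factor = f"x{growth[d]:.2f}" if d in growth else "n/a"
    print(f"depth={d}: up to {cells_per_axis} cells per axis, "
          f"{sizes_mb[d]} MB, growth {factor}")
```

Up to `depth=9` the size grows by a factor of roughly 3 to 4 per level, while the step to `depth=10` adds comparatively little. One plausible reading is that the point cloud has little additional detail to offer at that resolution, but this guide does not verify that.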

## Conclusion

Even though only a single image was used, the result is quite good. With some manual 3D editing of the mesh, you can achieve even better results. This guide cannot cover every detail of 3D data processing, so I encourage you to read the other resources (listed below) to better understand all the aspects involved.

**Additional resources:**

Thanks for reading. I hope you found the material useful.

**Literature**

[1] H. Edelsbrunner and E. P. Mücke, Three-dimensional Alpha Shapes (1994)

[2] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin, The Ball-Pivoting Algorithm for Surface Reconstruction (1999)

[3] M. Kazhdan, M. Bolitho and H. Hoppe, Poisson Surface Reconstruction (2006)

[4] D. Kim, W. Ka, P. Ahn, D. Joo, S. Chun, and J. Kim, Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth (2022)

[5] Q. Zhou, J. Park, and V. Koltun, Open3D: A Modern Library for 3D Data Processing (2018)

[6] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor Segmentation and Support Inference from RGBD Images (2012)
