Calling Go functions from Python using ctypes

We discussed how to do this in a Python channel, and this article was born as a result.

To follow all the steps we will need Python, the Go compiler and GCC (MinGW on Windows). Code examples are available in the repo on GitHub.

It's worth noting that there are other ways to call Go from Python, SWIG for example. Here we will look at ctypes because it doesn't require any additional dependencies and is very simple.

Go!

Hello world

Let's start with hello world. Where would we be without it?

hello.go

package main

import "C"
import "fmt"

//export hello
func hello() {
    fmt.Println("Hello world!")
}

func main() {}

Now we build hello.dll from hello.go by compiling it with the -buildmode=c-shared flag:

# Windows:
go build -o hello.dll -buildmode=c-shared hello.go
# Linux:
go build -o hello.so -buildmode=c-shared hello.go

Now, to use hello.dll in Python, we load the file with ctypes.CDLL():

hello.py

import ctypes

lib = ctypes.CDLL('./hello.dll')  # Or hello.so if on Linux.
hello = lib.hello

hello()

Let's run:

> python hello.py
Hello world!

Great! There are a couple of points worth noting here:

  1. The Go code is almost entirely standard: besides import "C", the only addition is the //export hello comment, which exports the function hello for external use.

  2. Building with the -buildmode=c-shared flag creates a C-style shared library.

  3. Such a library is loaded in Python via ctypes.CDLL().
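
Since the library file name differs between platforms, a small helper can pick the right one. This is only a sketch: the function name shared_lib_path is hypothetical, and the .dylib suffix for macOS is an assumption, since the build commands above only cover Windows and Linux.

```python
import sys


def shared_lib_path(base: str) -> str:
    """Return './<base>' plus the shared-library suffix for this platform."""
    if sys.platform.startswith('win'):
        suffix = '.dll'
    elif sys.platform == 'darwin':
        suffix = '.dylib'  # assumption: macOS is not covered in the article
    else:
        suffix = '.so'
    return f'./{base}{suffix}'


print(shared_lib_path('hello'))

# Usage, once the library has been built as shown above:
#   lib = ctypes.CDLL(shared_lib_path('hello'))
```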

Simple input and output

Well, now the function will take some arguments and return a value:

primitive.go

//export add
func add(a, b int64) int64 {
    return a + b
}

primitive.py

import ctypes

lib = ctypes.CDLL('./primitive.dll')
add = lib.add

# convert values to their C representation
add.argtypes = [ctypes.c_int64, ctypes.c_int64]
add.restype = ctypes.c_int64

print('10 + 15 =', add(10, 15))

Let's run it:

> python primitive.py
10 + 15 = 25

So, to pass input to a Go function and receive its output, you need to set the argtypes and restype attributes on the function object that ctypes returns. A couple of points:

  1. argtypes checks the arguments before the library code is called.

  2. These attributes tell Python how to convert Python input values to ctypes values and how to convert the output values back to Python values.
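
The argument check can be tried without building a Go library. The sketch below uses a stand-in for lib.add: a C-callable function created from a Python lambda with ctypes.CFUNCTYPE, whose prototype plays the role of restype (first parameter) and argtypes (the rest). The names ADD_PROTO and add are only for illustration.

```python
import ctypes

# Stand-in for lib.add: the prototype fixes the return type and argument types.
ADD_PROTO = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.c_int64, ctypes.c_int64)
add = ADD_PROTO(lambda a, b: a + b)

print('10 + 15 =', add(10, 15))

try:
    add('10', 15)  # a str cannot be converted to c_int64
except ctypes.ArgumentError as exc:
    print('rejected before the call:', exc)
```

With a real Go library the same ctypes.ArgumentError is raised once argtypes is set, so bad input never reaches the Go code.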

By the way, you can look up the correspondence between C types and Go types in the .h file that is generated when you compile your Go code with -buildmode=c-shared.

Attention: strictly speaking, type sizes depend on the hardware architecture. In general, it is safer to use sized types (int64) than unsized ones (int).
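
To see why sized types are safer, you can print the sizes ctypes reports on your own machine. This is a small illustration, not part of the original examples; c_long, for instance, is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux, while c_int64 is always 8.

```python
import ctypes

for name in ('c_int', 'c_long', 'c_longlong', 'c_int64'):
    ctype = getattr(ctypes, name)
    print(f'{name}: {ctypes.sizeof(ctype)} bytes')
```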

Arrays and slices

Okay, now let's talk about arrays and slices. This brings us into unsafe memory access territory – although Python and Go are generally memory safe, working with raw pointers can lead to buffer overflows or leaks.

arrays.go

// squares returns the squares of the input numbers
//
//export squares
func squares(numsPtr *float64, outPtr *float64, n int64) {
	nums := unsafe.Slice(numsPtr, n)
	out := unsafe.Slice(outPtr, n)

	// by the way, for Go < 1.17
	// nums := (*[1 << 30]float64)(unsafe.Pointer(numsPtr))[:n:n]
	// out := (*[1 << 30]float64)(unsafe.Pointer(outPtr))[:n:n]

	for i, x := range nums {
		out[i] = x * x
	}
}

And we call this DLL from Python:

arrays.py

import ctypes
from array import array

lib = ctypes.CDLL('./arrays.dll')
squares = lib.squares

squares.argtypes = [
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_int64,
]

# using from_buffer() is more efficient than simply doing:
# (ctypes.c_double * 3)(*[1, 2, 3])
nums = array('d', [1, 2, 3])
nums_ptr = (ctypes.c_double * len(nums)).from_buffer(nums)
out = array('d', [0, 0, 0])
out_ptr = (ctypes.c_double * len(out)).from_buffer(out)

squares(nums_ptr, out_ptr, len(nums))
print('nums:', list(nums))
print('out:', list(out))

Let's run it:

> python arrays.py
nums: [1.0, 2.0, 3.0]
out: [1.0, 4.0, 9.0]

Summary: to work with lists, we need to convert them into C arrays. The basic way is to create an array with (ctypes.my_type * my_length)(1, 2, 3 ...). A faster way is to use the array module, as shown above. We'll come back to this a little later, when we talk about benchmarks.
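
The difference between the two conversions can be seen without any Go code. In this sketch the loop merely emulates what the Go side would do when it writes through the pointer; the variable names view and copy are illustrative.

```python
import ctypes
from array import array

nums = array('d', [1, 2, 3])

# from_buffer() creates a C array that shares memory with nums (no copy).
view = (ctypes.c_double * len(nums)).from_buffer(nums)

# The plain constructor copies the values element by element.
copy = (ctypes.c_double * len(nums))(*nums)

# Emulate the Go side: write through the shared C view.
for i in range(len(view)):
    view[i] = view[i] * view[i]

print(list(nums))  # the array sees the writes: [1.0, 4.0, 9.0]
print(list(copy))  # the copy is unaffected: [1.0, 2.0, 3.0]
```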

In Go you can wrap a C-style pointer in a slice using unsafe.Slice. This lets you use ordinary Go slice syntax when working with Python buffers.

A couple more points: you cannot return a Go pointer through cgo; doing so causes a runtime panic. Instead, you can allocate C memory from Go using C.malloc() and return that pointer. However, the garbage collector does not track such memory in any way, so you need to provide a mechanism for freeing it to avoid memory leaks.

What can we recommend here? For a function to safely return an array from Go, you must either allocate the memory for it in Python first and pass it to Go as an argument, or create the array in Go and wrap it in a safe structure (we'll touch on this a little later).

Let's summarize the dangers:

  • Returning Go pointers to Python. Error.

  • Returning C pointers from Go to Python without explicitly freeing them. Memory leak.

  • Losing the reference to a ctypes object while Go code is running (for example, taking ctypes.addressof and then dropping the object it came from). Possible segmentation fault.

Strings

Strings behave much like arrays in terms of memory management, so everything said about arrays also applies to them. Let's discuss a couple of useful techniques and some pitfalls:

string.go

//export repeat
func repeat(s *C.char, n int64, out *byte, outN int64) *byte {
	// wrap our output buffer in a Go buffer
	outBytes := unsafe.Slice(out, outN)[:0]
	buf := bytes.NewBuffer(outBytes)

	var goString string = C.GoString(s) // copy the input into Go memory space
	for i := int64(0); i < n; i++ {
		buf.WriteString(goString)
	}
	buf.WriteByte(0) // important: null byte at the end of the string
	return out
}

string.py

import ctypes

lib = ctypes.CDLL('./string.dll')
repeat = lib.repeat

repeat.argtypes = [
    ctypes.c_char_p,
    ctypes.c_int64,
    ctypes.c_char_p,
    ctypes.c_int64,
]
repeat.restype = ctypes.c_char_p

# allocate the output buffer in Python; Go will write into it
buf_size = 1000
buf = ctypes.create_string_buffer(buf_size)

result = repeat(b'Badger', 4, buf, buf_size)  # type(result) = bytes
print('Badger * 4 =', result.decode())

result = repeat(b'Snake', 5, buf, buf_size)
print('Snake * 5 =', result.decode())

Let's run it:

> python string.py
Badger * 4 = BadgerBadgerBadgerBadger
Snake * 5 = SnakeSnakeSnakeSnakeSnake

Strings are passed by converting a Python string to a bytes object (usually by calling encode()), then to a C pointer, and then to a Go string.

Using ctypes.c_char_p in argtypes makes Python expect a bytes object and convert it to a C char*. In restype, it converts the returned char* to a bytes object.

In Go you can convert a char* into a Go string using C.GoString. This copies the data and creates a new string that is managed by the Go garbage collector. To create a char* as a return value, you can call C.CString. However, that memory is allocated on the C side, so if you don't keep a reference to the pointer and free it later, a memory leak will occur. To return strings from Go, you can use the same techniques as when working with arrays.

If a pointer to the output buffer was passed in from Python, Go can return it and Python will automatically create a bytes object from it.

So what problems might arise?

  • Returning a C.CString without keeping a reference for freeing it later. Memory leak.

  • Not adding a null byte to the end of the output string. Buffer overflow when converting to a Python object.

  • Not checking the output buffer size in Go. Buffer overflow or incomplete output.
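
The role of the null byte can be explored from Python alone. In the sketch below, ctypes.memmove stands in for the Go code writing into the output buffer; everything else is standard ctypes.

```python
import ctypes

buf = ctypes.create_string_buffer(16)   # 16 zero bytes, writable from C or Go
ctypes.memmove(buf, b'Badger\x00', 7)    # emulate Go writing the string plus a null byte

print(buf.value)     # bytes up to the first null byte: b'Badger'
print(len(buf.raw))  # the whole buffer, including unused bytes: 16

# This is what a c_char_p restype does with the returned pointer:
as_char_p = ctypes.cast(buf, ctypes.c_char_p)
print(as_char_p.value)  # b'Badger'
```

Without the terminating null byte, reading through c_char_p would continue past the end of the written data until it happened to find a zero byte.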

String array

By the way, you can pass an array of strings like this:

join.go

func goStrings(cstrs **C.char) []string {
	var result []string
	slice := unsafe.Slice(cstrs, 1<<30)
	for i := 0; slice[i] != nil; i++ {
		result = append(result, C.GoString(slice[i]))
	}
	return result
}

join.py

import ctypes
from typing import List

def to_c_str_array(strs: List[str]):
    ptr = (ctypes.c_char_p * (len(strs) + 1))()
    ptr[:-1] = [s.encode() for s in strs]
    ptr[-1] = None  # NULL terminator, so Go knows where the array ends
    return ptr
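
To check the NULL-terminated layout without Go, the array can be read back in Python the same way goStrings walks it. This is a sketch: from_c_str_array is a hypothetical helper, and to_c_str_array is repeated (with a plain loop) only so that the block runs by itself.

```python
import ctypes
from typing import List


def to_c_str_array(strs: List[str]):
    # One extra slot; a fresh ctypes array is zeroed, so the last slot stays NULL.
    arr = (ctypes.c_char_p * (len(strs) + 1))()
    for i, s in enumerate(strs):
        arr[i] = s.encode()
    return arr


def from_c_str_array(arr) -> List[str]:
    """Read strings until the NULL terminator, like goStrings does in Go."""
    result = []
    i = 0
    while arr[i] is not None:
        result.append(arr[i].decode())
        i += 1
    return result


print(from_c_str_array(to_c_str_array(['Badger', 'Snake'])))
```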

NumPy and pandas

NumPy buffers are accessed using the syntax .ctypes.data_as(ctypes.whatever). In pandas you can use the .values attribute to get the underlying NumPy array, and then use the NumPy syntax to get the actual pointer. This way you can modify the array or table in place. It looks like this:

numpypandas.go

//export increase
func increase(numsPtr *int64, n int64, a int64) {
	nums := unsafe.Slice(numsPtr, n)
	for i := range nums {
		nums[i] += a
	}
}

numpypandas.py

import ctypes
import pandas

lib = ctypes.CDLL('./numpypandas.dll')
increase = lib.increase

increase.argtypes = [
    ctypes.POINTER(ctypes.c_int64),
    ctypes.c_int64,
    ctypes.c_int64,
]

people = pandas.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 30, 40],
})

# check the type
ages = people.age
if str(ages.dtypes) != 'int64':
    raise TypeError(f'Expected type int64, got {ages.dtypes}')

values = ages.values  # type=numpy.ndarray
ptr = values.ctypes.data_as(ctypes.POINTER(ctypes.c_int64))

print('Before')
print(people)

print('After')
increase(ptr, len(people), 5)
print(people)

Let's run it:

> python numpypandas.py
Before
      name  age
0    Alice   20
1      Bob   30
2  Charlie   40
After
      name  age
0    Alice   25
1      Bob   35
2  Charlie   45

It's important to check the type of an array before passing it to a Go function, as the data may be of a different numeric type (int<->float), a different size (int64<->int32), or even the generic object type.

Another thing to keep in mind is that pandas copies tables when rows are selected. Say we have a DataFrame named people; then people[people['age'] < 40] returns a copy of people. Therefore, passing that copy to Go will not affect the original table.
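
The same kind of type check can be written against the buffer protocol using only the standard library. The sketch below is an analogue rather than the NumPy code from the example: int64_view is a hypothetical helper, it is demonstrated with array.array, and although NumPy arrays also expose the buffer protocol, that case is not shown here.

```python
import ctypes
from array import array


def int64_view(buf):
    """Return a ctypes int64 array that shares memory with buf, or raise TypeError."""
    with memoryview(buf) as mv:
        # 'q' and 'l' are the struct codes for signed long long / long.
        if mv.itemsize != 8 or mv.format not in ('q', 'l'):
            raise TypeError(f'expected signed 64-bit integers, got format {mv.format!r}')
        if not mv.c_contiguous:
            raise TypeError('expected a C-contiguous buffer')
        count = mv.nbytes // mv.itemsize
    return (ctypes.c_int64 * count).from_buffer(buf)


ages = array('q', [20, 30, 40])
view = int64_view(ages)
for i in range(len(view)):  # emulate the Go increase() function
    view[i] += 5
print(list(ages))

try:
    int64_view(array('d', [1.0, 2.0]))  # floats must be rejected
except TypeError as exc:
    print('rejected:', exc)
```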

Structures

To work with structs, you must define them in both Python and C. Go structs cannot be exported.

structs.go

/*
struct person {
  char* firstName;
  char* lastName;
  char* fullName;
  long long fullNameLen;
};
*/
import "C"
import (
	"bytes"
	"unsafe"
)

//export fill
func fill(p *C.struct_person) {
	buf := bytes.NewBuffer(unsafe.Slice((*byte)(unsafe.Pointer(p.fullName)),
		p.fullNameLen)[:0])
	first := C.GoString(p.firstName)
	last := C.GoString(p.lastName)
	buf.WriteString(first + " " + last)
	buf.WriteByte(0)
}

structs.py

import ctypes


class Person(ctypes.Structure):
    _fields_ = [
        ('first_name', ctypes.c_char_p),
        ('last_name', ctypes.c_char_p),
        ('full_name', ctypes.c_char_p),
        ('full_name_len', ctypes.c_int64),
    ]


lib = ctypes.CDLL('./structs.dll')

fill = lib.fill
fill.argtypes = [ctypes.POINTER(Person)]

buf_size = 1000
buf = ctypes.create_string_buffer(buf_size)
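# pass the address of buf: buf.value would copy the contents into an immutable bytes object
person = Person(b'John', b'Galt', ctypes.addressof(buf), len(buf))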
fill(ctypes.pointer(person))

print(person.full_name)

Since we can't export Go structures, we define them in C by adding a comment above the import "C" line. By the way, as you can see, in Go the structure person is referred to as C.struct_person. In Python we define an equivalent ctypes.Structure class, which has exactly the same fields.

You can fill in the fields of the struct in Go using simple primitives. If arrays and strings are used, the same restrictions apply as before.
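
How a struct field can point at a Python-owned buffer is easy to check without Go. In this sketch the write into buf stands in for the Go fill() function. Passing ctypes.addressof(buf) relies on c_char_p fields accepting an integer address; the structure does not keep buf alive in that case, so the buffer has to stay referenced for as long as the struct is used.

```python
import ctypes


class Person(ctypes.Structure):
    _fields_ = [
        ('first_name', ctypes.c_char_p),
        ('last_name', ctypes.c_char_p),
        ('full_name', ctypes.c_char_p),
        ('full_name_len', ctypes.c_int64),
    ]


buf = ctypes.create_string_buffer(32)
person = Person(b'John', b'Galt', ctypes.addressof(buf), len(buf))

# Emulate Go: write "first last" plus a null byte into the output buffer.
buf.value = person.first_name + b' ' + person.last_name

print(person.full_name)      # read back through the pointer stored in the struct
print(person.full_name_len)
```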

Automatic memory management using __del__

Setting up a convenient and safe memory management scheme is the last thing left to do, so let's get started. Using Python's __del__ dunder method, we can conveniently allocate memory for buffers in Go (C) and free it from Python when the object is deleted.

This scheme is simple and requires two things: a function in Go that frees the memory allocated for an object, and a __del__ method in Python that calls that Go function.

The __del__ method is called automatically when the number of references to the object drops to zero.

del.go

/*
#include <stdlib.h>
struct userInfo {
  char* info;
};
*/
import "C"
import (
	"fmt"
	"unsafe"
)

// allocate memory for the object
//
//export getUserInfo
func getUserInfo(cname *C.char) C.struct_userInfo {
	var result C.struct_userInfo
	name := C.GoString(cname)
	result.info = C.CString(
		fmt.Sprintf("User %q has %v letters in their name",
			name, len(name)))
	return result
}

// free the memory of the object
//
//export delUserInfo
func delUserInfo(info C.struct_userInfo) {
	// print for clarity
	fmt.Printf("Freeing user info: %s\n", C.GoString(info.info))
	C.free(unsafe.Pointer(info.info))
}

del.py

import ctypes


class UserInfo(ctypes.Structure):
    _fields_ = [('info', ctypes.c_char_p)]

    def __del__(self):
        del_user_info(self)

lib = ctypes.CDLL('./del.dll')
get_user_info = lib.getUserInfo
get_user_info.argtypes = [ctypes.c_char_p]
get_user_info.restype = UserInfo
del_user_info = lib.delUserInfo
del_user_info.argtypes = [UserInfo]

def work_work():
    user1 = get_user_info('Alice'.encode())
    print('Info:', user1.info.decode())
    print('-----------')

    user2 = get_user_info('Bob'.encode())
    print('Info:', user2.info.decode())
    print('-----------')

    # At this point user1 and user2 should be deleted

work_work()
print('Did I remember to free my memory?')

Let's run it:

Info: User "Alice" has 5 letters in their name
-----------
Info: User "Bob" has 3 letters in their name
-----------
Freeing user info: User "Alice" has 5 letters in their name
Freeing user info: User "Bob" has 3 letters in their name
Did I remember to free my memory?

Fabulous
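
The timing of __del__ can be observed without a Go library. In this sketch, fake_del_user_info is a hypothetical stand-in for the Go delUserInfo function that only records what it was asked to free. The immediate call on scope exit is CPython's reference-counting behaviour; other Python implementations may run __del__ later.

```python
import ctypes

freed = []  # records what the stand-in "free" function was asked to release


def fake_del_user_info(info):
    """Hypothetical stand-in for the Go delUserInfo function."""
    freed.append(info.info)


class UserInfo(ctypes.Structure):
    _fields_ = [('info', ctypes.c_char_p)]

    def __del__(self):
        fake_del_user_info(self)


def work_work():
    user = UserInfo(b'Alice')
    print('Info:', user.info.decode())
    # user goes out of scope here; CPython drops the last reference right away


work_work()
print('freed:', freed)
```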

Error handling

Passing Go errors back to Python is necessary for the program to function properly. To do this, we will create a reusable error type.

error.go

/*
#include <stdlib.h>
typedef struct {
	char* err;
} error;
*/
import "C"

// ...

func newError(s string, args ...interface{}) C.error {
	if s == "" {
		return C.error{} // equivalent to a nil error in Go
	}
	msg := fmt.Sprintf(s, args...)
	return C.error{C.CString(msg)}
}

//export delError
func delError(err C.error) {
	if err.err == nil {
		return
	}
	C.free(unsafe.Pointer(err.err))
}

error.py

class Error(ctypes.Structure):
    _fields_ = [('err', ctypes.c_char_p)]

    def __del__(self):
        if self.err is not None:
            del_error(self)

    def raise_if_err(self):
        if self.err is not None:
            raise IOError(self.err.decode())

# ...

del_error = lib.delError
del_error.argtypes = [Error]

Great, now we can use the new Error type in structures and in functions with multiple return values.
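
The Python side of this error type can be exercised by itself. In the sketch below, fake_del_error is a hypothetical stand-in for the Go delError function and frees nothing; with a real library, del_error would be bound to lib.delError as shown above.

```python
import ctypes


def fake_del_error(err):
    """Hypothetical stand-in for the Go delError function; frees nothing here."""


class Error(ctypes.Structure):
    _fields_ = [('err', ctypes.c_char_p)]

    def __del__(self):
        if self.err is not None:
            fake_del_error(self)

    def raise_if_err(self):
        if self.err is not None:
            raise IOError(self.err.decode())


Error().raise_if_err()  # NULL err is the equivalent of a nil error: nothing happens

try:
    Error(b'file not found').raise_if_err()
except IOError as exc:
    print('Go error surfaced in Python:', exc)
```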

A few words about performance

Cost of an empty call

The cost of an empty function call is around 5 µs, which is quite a lot compared to calling a native function. It turns out that cgo has a high call overhead. Moreover, the same overhead appears when calling Go from native C code, regardless of whether the Go code is linked as a dynamic or a static library.

This overhead should be taken into account when designing the API. If each function call does 5 µs of Go work, then 50% of the time is spent on call overhead. If each call does 500 µs of Go work, the call overhead drops to about 1%.
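
The arithmetic behind these percentages is simply overhead / (overhead + work). A tiny sketch, assuming the roughly 5 µs per-call overhead quoted above (the function name overhead_fraction is illustrative):

```python
def overhead_fraction(work_us: float, overhead_us: float = 5.0) -> float:
    """Share of the total call time that is spent on call overhead."""
    return overhead_us / (overhead_us + work_us)


for work_us in (5, 50, 500):
    print(f'{work_us:>3} us of work -> {overhead_fraction(work_us):.1%} overhead')
```

In practice this argues for coarse-grained APIs: one call that processes a whole array rather than one call per element.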

Memory reuse

For calls that are repeated many times, you can, where it makes sense, allocate memory once using ctypes and reuse it for all the repeated calls.

It looks like this:

# ctypes wrapper for a function in Go
my_function = my_lib.my_function

def my_function_with_buffer(n: int):
    buffer = (ctypes.c_char * n)()  # allocated once; ctypes zero-initializes it
    def my_function_with_closure():
        my_function(buffer, n)
    return my_function_with_closure

def work_work():
    my_function_buffered = my_function_with_buffer(1000)
    my_function_buffered()
    my_function_buffered()
    my_function_buffered()

Using the array module for memory allocation

As mentioned above, using the array module to allocate memory is faster than the regular value constructor (ctypes.type * n).

Benchmarks

And finally, a few comparisons illustrating the benefits of calling Go functions from Python compared to just using Python functions. For completeness, all measurements include the overhead of converting values to and from their C representation.

Calculation of π

A simple calculation of π to get an idea of how much faster Go can be.

[Chart: π calculation benchmark]

Shuffle 10M elements in random order

Hmm, it turns out that using Go can be faster than even Python's built-in modules.

[Chart: shuffle benchmark]

Using array versus the method recommended by ctypes

A comparison of the method recommended by ctypes with using array to convert Python values to C values.

# using ctypes
cvals = (ctypes.c_double * n)(*nums)

# using array
arr = array('d', nums)
cvals = (ctypes.c_double * n).from_buffer(arr)

[Chart: list conversion benchmark]
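
A rough way to repeat this comparison on your own machine is sketched below with timeit. The list size and repeat count are arbitrary choices, and the timings will vary from machine to machine, so no numbers are quoted here.

```python
import ctypes
import timeit
from array import array

n = 100_000
nums = [float(i) for i in range(n)]


def with_ctypes():
    # the constructor converts the values one by one
    return (ctypes.c_double * n)(*nums)


def with_array():
    # array does the conversion, from_buffer() only wraps the memory
    arr = array('d', nums)
    return (ctypes.c_double * n).from_buffer(arr)


# Both approaches must produce the same values.
assert list(with_ctypes()) == list(with_array())

repeats = 20
for fn in (with_ctypes, with_array):
    seconds = timeit.timeit(fn, number=repeats)
    print(f'{fn.__name__}: {seconds / repeats * 1e3:.2f} ms per conversion')
```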

Well, that is how you can call Go from Python. Thanks to Guido for the opportunity to use C libraries in Python.

If there were any inaccuracies, please correct them in the comments.

By the way, I run a Telegram channel about Python, in which I describe interesting frameworks, libraries, open-source tools and more. For those who love and study Golang, I can recommend another excellent resource. You can probably find something useful for yourself there, so welcome!

Thank you very much for reading this article!
