Rust - Python FFI From Scratch

I was recently working on a side project that involves communication between binaries written in Rust and web interfaces written in Python. I moved part of the project to Rust for several reasons: 1) the logic is all about manipulating byte arrays, where Python is weak and a systems language like Rust excels; 2) the logic happens to be complicated, so I need a static type system to ensure correctness, and Rust’s match expression helps keep things concise; 3) I was planning to develop CLI tools in Rust that call this piece of functionality, and I don’t want to rewrite it in the future.

For half a day, I read about FFI between the two languages, and finally got everything working as intended. Before getting into the technical details, let me first set the stage for our story.

My project stores a list of “widgets” in local files on the server side. As for what “widgets” are, it’s sufficient to know that they are stored in a compact binary format. A web endpoint receives user-provided modifications in a similar format, applies them to the existing widgets and writes the new widgets back to the files. Technically, the endpoint has two byte arrays as input: mods, extracted from the HTTP request, and widgets, read from local files. They are processed on the Rust side by a function named mod_widgets(), which yields one byte array new_widgets as output.

[Figure: the Python side passes widgets and the received mods to mod_widgets() on the Rust side, then consumes the returned new_widgets.]

“Byte array” here is a language-agnostic term for an array of consecutively stored unsigned 8-bit integers. Different languages may adopt their own types or representations for it, e.g., bytearray in Python, Vec<u8> in Rust or Uint8Array in JavaScript.
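To make the term concrete on the Python side, here is a minimal sketch (the sample values are made up for illustration). It shows that a bytearray is a mutable, contiguous run of unsigned 8-bit integers, and that a memoryview exposes the same storage without copying it:

```python
# Illustrative only: the sample values below are arbitrary.
buf = bytearray([0x01, 0xFF, 0x10])

# Each element is an unsigned 8-bit integer in the range 0..255.
assert all(0 <= b <= 255 for b in buf)

# A memoryview describes the raw storage: unsigned bytes, stored contiguously.
view = memoryview(buf)
layout = (view.format, view.itemsize, view.contiguous)

# Writing through the view changes the original buffer: no copy was made.
view[0] = 0x7F
first_byte = buf[0]

# bytes(...) takes an immutable snapshot (a copy) of the current contents.
snapshot = bytes(buf)
view.release()
```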

I have learned about PyO3, a fascinating framework for writing Python extensions in Rust. But this time I chose not to use it and to build everything from scratch instead. The data passed around is a plain byte array, simple enough to handle without the help of PyO3. Besides, building from scratch gives me a better understanding of how the gears work, and of how to get the job done in a memory-safe 1 and efficient 2 way.

To this end, I wrote a dynamic library in Rust, from which the Python code calls certain functions to do the job. The post is organized in three parts. The first part briefly introduces the fundamentals of calling functions written in a different programming language, and the following parts show the detailed implementation in both Rust and Python.

FFI Fundamentals

There’s a technical term, FFI 3, for the mechanism of calling functions from a different programming language. One of the common ways to do FFI is to wrap the functions written in one language in a dynamically linked library 4, which is then loaded from another language. In practice, however, there are plenty of pitfalls to worry about, and the coding can get really cumbersome.

One of the troubles is ABI compatibility. ABI here refers to Application Binary Interface, an interface between two binary program modules. The term is less familiar to those who work at a higher level, such as Python users, but it exists everywhere in your computer.

When a program calls functions from a library, both must conform to the same conventions in various aspects, such as calling conventions 5, memory layout 6, etc. For example, C lays out struct fields in the order of their definition, while Rust by default gives no such guarantee. There are also other details like name mangling, where Rust may give a function a different name when producing binaries. If mangling is not turned off, the function cannot be located from the Python side.
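Memory layout can be inspected from Python with ctypes, which follows the platform’s C layout rules. The sketch below uses a hypothetical two-field struct (not part of the project) to show that fields are placed in definition order and that padding may be inserted to satisfy alignment:

```python
import ctypes

class Sample(ctypes.Structure):
    # Hypothetical struct, equivalent to C's
    # `struct { uint8_t tag; uint32_t value; }`.
    _fields_ = [
        ("tag", ctypes.c_uint8),
        ("value", ctypes.c_uint32),
    ]

# Fields appear in definition order; each one starts at an aligned offset.
tag_offset = Sample.tag.offset
value_offset = Sample.value.offset

# The struct may be larger than the sum of its fields because of padding.
total_size = ctypes.sizeof(Sample)
payload_size = ctypes.sizeof(ctypes.c_uint8) + ctypes.sizeof(ctypes.c_uint32)

print(tag_offset, value_offset, total_size, payload_size)
```

Both sides of an FFI call have to agree on these offsets, which is why the layout must be pinned down explicitly on the Rust side.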

Currently, most foreign code exposes a C ABI, i.e., the ABI generated by the C compiler on your platform. While not standardized, the mechanism serves as a widely used ABI in many cases.

Another concern is that the semantics of each language might cause undesired behavior on the other side. For example, in a GC-based language like Python, if you pass a pointer to a variable to a foreign function and then drop the variable, the variable might be reclaimed and the pointer becomes invalid on the other side:

def func_getb():
    a = bytearray(...)
    b = ffi_atob(ptr_of(a))  # b holds a pointer into a
    return b
    # a is dropped here!

def func_btoc(b):
    return ffi_btoc(b)

b = func_getb()   # a might be reclaimed, which invalidates b
c = func_btoc(b)  # BOOM!
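The hazard can be observed with ctypes alone. The following is a minimal sketch, assuming CPython (where objects are released as soon as their last reference goes away). An array created with from_buffer() keeps the source bytearray “exported”, so Python refuses to resize it, whereas a bare address is just an integer and protects nothing:

```python
import ctypes

buf = bytearray(b"widgets")

# from_buffer() shares memory with buf and holds a buffer export on it.
arr = (ctypes.c_uint8 * len(buf)).from_buffer(buf)
address = ctypes.addressof(arr)  # a plain integer; it keeps nothing alive

try:
    buf.append(0)  # resizing could move the memory the array points to
    resized = True
except BufferError:
    resized = False  # Python blocks the resize while the export exists

# Once the ctypes array is gone the export is released and buf can be
# resized again. From here on, `address` must be treated as dangling.
del arr
buf.append(0)
```

The practical rule is to keep the owning Python object alive, and unmodified in size, for as long as foreign code may touch its memory.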

Similar cases exist in Rust. If one creates a Vec<u8> as a local variable and returns its inner raw pointer, the vec is dropped at the end of the function, which frees its heap buffer and invalidates the pointer:

pub unsafe extern "C" fn foo() -> *const u8 {
    let v = vec![0u8; 1024];
    v.as_ptr()
    // v dropped here! returned pointer becomes invalid.
}

Exception handling is another headache. Since different languages have their own ways of propagating exceptions, it’s essential to stop them before they cross the binary boundary; otherwise undefined behavior may occur, or the whole program may even crash.

In this story, I follow the convention of using the C ABI as the bridge for foreign calls, that is, exposing interfaces with extern "C" in Rust and calling them with ctypes in Python. Effort should also be made to ensure exception and memory safety.

The Rust Side

Let’s start by creating a new cargo package. Simply open a terminal in a new directory and type cargo init --lib. The brilliant Rust package manager will prepare the scaffold for a library. We then add the following lines to the manifest file Cargo.toml:

# -- snip --
[lib]
crate-type = ["cdylib"]

As described in the doc, crate-type = ["cdylib"] is essential for FFI-purposed libraries: it tells Cargo to produce a dynamic system library (.so, .dylib or .dll) that can be loaded from other languages, instead of Rust’s default library format.

Our core implementation will go into the file lib.rs. In this simple library, we are going to expose a function mod_widgets. Let’s write down a skeleton for it:

// lib.rs
#[no_mangle]
pub unsafe extern "C" fn mod_widgets(widgets: *mut FFIVec, mods: *mut FFIVec) -> *const FFIVec {
    todo!()
}

#[no_mangle] here disables name mangling for this function. It will therefore appear in the symbol table with exactly the name mod_widgets and become accessible from outside. The extern "C" syntax tells the compiler that our function should conform to the C ABI, so that other binaries can call it with the same convention.

FFIVec is a struct that encapsulates a byte array and acts as a bridge for exchanging data between the two languages. The details will be introduced shortly, but before that I would like to clarify one fact: there are actually two kinds of byte arrays in our story:

  1. Byte arrays created in Python and accessed from Rust;
  2. Byte arrays created in Rust and accessed from Python.

Why is it important? Because objects created in one language cannot be de-allocated on the other side. Imagine, for instance, that you take out a loan at Bank A but later pay the money back to Bank B. Bank B will have no idea what to do with the money and Bank A will never clear your loan record, causing a mess on both sides. Each language has its own distinct implementation of memory allocation, and we must pair each allocator with its corresponding de-allocator. We must keep this in mind when crafting the details of FFIVec.

Byte Arrays from Python

To access the first kind of byte array in Rust, it is sufficient to use the following data structure, which records the starting location and the length of the array:

#[repr(C)]
#[derive(Debug)]
pub struct FFIVec {
    len: usize,
    data: *const u8,
}

The attribute #[repr(C)] forces the compiler to use a memory layout that conforms to the C ABI. Otherwise the Rust code might read misaligned or corrupted data.

Code on the Python side can construct a matching struct using ctypes and pass a reference to it to mod_widgets(). Code on the Rust side then re-interprets the *mut FFIVec pointer as a &[u8] for easier processing, with the help of the function FFIVec::raw_to_slice():

impl FFIVec {
    unsafe fn raw_to_slice<'a>(ptr: *mut Self) -> &'a [u8] {
        // Borrow the struct instead of wrapping it in a Box: its memory belongs
        // to Python, so Rust must never take ownership of it or free it.
        let v = &*ptr;
        core::slice::from_raw_parts(v.data, v.len)
    }
}

It’s worth noting that raw_to_slice() does not copy the buffer, and therefore imposes almost no runtime overhead.

Byte Arrays from Rust

When it comes to returning a byte array to Python (i.e., the second kind), however, things get a bit more complicated. Remember the precaution above: if a piece of memory is allocated by Rust code, it should eventually be de-allocated by Rust code. Our library thus needs to export another function:

#[no_mangle]
pub extern "C" fn free_ffi_vec(v: *mut FFIVec) {
    todo!()
}

The Python code is responsible for calling free_ffi_vec() manually after consuming the FFIVec struct returned by mod_widgets(). Our workflow is thus extended into

[Figure: the Python side passes widgets and the received mods to mod_widgets() on the Rust side, consumes the returned new_widgets, and finally hands new_widgets back to free_ffi_vec() on the Rust side.]

On the Rust side, the duty of free_ffi_vec() is to de-allocate the underlying memory, given an argument of type *mut FFIVec as the handle. However, it’s not as trivial as it seems if we stick to the design of FFIVec above.

To allocate a byte array, a common and “Rust-ic” way is to construct a Vec<u8>. It’s straightforward to write a function that converts a Vec<u8> into a *mut FFIVec, so that it can be passed across the FFI boundary:

impl FFIVec {
    fn from_vec(vec: Vec<u8>) -> *mut Self {
        // build an FFIVec on heap
        let v = Box::new(Self {
            len: vec.len(),
            data: vec.as_ptr(),
        });
        std::mem::forget(vec);
        Box::into_raw(v)
    }
}

Fairly neat, it seems. But it soon lets you down, since there’s no way to implement free_ffi_vec() with such a design!

The problem is that our only way to de-allocate is to turn the *mut FFIVec back into a Vec<u8> and drop it. However, the recovery is impossible: rebuilding a Vec requires its capacity in addition to the pointer and the length, and we “lost” it after the forget() call.

As a workaround, we have to make our FFIVec a little bit “fatter” and invite the original vec to live within the struct. Namely, the implementation is modified into

#[repr(C)]
#[derive(Debug)]
pub struct FFIVec {
    len: usize,
    data: *const u8,
    // new: owns the original vec so that it can be recovered later
    storage: *mut Vec<u8>,
}

impl FFIVec {
    fn from_vec(vec: Vec<u8>) -> *mut Self {
        // new: move the vec itself onto the heap
        let vec = Box::new(vec);
        let v = Box::new(Self {
            len: vec.len(),
            data: vec.as_ptr(),
            // new: keep a raw pointer to the boxed vec
            storage: Box::into_raw(vec),
        });
        Box::into_raw(v)
    }
}

A pointer to the original vec is stored as a member in the new design of FFIVec. We are now able to recover it from a bare *mut FFIVec, and can happily implement the de-allocation:

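#[no_mangle]
pub extern "C" fn free_ffi_vec(v: *mut FFIVec) {
    // ignore NULL, which mod_widgets() may return on failure
    if !v.is_null() {
        unsafe {
            // re-box the struct and the vec it owns, then drop both
            let v = Box::from_raw(v);
            drop(Box::from_raw(v.storage));
            drop(v)
        }
    }
}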

Don’t Panic

As mentioned earlier, we have to take care of a bunch of Rust-specific behaviors, and panics are one of them. The Python interpreter has no idea who it is interacting with, let alone the alien mechanism of panicking. It will “really panic” if such things cross the boundary, and it’s our duty to stop them right on the Rust side.

There are many ways to avoid panics. For example, always check before calling .unwrap() or the like. In this story, I adopt a rather brute-force one: std::panic::catch_unwind() 7. It behaves like a top-level try-catch in other languages: it invokes a closure and stops unwinding panics from leaking out. More information on this function can be found in the doc.

Now putting everything together, we finish the code that bridges the FFI gap on the Rust side:

#[repr(C)]
#[derive(Debug)]
pub struct FFIVec {
    len: usize,
    data: *const u8,
    storage: *mut Vec<u8>,
}

impl FFIVec {
    unsafe fn raw_to_slice<'a>(ptr: *mut Self) -> &'a [u8] {
        // borrow only: the struct passed in is owned by Python
        let v = &*ptr;
        core::slice::from_raw_parts(v.data, v.len)
    }

    fn from_vec(vec: Vec<u8>) -> *mut Self {
        let vec = Box::new(vec);
        let v = Box::new(Self {
            len: vec.len(),
            data: vec.as_ptr(),
            storage: Box::into_raw(vec),
        });
        Box::into_raw(v)
    }
}

#[no_mangle]
pub extern "C" fn free_ffi_vec(v: *mut FFIVec) {
    if !v.is_null() {
        unsafe {
            let v = Box::from_raw(v);
            drop(Box::from_raw(v.storage));
            drop(v)
        }
    }
}

#[no_mangle]
pub unsafe extern "C" fn mod_widgets(widgets: *mut FFIVec, mods: *mut FFIVec) -> *const FFIVec {
    std::panic::catch_unwind(|| {
        let widgets: &[u8] = FFIVec::raw_to_slice(widgets);
        let mods: &[u8] = FFIVec::raw_to_slice(mods);
        // play_with() stands for the actual processing logic (not shown)
        let new_widgets: Vec<u8> = play_with(widgets, mods);
        FFIVec::from_vec(new_widgets) as *const FFIVec
    })
    // returns NULL on panic
    .unwrap_or(std::ptr::null())
}

The Python Side

We are now halfway through the battle! And the remaining half is easier to conquer.

To play with the shared library generated by Rust, we need the ctypes module. ctypes is the FFI library in Python’s standard library. It provides facilities to compose or read C data structures, and allows calling functions from a shared library. I won’t give a full introduction to the library since it’s huge, but will rather focus on what we use.

The Data Bridge

We first deal with the data bridge. As a peer to FFIVec in Rust, we should declare the same struct on the Python side:

import ctypes

class FFIVec(ctypes.Structure):
    _fields_ = [
        ("len", ctypes.c_size_t),
        ("data", ctypes.c_void_p),
        ("_storage", ctypes.c_void_p),
    ]

The syntax is straightforward. ctypes provides various data types like c_size_t or c_void_p to represent their counterparts in C. These types guide data conversion. For example, when a Python int object (which has arbitrary precision) is passed as a c_size_t, ctypes knows that it should be converted into an 8-byte unsigned integer (on a 64-bit platform), and vice versa.

The FFIVec class is only a broker. Just like in Rust, we must implement conversions between it and more common Python data types, e.g., bytearray or bytes, to make it useful. The two methods below serve the purpose:
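A few of these conversions can be checked interactively. The snippet below is a small sketch using only standard ctypes types:

```python
import ctypes

# c_size_t is as wide as a pointer on mainstream platforms
# (8 bytes on a 64-bit platform).
size_t_bytes = ctypes.sizeof(ctypes.c_size_t)
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)

# A Python int is converted to the fixed-width C value, and back via .value.
n = ctypes.c_size_t(1024)
round_trip = n.value

# ctypes integer types do no overflow checking: out-of-range values wrap.
wrapped = ctypes.c_uint8(257).value

# A NULL c_void_p reads back as None rather than 0.
null_value = ctypes.c_void_p(None).value
```

The silent wrap-around is one reason to double-check that the field types on the Python side match the Rust struct exactly.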

class FFIVec(...):
    ...
    @classmethod
    def from_bytearray(cls, buf: bytearray) -> "FFIVec":
        l = len(buf)
        # view the bytearray's memory as a C array, without copying
        ptr = (ctypes.c_uint8 * l).from_buffer(buf)
        data = ctypes.cast(ptr, ctypes.c_void_p)
        return cls(len=l, data=data, _storage=None)

    def to_bytes(self) -> bytes:
        # view the foreign memory as a C array, then copy it into bytes
        ptr = (ctypes.c_uint8 * self.len).from_address(self.data)
        return bytes(ptr)

The class method from_bytearray() creates an FFIVec instance from a given bytearray object. The instance points to the same memory as the bytearray, and no copying occurs. The instance method to_bytes() instead constructs a bytes object, into which the underlying memory is copied.
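The sharing-versus-copying behavior can be verified without the Rust library. The sketch below repeats the class in self-contained form and mutates the source bytearray after wrapping it; the sample data is arbitrary:

```python
import ctypes

class FFIVec(ctypes.Structure):
    _fields_ = [
        ("len", ctypes.c_size_t),
        ("data", ctypes.c_void_p),
        ("_storage", ctypes.c_void_p),
    ]

    @classmethod
    def from_bytearray(cls, buf: bytearray) -> "FFIVec":
        l = len(buf)
        ptr = (ctypes.c_uint8 * l).from_buffer(buf)  # shares memory with buf
        data = ctypes.cast(ptr, ctypes.c_void_p)
        return cls(len=l, data=data, _storage=None)

    def to_bytes(self) -> bytes:
        ptr = (ctypes.c_uint8 * self.len).from_address(self.data)
        return bytes(ptr)  # copies the memory into a new bytes object

buf = bytearray(b"abc")
vec = FFIVec.from_bytearray(buf)

copied = vec.to_bytes()          # an independent snapshot
buf[0] = ord("x")                # mutate the original bytearray in place
after_mutation = vec.to_bytes()  # the FFIVec sees the change: same memory
```

Note that the snapshot taken earlier is unaffected by the later mutation, which is what makes it safe to keep after the Rust-side buffer is freed.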

You may notice the c_uint8 * l syntax. The expression is not doing arithmetic, but creates an array type, equivalent to the C type uint8_t[l] with fixed length l. With this type, we can access the buffer address of a bytearray object, or copy data out into a bytes object.
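A short sketch makes the point that the multiplication yields a type rather than a value (the name ByteArray4 is made up for this example):

```python
import ctypes

length = 4
ByteArray4 = ctypes.c_uint8 * length  # a new array *type*, not a number

# The result is a class derived from ctypes.Array.
is_type = isinstance(ByteArray4, type)
is_array_type = issubclass(ByteArray4, ctypes.Array)

# Instances are created by calling the type, like any other class.
arr = ByteArray4(1, 2, 3, 4)
as_bytes = bytes(arr)  # copy the four elements out
```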

Call the Functions

There’s one last piece in our puzzle: interacting with the functions from a foreign language. But first, we should load the shared library. This can be handled by ctypes.cdll.LoadLibrary:

libw = ctypes.cdll.LoadLibrary("path/to/libmod_widgets.so")
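The file name of the library differs between platforms. The helper below is a hypothetical convenience (not part of the original project) that picks the name Cargo gives a cdylib on the current platform. The target/release directory is an assumption based on Cargo’s default output location for release builds:

```python
import sys
from pathlib import Path

def cdylib_filename(crate_name: str) -> str:
    """Return the file name Cargo gives a cdylib on the current platform."""
    if sys.platform.startswith("win"):
        return f"{crate_name}.dll"
    if sys.platform == "darwin":
        return f"lib{crate_name}.dylib"
    return f"lib{crate_name}.so"

# Assumed location: adjust to wherever the built library actually lives.
lib_path = Path("target") / "release" / cdylib_filename("mod_widgets")
```

The resulting path can then be passed to ctypes.cdll.LoadLibrary as str(lib_path).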

After that, we declare the functions to be called, by accessing them as attributes of libw:

PFFIVec = ctypes.POINTER(FFIVec)

mod_widgets = libw.mod_widgets
mod_widgets.argtypes = (PFFIVec, PFFIVec)
mod_widgets.restype = PFFIVec

free_ffi_vec = libw.free_ffi_vec
free_ffi_vec.argtypes = (PFFIVec,)
free_ffi_vec.restype = None  # the function returns nothing

In the above code, we also specify the argument and return types of the foreign functions. This is not required 8, but I recommend doing so whenever possible. Specifying the types prevents data from being incorrectly coerced by ctypes.

That’s it! The functions mod_widgets and free_ffi_vec can now be called just like ordinary Python functions:
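The effect of declaring types can be tried without building anything, by borrowing a function from the interpreter itself. This sketch assumes CPython, where ctypes.pythonapi exposes the C API function Py_GetVersion(), which returns a C string. Without a declared restype, ctypes would treat the return value as a C int:

```python
import ctypes

get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = ()               # the function takes no arguments
get_version.restype = ctypes.c_char_p   # const char* is converted to bytes

version = get_version()
print(version.decode())
```

The same two attributes are what protect the pointer arguments and return value of mod_widgets from being mangled.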

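old_widgets = bytearray(...)
mods = bytearray(...)
new_widgets_ptr = mod_widgets(
    FFIVec.from_bytearray(old_widgets),
    FFIVec.from_bytearray(mods),
)
if not new_widgets_ptr:  # NULL means the Rust side panicked
    raise RuntimeError("mod_widgets() failed")

new_widgets = new_widgets_ptr.contents  # .contents dereferences the pointer
new_widgets_bytes = new_widgets.to_bytes()  # copy out as bytes
free_ffi_vec(new_widgets_ptr)  # de-allocate at Rust side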
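Since forgetting the final call leaks memory on the Rust side, the call-copy-free sequence could be wrapped in one place. The helper below is a sketch with a hypothetical name, call_mod_widgets. It takes the two foreign functions as parameters so that it can be exercised with stand-ins before the library is built, and it assumes, as in the Rust code above, that a NULL return signals a panic:

```python
import ctypes

class FFIVec(ctypes.Structure):
    _fields_ = [
        ("len", ctypes.c_size_t),
        ("data", ctypes.c_void_p),
        ("_storage", ctypes.c_void_p),
    ]

def call_mod_widgets(mod_widgets, free_ffi_vec, widgets: FFIVec, mods: FFIVec) -> bytes:
    """Call mod_widgets, copy the result out, and always free it afterwards."""
    ptr = mod_widgets(widgets, mods)
    if not ptr:
        # A NULL pointer is falsy in ctypes; the Rust side returns NULL on panic.
        raise RuntimeError("mod_widgets() returned NULL")
    try:
        vec = ptr.contents
        # string_at() copies vec.len bytes starting at address vec.data.
        return ctypes.string_at(vec.data, vec.len)
    finally:
        # Runs even if copying fails, so the Rust allocation is not leaked.
        free_ffi_vec(ptr)
```

With the library loaded and the prototypes declared, the helper would be called with the real mod_widgets and free_ffi_vec as its first two arguments.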

Cheers!

Epilogue

This post showcases a simple yet practical example of calling functions written in Rust from Python. I deliberately built everything from scratch. Working at such a low level forced me to think carefully about potential edge cases that I had never noticed before. It was quite fun, and I learnt a good lesson.

  1. no memory leaking, no double freeing
  2. least memory copies possible
  3. https://en.wikipedia.org/wiki/Foreign_function_interface
  4. .so files in Linux or .dll files in Windows
  5. how to pass the arguments. by registers? by stack?
  6. how to order the fields of a struct? pad short fields or not?
  7. https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
  8. https://docs.python.org/3/library/ctypes.html#specifying-the-required-argument-types-function-prototypes

Author: hsfzxjy.
Link: .
License: CC BY-NC-ND 4.0.
All rights reserved by the author.
Commercial use of this post in any form is NOT permitted.
Non-commercial use of this post should be attributed with this block of text.
