Rust + Python | Perl FFI Strings

Sat, 26 Dec 2020

Intro
Exposing Rust
Creating a Rust shim
Calling from Python
Calling from Perl

Intro

The internet is awash with examples of how to use a Foreign Function Interface (FFI) to call functions that accept and return ints and floats, which are copy types. There seem to be fewer examples of passing strings back and forth over FFI. Passing (arbitrary-length) strings requires managing the memory associated with those strings.

Goals:

Expose rust functions to Python or Perl which accept strings over the FFI.
Minimal dependencies.
Maximal safety, no crashing.

Limitations:

The particular example I’m using (passing JSON-encoded data) is not necessarily good API design, and will not be fast if we need to make many calls to rust, because every call encodes and decodes JSON data.

Assumptions:

You know or are willing to learn enough rust to write a small shim layer.

Code is available at https://github.com/duelafn/blog-code/tree/main/2020/python-rust-string-ffi

In this post, I am passing JSON-encoded data to show a possible generalization; however, we could just as well be passing file names, binary-encoded data, … anything that looks like a string.

As I mention in the limitations, this probably isn’t a good way to design an API. It does, however, let us deal in complex data transfer without thinking too hard about the boundary between languages. It may be appropriate for exposing specialized functions that you only have to call a few times. It may be useful for heavy calculation as long as a single call from your scripting language will trigger a lot of calculation in rust.

Exposing Rust

We just have to add cdylib to the crate-type in Cargo.toml. This will cause cargo to produce a shared library (.so in Linux, or .dll in Windows) which can be used by the FFI library.

[lib]
name = "mylib"
path = "src/lib.rs"
crate-type = ["cdylib"]

My examples also use serde_json as a dependency since I am passing JSON strings over the interface.

Creating a Rust shim

Since we’re not using any of the tools which automatically create shim layers, we have to do this ourselves. It’s not too hard; our main concerns are managing the memory used by the passed strings and handling errors in a nice way.

We’ll need two exposed functions.

mylib_myfunc_str - The function we want to expose
mylib_free_string - Function to free the string returned by mylib_myfunc_str

Notice that we prefix our exposed functions with our library name to avoid potential collision with other libraries.

First the code; analysis follows:

use serde_json;
use serde_json::value::Value;

/// Read some *ffi-owned* JSON, do some processing and return a *rust-owned* string.
/// The FFI caller will need to call `mylib_free_string` on the returned pointer.
#[no_mangle]
pub extern "C" fn mylib_myfunc_str(raw: *const std::os::raw::c_char) -> *const std::os::raw::c_char {
    // Borrow the caller's bytes; they are *copied* into a rust-owned String below
    if raw.is_null() { return std::ptr::null(); }
    let bytes = unsafe { std::ffi::CStr::from_ptr(raw) };

    // Internal processing
    let res = String::from_utf8(bytes.to_bytes().to_vec())
                .map_err(|e| format!("Encoding error: {}", e))
                .and_then(|req| myfunc(&req));

    // Formatting a response
    let rv = match serde_json::to_string(&res) {
        Ok(json) => json,
        // "rv" must be valid JSON, so we don't try including the error message
        Err(_)   => String::from("{\"Err\":\"JSON encode error\"}"),
    };

    // Hand ownership to the FFI caller, who must give it back via mylib_free_string
    return match std::ffi::CString::new(rv) {
        Ok(cstr) => cstr.into_raw(),
        Err(_)   => std::ptr::null(),
    }
}

fn myfunc(request: &str) -> Result<Value, String> {
    let req: Value = serde_json::from_str(request)
                        .map_err(|e| format!("JSON Parse error: {}", e))?;

    // Do whatever we like with the Value.
    if let Some(Value::String(val)) = req.get("plugh") {
        return Ok(Value::from(format!("plugh has length {}", val.len())));
    } else {
        return Err(String::from("plugh not present or not valid"));
    }
}

/// FFI users who receive a returned string from us MUST call this function
/// to free that string.
#[no_mangle]
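pub extern "C" fn mylib_free_string(raw: *mut std::os::raw::c_char) {
    // CString::from_raw on a null pointer is undefined behavior, so ignore null
    if raw.is_null() { return; }
    unsafe { let _ = std::ffi::CString::from_raw(raw); }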
}

We tag our exposed functions with #[no_mangle] and extern and they accept and return unmanaged pointers (*const XXX and *mut XXX).

mylib_myfunc_str

    if raw.is_null() { return std::ptr::null(); }
    let bytes = unsafe { std::ffi::CStr::from_ptr(raw) };

It is possible that we received a null pointer, so we check for that first. C/FFI deals in null-terminated strings, whereas Rust strings are not null-terminated and always carry an explicit length. CStr::from_ptr scans the memory behind the pointer for the terminating null to find the length of the string. This is unsafe because Rust cannot verify that the pointer is valid or that the memory really is null-terminated; given a well-behaved caller, it produces a rust-safe, borrowed byte string.
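To see the null check and CStr::from_ptr in isolation, here is a minimal sketch that stays entirely inside Rust. The helper name c_len is made up for this example, and the CString in main merely stands in for memory that a foreign caller would own.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: byte length of a C string, or None for a null pointer.
fn c_len(raw: *const c_char) -> Option<usize> {
    if raw.is_null() {
        return None;
    }
    // Unsafe: we have to trust the caller that `raw` points at valid,
    // null-terminated memory. from_ptr scans for the terminator and
    // borrows the bytes; nothing is copied here.
    let borrowed: &CStr = unsafe { CStr::from_ptr(raw) };
    Some(borrowed.to_bytes().len())
}

fn main() {
    // Stand-in for a string owned by the foreign caller.
    let owned = CString::new("plugh").expect("no interior nul bytes");
    println!("{:?}", c_len(owned.as_ptr()));
    println!("{:?}", c_len(std::ptr::null()));
}
```

The length excludes the terminating null, which is why to_bytes() is the natural input for the UTF-8 decoding step.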

    // res is a Result<Value, String>
    let res = String::from_utf8(bytes.to_bytes().to_vec())
                .map_err(|e| format!("Encoding error: {}", e))
                .and_then(|req| myfunc(&req));

Copy the borrowed bytes into a rust-owned buffer (to_vec()) and decode them as UTF-8 into a String. On error, create an Err(String) with an appropriate error message. On successful decode, pass the resulting String to our function which will produce a Result<Value, String>.
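The same decode-then-process chain can be tried without any FFI or JSON involved. This is only a sketch: shout is an invented stand-in for the business logic (it is not the post's myfunc), and process is a made-up name for the chain.

```rust
/// Hypothetical stand-in for the business logic; not the post's `myfunc`.
fn shout(req: &str) -> Result<String, String> {
    if req.is_empty() {
        Err(String::from("empty request"))
    } else {
        Ok(req.to_uppercase())
    }
}

/// Decode bytes as UTF-8, then hand the text to `shout`.
fn process(bytes: &[u8]) -> Result<String, String> {
    String::from_utf8(bytes.to_vec()) // copies into a rust-owned buffer
        .map_err(|e| format!("Encoding error: {}", e)) // FromUtf8Error -> String
        .and_then(|req| shout(&req)) // runs only if decoding succeeded
}

fn main() {
    println!("{:?}", process(b"plugh"));
    println!("{:?}", process(&[0xff, 0xfe])); // not valid UTF-8
    println!("{:?}", process(b""));
}
```

Because both failure paths end up as Err(String), the caller sees a single error type no matter where things went wrong.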

    let rv = match serde_json::to_string(&res) {
        Ok(json) => json,
        // "rv" must be valid JSON, so we don't try including the error message
        Err(_)   => String::from("{\"Err\":\"JSON encode error\"}"),
    };

Here we take advantage of the fact that serde_json turns enums into objects with a single key, the name of the enum variant. Thus, if res is Ok(STUFF), serde will produce {"Ok":STUFF} and if res is Err("MESSAGE"), serde will produce {"Err":"MESSAGE"}. This is a reasonably convenient structure to pass around so I see no reason to unwrap any values.

The only difficult case is if the JSON encoding fails, in which case we don’t have any reasonable way to produce valid JSON so we hard-code a minimal response. We now have a String response in rv.

    return match std::ffi::CString::new(rv) {
        Ok(cstr) => cstr.into_raw(),
        Err(_)   => std::ptr::null(),
    }

We now turn our result string into a null-terminated string (Rust verifies that there are no nulls embedded in the string itself), and finally, the very important .into_raw() removes Rust ownership of the string so that it doesn’t get reclaimed as soon as the function returns. We now have a potential memory leak, yay!
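Both halves of that step, the embedded-null check and the hand-off of ownership, can be seen in a small standalone sketch. The helper name to_c is an assumption for illustration; unlike the shim it returns a *mut pointer, which is what into_raw() produces.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: hand a String to a C caller, or return null if the
/// String contains an interior nul byte.
fn to_c(s: String) -> *mut c_char {
    match CString::new(s) {
        // into_raw() gives up ownership: Rust will not free this buffer on its own.
        Ok(cstr) => cstr.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

fn main() {
    let bad = to_c(String::from("pl\0ugh"));
    println!("interior nul gives null: {}", bad.is_null());

    let good = to_c(String::from("plugh"));
    let text = unsafe { CStr::from_ptr(good) }.to_string_lossy().into_owned();
    println!("read back through the pointer: {}", text);

    // Reclaim ownership so the buffer is freed; skipping this would leak it.
    drop(unsafe { CString::from_raw(good) });
}
```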

myfunc

I’ll skip over myfunc(). It is a plain Rust function and can contain whatever business logic you want or call any Rust functions or libraries that are appropriate. This minimal example just looks for a “plugh” entry and returns its length, or else an error message.

mylib_free_string

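pub extern "C" fn mylib_free_string(raw: *mut std::os::raw::c_char) {
    // CString::from_raw on a null pointer is undefined behavior, so ignore null
    if raw.is_null() { return; }
    unsafe { let _ = std::ffi::CString::from_raw(raw); }
}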

This is the function that will close our potential memory leak. It receives the pointer we produced in mylib_myfunc_str, reclaims ownership of its contents (finding the length again from the null termination), and then immediately drops it by binding it to _ instead of a named variable, which frees the memory. Only pointers produced by into_raw() may be passed here, and each only once. We just have to ensure this function is called from our scripts.
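The call, copy, free contract that every caller must follow can be exercised from Rust itself, which is a handy way to test a shim before involving a scripting language. The sketch below uses simplified stand-ins: demo_echo_str, demo_free_string and call_echo are invented names, the "echo" behaviour replaces the JSON processing, and #[no_mangle] is left off because the functions are called directly rather than through a shared library.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Simplified stand-in for the exported processing function.
pub extern "C" fn demo_echo_str(raw: *const c_char) -> *mut c_char {
    if raw.is_null() {
        return std::ptr::null_mut();
    }
    let input = unsafe { CStr::from_ptr(raw) }.to_string_lossy();
    match CString::new(format!("echo: {}", input)) {
        Ok(cstr) => cstr.into_raw(), // ownership leaves Rust here
        Err(_) => std::ptr::null_mut(),
    }
}

// Simplified stand-in for the exported free function.
pub extern "C" fn demo_free_string(raw: *mut c_char) {
    if raw.is_null() {
        return; // nothing to free
    }
    // Rebuild the CString so that dropping it releases the buffer.
    drop(unsafe { CString::from_raw(raw) });
}

/// What every caller has to do: call, copy the result out, then free.
fn call_echo(msg: &str) -> Option<String> {
    let arg = CString::new(msg).ok()?; // None if msg has an interior nul
    let ptr = demo_echo_str(arg.as_ptr());
    if ptr.is_null() {
        return None;
    }
    let copy = unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned();
    demo_free_string(ptr); // the pointer must not be used after this
    Some(copy)
}

fn main() {
    println!("{:?}", call_echo("plugh"));
}
```

The Python and Perl wrappers later in the post follow the same three steps as call_echo, only with the scripting language's own FFI library doing the pointer handling.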

Calling from Python

There are a few FFI libraries for Python. I’m using cffi. First we import cffi and declare our exported functions. This is standard stuff straight from the cffi documentation.

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# SPDX-License-Identifier: MIT
import json

from cffi import FFI
ffi = FFI()

import platform
if 'Windows' == platform.system():
    # Cargo does not add a "lib" prefix to DLLs on Windows
    libmylib = ffi.dlopen('./target/release/mylib.dll')
else:
    libmylib = ffi.dlopen('./target/release/libmylib.so')

ffi.cdef('''
void mylib_free_string(char *n);
char* mylib_myfunc_str(const char *n);
''')

For safety, we wrap the rust functions in a python function so we have a nice pythonic interface and so we can ensure that we don’t accidentally leak memory.

def myfunc(req):
    pystr = json.dumps(req).encode("UTF-8")
    rstr = ffi.NULL
    try:
        rstr = libmylib.mylib_myfunc_str(pystr)
        if rstr == ffi.NULL:
            return None
        return json.loads(ffi.string(rstr).decode('UTF-8'))
    finally:
        if rstr != ffi.NULL:
            libmylib.mylib_free_string(rstr)

We take our arguments as a python dictionary and encode it to a python-owned JSON string.
We call our rust function which returns a rust-owned ffi-c-string.
In a bit of a busy line, we: copy the data to a python string, decode, then parse the JSON.
Using try: ... finally: we can ensure that the rust-owned string is freed even if one of the other commands raises an exception.

Finally, we can use our function from python: we pass in a plain dictionary and receive a plain dictionary back. The resulting dictionary conveniently tells us whether the call was a success or failure.

# { "Ok": "plugh has length 13" }
res = myfunc({ "plugh": "A test string" })

# { "Err": "plugh not present or not valid" }
res = myfunc({ "foo": "A test string" })

One could also unwrap the “Ok” or “Err” keys within the myfunc() python function turning the Err string into an Exception if we preferred that.

Calling from Perl

There are a few FFI libraries for Perl. I’m using FFI::Platypus. First we import the module and declare our exported functions. This is standard stuff straight from the documentation.

#!/usr/bin/perl
use strict; use warnings; use 5.020;
use JSON;

use FFI::Platypus;
my $FFI = FFI::Platypus->new(api => 1);
my $mylib_so = ($^O eq "MSWin32") ? "mylib.dll" : "libmylib.so";  # no "lib" prefix on Windows
$FFI->lib("target/release/$mylib_so");
$FFI->attach(mylib_myfunc_str => ['string'] => 'opaque');
$FFI->attach(mylib_free_string => ['opaque'] => 'void');

For safety, we wrap the rust functions in a perl sub so we have a nice interface and so we can ensure that we don’t accidentally leak memory.

sub myfunc {
    my $json = encode_json($_[0]);
    # Declared outside the eval so the pointer survives if the cast dies
    my ($ptr, $str);
    eval {
        $ptr = mylib_myfunc_str($json);
        $str = $FFI->cast('opaque' => 'string', $ptr) if $ptr;
        1;
    };
    my $err = $@;
    mylib_free_string($ptr) if $ptr;
    die $err if $err;
    return( $str ? decode_json($str) : undef );
}

We take our arguments as a perl hash and encode it to a perl-owned JSON string.
We call our rust function which returns a rust-owned ffi-c-string.
We copy the contents behind the string pointer into a perl string.
Using the eval { ... } block we ensure that the rust-owned string is freed even if one of the other commands dies.

Finally, we can use our function from perl: we pass in a plain hash and receive a plain hash back. The resulting hash conveniently tells us whether the call was a success or failure.

# { "Ok" => "plugh has length 13" }
my $res = myfunc({ "plugh" => "A test string" });

# { "Err" => "plugh not present or not valid" }
$res = myfunc({ "foo" => "A test string" });

One could also unwrap the “Ok” or “Err” keys within the myfunc() perl function and die when the Err key is present, if we preferred that.
