Rust + Python | Perl FFI Strings
Sat, 26 Dec 2020
Intro
The internet is awash with examples of how to use a Foreign Function Interface (FFI) to call functions which accept and return ints and floats, which are copy types. There seem to be fewer examples of passing strings back and forth over FFI. Passing (arbitrary length) strings requires managing the memory associated with those strings.
Goals:
- Expose rust functions to Python or Perl which accept strings over the FFI.
- Minimal dependencies.
- Maximal safety, no crashing.
Limitations:
- The particular example I’m using (passing JSON-encoded data) is not necessarily good API design, and will not be fast if we need to make many calls to rust, due to the encoding and decoding of JSON data.
Assumptions:
- You know or are willing to learn enough rust to write a small shim layer.
Code is available at https://github.com/duelafn/blog-code/tree/main/2020/python-rust-string-ffi
In this post, I am passing JSON-encoded data to show a possible generalization; however, we could just as well be passing file names, binary-encoded data, … anything that looks like a string.
As I mention in the limitations, this probably isn’t a good way to design an API. It does, however, let us deal in complex data transfer without thinking too hard about the boundary between languages. It may be appropriate for exposing specialized functions that you only have to call a few times. It may be useful for heavy calculation as long as a single call from your scripting language will trigger a lot of calculation in rust.
Exposing Rust
We just have to add cdylib to the crate-type in Cargo.toml. This will cause cargo to produce a shared library (.so on Linux, or .dll on Windows) which can be used by the FFI library.
[lib]
name = "mylib"
path = "src/lib.rs"
crate-type = ["cdylib"]
My examples also use serde_json as a dependency since I am passing JSON strings over the interface.
Creating a Rust shim
Since we’re not using any of the tools which automatically create shim layers, we have to do this ourselves. It’s not too hard; our main concerns are managing the memory used by the passed strings and handling errors in a nice way.
We’ll need two exposed functions.
- mylib_myfunc_str: The function we want to expose
- mylib_free_string: Function to free the string returned by mylib_myfunc_str
Notice that we prefix our exposed functions with our library name to avoid potential collision with other libraries.
First the code; analysis will follow:
use serde_json;
use serde_json::value::Value;

/// Read some *ffi-owned* JSON, do some processing and return a *rust-owned* string.
/// The FFI caller will need to call `mylib_free_string` on the returned pointer.
#[no_mangle]
pub extern "C" fn mylib_myfunc_str(raw: *const std::os::raw::c_char) -> *const std::os::raw::c_char {
    // *Copy* input to a rust-owned string
    if raw.is_null() { return std::ptr::null(); }
    let bytes = unsafe { std::ffi::CStr::from_ptr(raw) };

    // Internal processing
    let res = String::from_utf8(bytes.to_bytes().to_vec())
        .map_err(|e| format!("Encoding error: {}", e))
        .and_then(|req| myfunc(&req));

    // Formatting a response
    let rv = match serde_json::to_string(&res) {
        Ok(json) => json,
        // "rv" must be valid JSON, so we don't try including the error message
        Err(_) => String::from("{\"Err\":\"JSON encode error\"}"),
    };

    // Return a *rust-owned* string; the caller must hand it back to `mylib_free_string`
    return match std::ffi::CString::new(rv) {
        Ok(cstr) => cstr.into_raw(),
        Err(_) => std::ptr::null(),
    }
}

fn myfunc(request: &str) -> Result<Value, String> {
    let req: Value = serde_json::from_str(request)
        .map_err(|e| format!("JSON Parse error: {}", e))?;

    // Do whatever we like with the Value.
    if let Some(Value::String(val)) = req.get("plugh") {
        return Ok(Value::from(format!("plugh has length {}", val.len())));
    } else {
        return Err(String::from("plugh not present or not valid"));
    }
}

/// FFI users who receive a returned string from us MUST call this function
/// to free that string.
#[no_mangle]
pub extern "C" fn mylib_free_string(raw: *mut std::os::raw::c_char) {
    // Reclaiming a null pointer would be undefined behavior, so ignore it
    if raw.is_null() { return; }
    unsafe { let _ = std::ffi::CString::from_raw(raw); }
}
We tag our exposed functions with #[no_mangle] and extern, and they accept and return unmanaged pointers (*const XXX and *mut XXX).
mylib_myfunc_str
if raw.is_null() { return std::ptr::null(); }
let bytes = unsafe { std::ffi::CStr::from_ptr(raw) };
It is possible that we received a null pointer, so we check for that first.
C/FFI deals in null-terminated strings. Rust strings are not null-terminated; they are always paired with a length. CStr::from_ptr will scan the memory pointed to by the pointer until it finds the terminating null and thereby get the length of the string. This sort of memory scanning is unsafe (we have to trust the caller to pass a valid, null-terminated string), but produces a rust-safe byte string.
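To see the scan in isolation, here is a minimal sketch. It assumes a hypothetical helper named read_c_str (not part of the library in this post) which does the null check and the copy in one place. It uses CStr::to_str rather than String::from_utf8, which is a small variation on the code above.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: copy a C string into an owned Rust `String`.
/// Returns `None` for a null pointer or for bytes that are not valid UTF-8.
///
/// # Safety
/// `raw` must be null or point to a valid, null-terminated string.
unsafe fn read_c_str(raw: *const c_char) -> Option<String> {
    if raw.is_null() {
        return None;
    }
    // Scans forward from `raw` until the first 0 byte to find the length.
    let borrowed: &CStr = unsafe { CStr::from_ptr(raw) };
    // `to_str` validates UTF-8; `to_owned` makes the rust-owned copy.
    borrowed.to_str().ok().map(|s| s.to_owned())
}
```

The borrowed CStr still points at the caller's memory; only the final to_owned produces data that Rust owns.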
// res is a Result<Value, String>
let res = String::from_utf8(bytes.to_bytes().to_vec())
    .map_err(|e| format!("Encoding error: {}", e))
    .and_then(|req| myfunc(&req));
Decode our UTF-8 bytes into a String. On error, create an Err(String) with an appropriate error message. On successful decode, pass the resulting String to our function which will produce a Result<Value, String>.
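The same chain can be tried as a self-contained sketch. The shout function here is a hypothetical stand-in for myfunc, chosen so that the example needs no serde_json; decode_and_run is likewise just an illustrative name.

```rust
/// Hypothetical stand-in for `myfunc`: upper-cases the request.
fn shout(req: &str) -> Result<String, String> {
    if req.is_empty() {
        return Err(String::from("empty request"));
    }
    Ok(req.to_uppercase())
}

/// Decode bytes as UTF-8, then hand the text to `shout`.
/// Either step can fail, and both failures end up as `Err(String)`.
fn decode_and_run(bytes: &[u8]) -> Result<String, String> {
    String::from_utf8(bytes.to_vec())
        // Convert the FromUtf8Error into our plain-string error type
        .map_err(|e| format!("Encoding error: {}", e))
        // Only runs when decoding succeeded
        .and_then(|req| shout(&req))
}
```

Because both error paths share the String type, the caller only ever has to deal with one kind of Err.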
let rv = match serde_json::to_string(&res) {
    Ok(json) => json,
    // "rv" must be valid JSON, so we don't try including the error message
    Err(_) => String::from("{\"Err\":\"JSON encode error\"}"),
};
Here we take advantage of the fact that serde_json turns enum variants that carry data, such as those of Result, into objects with a single key, the variant name. Thus, if res is Ok(STUFF), serde will produce {"Ok":STUFF} and if res is Err("MESSAGE"), serde will produce {"Err":"MESSAGE"}. This is a reasonably convenient structure to pass around so I see no reason to unwrap any values.
The only difficult case is if the JSON encoding fails, in which case we don’t have any reasonable way to produce valid JSON so we hard-code a minimal response. We now have a String response in rv.
return match std::ffi::CString::new(rv) {
    Ok(cstr) => cstr.into_raw(),
    Err(_) => std::ptr::null(),
}
We now turn our result string into a null-terminated string (Rust verifies that there are no nulls embedded in the string itself), and finally, the very important .into_raw() removes Rust ownership of the string so that it doesn’t get reclaimed as soon as the function returns. We now have a potential memory leak, yay!
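The hand-off can be tried on its own. The sketch below uses two hypothetical helpers, into_c_ptr and reclaim (names chosen for illustration, not taken from the library above), to show that an embedded null is rejected and that a pointer released with into_raw can be taken back with from_raw.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical helper: hand a Rust `String` to a C caller.
/// Returns null if `s` contains a 0 byte.
fn into_c_ptr(s: String) -> *mut c_char {
    match CString::new(s) {
        // `into_raw` gives up ownership on purpose; Rust will not free this.
        Ok(cstr) => cstr.into_raw(),
        // An embedded 0 byte would truncate the string on the C side.
        Err(_) => std::ptr::null_mut(),
    }
}

/// Hypothetical helper: take the allocation back so Rust can free it.
///
/// # Safety
/// `raw` must be null or a pointer previously returned by `into_c_ptr`,
/// and it must not be used again afterwards.
unsafe fn reclaim(raw: *mut c_char) -> Option<String> {
    if raw.is_null() {
        return None;
    }
    // Rust owns the memory again; it is freed when `cstr` goes out of scope.
    let cstr = unsafe { CString::from_raw(raw) };
    cstr.into_string().ok()
}
```

Every non-null pointer produced by into_c_ptr has to pass through reclaim exactly once: zero times leaks the memory, twice is a double free.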
myfunc
I’ll skip over myfunc(). It is a plain Rust function and can contain whatever business logic you want or call any Rust functions or libraries that are appropriate. This minimal example just looks for a “plugh” entry and returns its length, or else an error message.
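Since myfunc can contain arbitrary logic, it can also panic, and a panic that escapes an extern function is not something a Python or Perl caller can recover from (at best the whole process aborts). One possible guard, sketched here with std::panic::catch_unwind and a hypothetical wrapper named guarded, turns a panic into an ordinary Err. This is an addition for illustration, not part of the shim above, and it assumes the crate is built with the default unwinding panic strategy; with panic = "abort" nothing can be caught.

```rust
use std::panic::{catch_unwind, UnwindSafe};

/// Hypothetical wrapper: run `f`, converting a panic into an `Err`
/// so that it cannot unwind across the FFI boundary.
fn guarded<F>(f: F) -> Result<String, String>
where
    F: FnOnce() -> Result<String, String> + UnwindSafe,
{
    match catch_unwind(f) {
        // No panic: pass the closure's own Ok/Err through unchanged
        Ok(result) => result,
        Err(payload) => {
            // Panic payloads are usually a `&str` or a `String`.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| String::from("unknown panic"));
            Err(format!("Panic: {}", msg))
        }
    }
}
```

In the shim, the call to myfunc could be wrapped in such a guard so that a panic shows up in the scripting language as just another {"Err": ...} response.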
mylib_free_string
pub extern "C" fn mylib_free_string(raw: *mut std::os::raw::c_char) {
    if raw.is_null() { return; }
    unsafe { let _ = std::ffi::CString::from_raw(raw); }
}
This is the function that will close our potential memory leak. It receives the pointer we produced in mylib_myfunc_str, ignores it if it is null (mylib_myfunc_str can return null, and reclaiming a null pointer would be undefined behavior), reclaims ownership of its contents using the null termination, and then immediately drops it by binding it to _ instead of a variable, freeing the memory. We just have to ensure this function is called from our scripts.
Calling from Python
There are a few FFI libraries for Python. I’m using cffi. First we import cffi and declare our exported functions. This is standard stuff straight from the cffi documentation.
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# SPDX-License-Identifier: MIT
import json
import platform
from cffi import FFI

ffi = FFI()
if 'Windows' == platform.system():
    # cargo does not add the "lib" prefix to the file name on Windows
    libmylib = ffi.dlopen('./target/release/mylib.dll')
else:
    libmylib = ffi.dlopen('./target/release/libmylib.so')
ffi.cdef('''
    void mylib_free_string(const char *n);
    char* mylib_myfunc_str(const char *n);
''')
For safety, we wrap the rust functions in a python function so we have a nice pythonic interface and so we can ensure that we don’t accidentally leak memory.
def myfunc(req):
    pystr = json.dumps(req).encode("UTF-8")
    rstr = ffi.NULL
    try:
        rstr = libmylib.mylib_myfunc_str(pystr)
        if rstr == ffi.NULL:
            return None
        return json.loads(ffi.string(rstr).decode('UTF-8'))
    finally:
        if rstr != ffi.NULL:
            libmylib.mylib_free_string(rstr)
- We take our arguments as a python dictionary and encode it to a python-owned JSON string.
- We call our rust function which returns a rust-owned ffi-c-string.
- In a bit of a busy line, we: copy the data to a python string, decode, then parse the JSON.
- Using try: ... finally: we can ensure that the rust-owned string is freed even if one of the other commands raises an exception.
Finally, we can use our function from python: we pass in a plain dictionary and receive a plain dictionary back. The resulting dictionary conveniently tells us whether the call was a success or failure.
# { "Ok": "plugh has length 13" }
res = myfunc({ "plugh": "A test string" })
# { "Err": "plugh not present or not valid" }
res = myfunc({ "foo": "A test string" })
One could also unwrap the “Ok” or “Err” keys within the myfunc() python function, turning the Err string into an Exception, if we preferred that.
Calling from Perl
There are a few FFI libraries for Perl. I’m using FFI::Platypus. First we import the module and declare our exported functions. This is standard stuff straight from the documentation.
#!/usr/bin/perl
use strict; use warnings; use 5.020;
use JSON;
use FFI::Platypus;
my $FFI = FFI::Platypus->new(api => 1);
# cargo does not add the "lib" prefix to the file name on Windows
my $mylib_so = ($^O eq "MSWin32") ? "mylib.dll" : "libmylib.so";
$FFI->lib("target/release/$mylib_so");
$FFI->attach(mylib_myfunc_str => ['string'] => 'opaque');
$FFI->attach(mylib_free_string => ['opaque'] => 'void');
For safety, we wrap the rust functions in a perl sub so we have a nice interface and so we can ensure that we don’t accidentally leak memory.
sub myfunc {
    my $json = encode_json($_[0]);
    my $ptr;  # declared outside the eval so it survives a die inside it
    my $str = eval {
        $ptr = mylib_myfunc_str($json);
        $FFI->cast('opaque' => 'string', $ptr);
    };
    my $err = $@;
    mylib_free_string($ptr) if $ptr;
    die $err if $err;
    return( $str ? decode_json($str) : undef );
}
- We take our arguments as a perl hash and encode it to a perl-owned JSON string.
- We call our rust function which returns a rust-owned ffi-c-string.
- We copy string pointer contents to a perl string.
- Using the eval { ... } block we ensure that the rust-owned string is freed even if one of the other commands dies.
Finally, we can use our function from perl: we pass in a plain hash and receive a plain hash back. The resulting hash conveniently tells us whether the call was a success or failure.
# { "Ok" => "plugh has length 13" }
my $res = myfunc({ "plugh" => "A test string" });
# { "Err" => "plugh not present or not valid" }
$res = myfunc({ "foo" => "A test string" });
One could also unwrap the “Ok” or “Err” keys within the myfunc() perl function, dying if the Err key is present, if we preferred that.