Exploring meson for interfacing fortran and python via f2py and standard techniques, with benchmarks.

Background

A recent post gauging community interest in f2py brought to light (among other aspects) the fact that the build systems of f2py are rather opaque to the end user. There are good reasons for this, since many of the tools discussed in this post were not around / in any shape to be used during the active development of f2py1. Nevertheless, build systems have come a long way from the early 2000s and the next two decades of f2py, will certainly involve less dependence on the soon-to-be-gone numpy.distutils approach to building extension modules.

The Fibonacci example

Though rather contrived and overused2, for the rest of this post I will assume the following Fortran program, in a variety of guises.

 1C FILE: FIB1.F
 2      SUBROUTINE FIB(A,N)
 3C
 4C     CALCULATE FIRST N FIBONACCI NUMBERS
 5C
 6      INTEGER N
 7      REAL*8 A(N)
 8      DO I=1,N
 9         IF (I.EQ.1) THEN
10            A(I) = 0.0D0
11         ELSEIF (I.EQ.2) THEN
12            A(I) = 1.0D0
13         ELSE
14            A(I) = A(I-1) + A(I-2)
15         ENDIF
16      ENDDO
17      END
18C END FILE FIB1.F

Keeping in mind the fact that the fixed form syntax is deprecated, we will also consider the more standard compliant and modern form:

 1module fib1
 2  implicit none
 3  contains
 4    subroutine fib(a,n)
 5      integer, intent(in) :: n
 6      integer :: i
 7      real(8) :: a(n)
 8      do i=1, n
 9         if (i==1) then
10            a(i) = 0.0d0
11         else if (i==2) then
12            a(i) = 1.0d0
13        else
14            a(i) = a(i-1) + a(i-2)
15         end if
16      end do
17      end subroutine
18end module fib1

Note that this program in its current form is not really an executable, as it is essentially a subprogram; and needs to be coupled to a main program to work from fortran. This does not concern us really since we will be primarily using python to drive user interaction. However, for completeness (gfortran main.f fib1.f):

1PROGRAM MAIN
2INTEGER, PARAMETER :: N=7
3REAL(8) :: A(N)
4CALL FIB(A,N)
5PRINT*, A
6END PROGRAM

Along with the non-scream-case version (gfortran main.f90 fib1.f90):

1program main
2  use fib1, only: fib
3  implicit none
4  integer, parameter :: n = 7
5  real(8) :: a(n)
6  call fib(a,n)
7  print*, a
8end program main

NumPy Distutils

It would be remiss to not cover the most basic example of working with this. We can compile and try this out with the following:

1f2py -m fib -c fib1.f
2python -c "import fib; import numpy as np; a=np.zeros(7); fib.fib(a); print(a); exit();"

This is good enough, and there are standard extensions to expose more information to the binding, detailed in the documentation. That, however isn’t the point. The -c flag, convenient in many ways though it is, is a proxy for a rather more involved set of files.

 1mkdir blah
 2f2py -m fib -c fib1.f --build-dir blah
 3tree blah
 4blah
 5├── blah
 6│   └── src.macosx-10.9-x86_64-3.9
 7│       ├── blah
 8│       │   └── src.macosx-10.9-x86_64-3.9
 9│       │       ├── fortranobject.o
10│       │       └── fortranobject.o.d
11│       ├── fibmodule.o
12│       └── fibmodule.o.d
13├── fib1.o
14└── src.macosx-10.9-x86_64-3.9
15    ├── blah
16    │   └── src.macosx-10.9-x86_64-3.9
17    │       ├── fortranobject.c
18    │       └── fortranobject.h
19    └── fibmodule.c
20
217 directories, 8 files

Indeed, the standard usage is to use a setup.py:

1from numpy.distutils.core import Extension, setup
2fibby = Extension(name = 'fib',
3                  sources = ['fib1.f'])
4if __name__ == "__main__":
5    setup(name = 'fib', ext_modules = [ fibby ])

Which can then be built simply with:

1python setup.py build
2ag -g .so
3# build/lib.macosx-10.9-x86_64-3.9/fib.cpython-39-darwin.so

However this does not scale very well with more general build systems, though a custom command approach for CMake and scikit-build is a nice workaround by Nick Wogan.

Meson and F2PY

As scipy moves towards meson mainly driven by Ralf Gommers, it makes sense to look into its use to drive the compilation of an extension module as well. We start by generating the files for the interface.

 1f2py fib1.f -m fib
 2Reading fortran codes...
 3      Reading file 'fib1.f' (format:fix,strict)
 4Post-processing...
 5      Block: fib
 6                      Block: FIB
 7Post-processing (stage 2)...
 8Building modules...
 9      Building module "fib"...
10              Constructing wrapper function "FIB"...
11                FIB(A,[N])
12      Wrote C/API module "fib" to file "./fibmodule.c"

Before we can compile this with meson we need to make a few changes:

  • Replace FIB with fib in fibmodule.c (since C is case-sensitive)
  • Copy f2py’s Fortran-C-Python translation unit and header files fortranobject.{c,h}

With these in place, we can now setup our meson.build.

 1project('test_builds', 'c',
 2  version : '0.1',
 3  default_options : ['warning_level=3'])
 4
 5add_languages('fortran')
 6
 7# https://mesonbuild.com/Python-module.html
 8# requires atleast 0.46.0
 9py_mod = import('python')
10py3 = py_mod.find_installation()
11py3_dep = py3.dependency()
12
13incdir_numpy = run_command(py3,
14  ['-c', 'import os; os.chdir(".."); import numpy; print(numpy.get_include())'],
15  check : true
16).stdout().strip()
17
18inc_np = include_directories(incdir_numpy)
19
20py3.extension_module('fib1',
21           'fib1.f',
22           'fib1module.c',
23           'fortranobject.c',
24           include_directories: inc_np,
25           dependencies : py3_dep,
26           install : true)

Now we’re all set.

1meson setup builddir
2meson compile -C builddir
3cd builddir
4python -c "import fib1; import numpy as np; a=np.empty(7); fib1.fib(a); print(a); exit();"

This makes for a far more pleasant experience, since meson works well with larger projects as well. However, note the sizes (and complexities) of the C portions.

1wc -l fortranobject.c fortranobject.h fibmodule.c
2    1107 fortranobject.c
3     132 fortranobject.h
4     372 fibmodule.c
5    1611 total

Not something we’d probably like to mess around with. Standardization reduces the burden of user-code, as we will see in the next section.

Standard Bindings

The standard compliant approach was aptly covered almost a decade ago by Ondřej Čertík and codified in his fortran90 best practices guide. We start by updating our free-form code to be compliant with newer ISO_C_BINDING guidelines.

 1module fib1
 2  use iso_c_binding
 3  implicit none
 4  contains
 5    subroutine fib(a,n) bind(c,name='c_fib')
 6      integer(c_int), intent(in), value :: n
 7      integer(c_int) :: i
 8      real(c_double) :: a(n)
 9      do i=1, n
10         if (i==1) then
11            a(i) = 0.0d0
12         else if (i==2) then
13            a(i) = 1.0d0
14        else
15            a(i) = a(i-1) + a(i-2)
16         end if
17      end do
18      end subroutine
19end module fib1

The key differences are the c_type declarations and the bind(c,name='blah') attribute. Also, to reduce the number of pointers for easy to copy types, we used value; which causes behavior similar to the pass-by-value paradigm. With this, the a C driver can be as simple as:

 1/* fibby.c */
 2#include<stdlib.h>
 3#include<stdio.h>
 4
 5void c_fib(double *a, int n);
 6
 7int main(int argc, char* argv[argc+1]) {
 8    puts("Fortran fib from C:");
 9    int n=7;
10    double a[n];
11    c_fib(&a, n);
12    for (int i=0; i<n; i++) {
13        printf("%f ",a[i]);
14    };
15    return EXIT_SUCCESS;
16}

This can now be built via:

1gfortran -c fib1.f90
2gcc fibby.c fib1.o

ctypes Wrapper

To wrap this over to python we have a few options. We can omit main altogether and rely on ctypes. So with:

1/* cfib.c */
2void c_fib(double *a, int n);

Followed by:

1gfortran -c fib1.f90
2gcc -fPIC -shared -o libfib.so cfib.c fib1.o

We can work with this in python as:

1from ctypes import CDLL, POINTER, c_int, c_double
2import numpy as np
3fibby = CDLL("libfib.so")
4a = np.zeros(7)
5fibby.c_fib(a.ctypes.data_as(POINTER(c_double)), c_int(7))
6print(a)

Which works in the same exact way as the previous approaches.

We have gone from over a \(1000\) lines of C to one line3

Cython wrapper

However, we can skip even the one line of C, by using cython.

 1# fibby.pyx
 2from numpy cimport ndarray
 3import numpy as np
 4
 5cdef extern:
 6    void c_fib(double *a, int n)
 7
 8def fib(double a, int n):
 9    cdef ndarray[double, mode="c"] ret = np.zeros(n)
10    c_fib(&ret,n)
11    return ret

This is closer to f2py in the sense that the C code is simply hidden. It requires a compilation step typically done via setuptools.py, but, this is much nicer with meson since from 0.59.0, Cython is supported.

 1project('test_builds', 'c',
 2  version : '0.1',
 3  default_options : ['warning_level=3'])
 4
 5add_languages('fortran', 'cython')
 6
 7py_mod = import('python')
 8py3 = py_mod.find_installation()
 9py3_dep = py3.dependency()
10
11py3.extension_module('fibby',
12  'fibby.pyx',
13  dependencies : py3_dep
14)

As before.

1meson setup bdircython
2meson compile -C bdircython
3cd bdircython
4python -c "import fibby; import numpy as np; a=np.empty(7); b=fibby.fib(a); print(b); exit();"

Note the difference in the calling semantics here. Also, recall that .pyx files are actually used to generate C code.

1cython fibby.pyx
2wc -l fibby.c
3    6376 fibby.c

This is vastly more code compared to the f2py generated wrapper code; which seems pretty unattractive.

Python-C API

Neither ctypes nor cpython is as friendly/clean as loading an extension module, so it makes more sense to actually write a proper Python-C extension module for a more fair comparison. This however, involves quite some reference counting and familiarity with the NumPy-C API. Actually it is not too bad for the function we are considering. Even a well commented and naive hand crafted version clocks in below 90 lines of code.

 1#ifndef PY_SSIZE_T_CLEAN
 2#define PY_SSIZE_T_CLEAN
 3#endif /* PY_SSIZE_T_CLEAN */
 4
 5#include "Python.h"
 6#include "numpy/ndarrayobject.h"
 7#include "numpy/ufuncobject.h"
 8
 9static PyMethodDef FibbyMethods[] = {
10        {NULL, NULL, 0, NULL}
11};
12
13void c_fib(double *a, int n);
14
15/* The loop definition must precede the PyMODINIT_FUNC. */
16
17static void double_fib(char **args, npy_intp *dimensions,
18                            npy_intp* steps, void* data)
19{
20    int i; // Standard integer is fine here
21    npy_intp n = dimensions[0];
22    char *in = args[0], *out = args[1];
23    npy_intp in_step = steps[0], out_step = steps[1];
24
25    double apointer[n];
26
27    for (i = 0; i < n; i++) {
28        apointer[i]=(double)in[i];
29    }
30
31    // Call the Fortran function
32    c_fib(apointer, n);
33
34    for (i = 0; i < n; i++) {
35        /*BEGIN main ufunc computation*/
36        *((double *)out) = apointer[i];
37        /*END main ufunc computation*/
38
39        in += in_step;
40        out += out_step;
41    }
42}
43
44/*This a pointer to the above function*/
45PyUFuncGenericFunction funcs[1] = {&double_fib};
46
47/* These are the input and return dtypes of fib.*/
48static char types[2] = {NPY_DOUBLE, NPY_DOUBLE};
49
50static void *data[1] = {NULL};
51
52static struct PyModuleDef moduledef = {
53    PyModuleDef_HEAD_INIT,
54    "fibby", /* module name */
55    NULL, /* module docstring */
56    -1, /* -1 means global state, no sub-interpreters */
57    FibbyMethods, /* Module level functions */
58    NULL, /* single phase initialization, NULL slots*/
59    NULL, /* traversal function for GC */
60    NULL, /* clear function for the GC */
61    NULL /* deallocation function for the GC */
62};
63
64PyMODINIT_FUNC PyInit_fibby(void)
65{
66    PyObject *m, *fib, *d;
67    m = PyModule_Create(&moduledef);
68    if (!m) {
69        return NULL;
70    }
71
72    import_array();
73    import_umath();
74
75    fib = PyUFunc_FromFuncAndData(funcs, data, types, 1, 1, 1,
76                                    PyUFunc_None, "fib",
77                                    "Calls the fib.f90 Fibonacci subroutine", 0);
78
79    d = PyModule_GetDict(m);
80
81    PyDict_SetItemString(d, "fib", fib);
82    Py_DECREF(fib);
83
84    return m;
85}

Compiling this with meson is a snap.

 1project('test_builds', 'c',
 2  version : '0.1',
 3  default_options : ['warning_level=3'])
 4
 5add_languages('fortran', 'cython')
 6
 7py_mod = import('python')
 8py3 = py_mod.find_installation()
 9py3_dep = py3.dependency()
10
11incdir_numpy = run_command(py3,
12  ['-c', 'import os; os.chdir(".."); import numpy; print(numpy.get_include())'],
13  check : true
14).stdout().strip()
15
16inc_np = include_directories(incdir_numpy)
17
18py3.extension_module('fibby',
19  'fib1.f90',
20  'fibbyhand.c',
21  include_directories: inc_np,
22  dependencies : py3_dep
23)

We can compile this much the same as the others.

1meson setup bdircythonhand
2meson compile -C bdircythonhand
3cd bdircythonhand
4python -c "import fibby; import numpy as np; a=np.empty(7); b=fibby.fib(a); print(b); exit();"

This has the lowest lines of code too.

1wc -l fibbyhand.c
2      85 fibbyhand.c

It is rather remarkable since the C code here is not optimized in the least. The twin for loops can almost certainly be handled more efficiently, and also from the point of view of the Fortran-C interface too, this is lacking. Nevertheless, it will suffice for our purpose here, which has more to do with wrapping automatically (and therefore rather naively) than generating optimal code (but that would be great).

Benchmarks

We can use hyperfine for basic benchmarks.

1hyperfine 'python -c "import fibby; import numpy as np; a=np.empty(7); b=fibby.fib(a); print(b); exit();"' --warmup 5 -s full -r 25
2Benchmark #1: python -c "import fibby; import numpy as np; a=np.empty(7); b=fibby.fib(a); print(b); exit();"
3  Time (mean ± σ):     155.4 ms ±   9.7 ms    [User: 133.1 ms, System: 40.2 ms]
4  Range (min … max):   145.6 ms … 190.4 ms    25 runs

Results for the various approaches are collected below in Table tbl:bench.

Table 1: The results are for 25 runs with 5 warmup for np.empty(7) as the input
CommandMean [ms]Min [ms]Max [ms]
Handwritten NumPy-C-Fortran126.0 ± 3.9119.8136.8
F2PY (F77)129.1 ± 4.0125.1140.4
Cython129.5 ± 6.8121.4149.1
F2PY (F90)129.9 ± 5.1123.9145.8
ctypes128.3 ± 7.8122.7159.8

As expected, ctypes is the slowest, since it requires the most massaging of inputs on the Python side. Surprisingly, loading the free form file with f2py was quite a bit slower than the f77 file. Given the complexity of the files involved, it is actually surprising that the handwritten wrapper shows a rather clear gain in performance; given its implementation; but nonetheless it supports the hypothesis that advances in the NumPy-C API and the Fortran-C interoperability has improved significantly over time.

Conclusions

f2py, like Fortran, is here to stay. As the standards strive towards better compliance with C however, it also gets larger and rather more complicated 4. One thing which was not covered here was the actual ISO_Fortran_binding.h5 and other modifications to the C code while interpolating with Fortran itself.

To hold its place in the larger ecosystem, f2py will be essential in protecting the average user from the standard itself, automatically providing optimal bindings for users as initially planned by Pearu Peterson. Modernization efforts with Melissa Weber Mendonça, Ralf, Pearu, and others at Quansight (including the rest of the community and me) are sure to gravitate towards generating better, more reliable interfaces to bridge the gaps between Fortran and Python as supported by the NumPy core developers6. Clearly, the way forward is to work in lockstep with the NumPy-C API and recent Fortran standards.


  1. Make, autotools and the like are definitely worse than np.distutils ↩︎

  2. It is the first example from the canonical reference ↩︎

  3. This is a bit unfair, the complexity has been offloaded to the compiler, and they’re notorious for not supporting newer Fortran standards ↩︎

  4. Though it is and will remain far more svelte compared to C++ ↩︎

  5. Some examples from the community are here ↩︎

  6. Including a special shout out to Charles Harris for reviewing unreasonably large DOC PRs ↩︎