Design guidelines for thin Python wrappers to C++ libraries

Background

I recently ended up writing and rewriting a series of libraries, essentially to get them to a point where I could write bindings to them. Along the way, some thoughts on design evolved into this post.

False starts

Originally, I had the library and bindings in one repository, which was easier to hack on and had fewer moving parts. I rashly assumed that using branches would provide a modicum of reproducibility. As I quickly found, though, this got out of hand when code was refactored to suit the bindings and those changes weren’t always backported to main in a timely fashion.

Setup Preliminaries

We will be as general in our suggestions as is humanly possible, but when we need to reach for specific examples, they will be in the context of the potlib project (library, bindings), which is hopefully small and innocuous enough to grok.

Tools

For most new projects, I use pybind11 [1] with meson.

  • pdm is used for dependencies / scripting / environments
  • mesonpy (meson-python) is the build backend
  • cibuildwheel is used for actually building the wheels for PyPI
    • This is a bit of a nightmare: one needs to use repology and fiddle around to figure out which version of each OS is used by each specification
    • Any kind of system-based package management was exceedingly slow, so sticking to pip and meson-managed dependencies is the safest bet

Library Layout

Taking potlib as a concrete example, my current favored layout looks like this:

CppCore
├── examples
│   ├── calling_cuh2.cc
│   └── calling_ljpot.cc
├── gtests
│   └── CuH2PotTest.cc
├── meson.build
└── src
    ├── base_types.cc
    ├── base_types.hpp
    ├── CuH2
    ├── helpers.cc
    ├── helpers.hpp
    ├── LennardJones
    ├── Potential.cc
    ├── Potential.hpp
    └── pot_types.hpp

Linking to the library

Git submodules are generally the first thing to reach for when working with repositories which depend on each other, but I found meson subprojects to be less finicky when the library and the bindings both build with meson. The documentation does not spell out how to set up a .wrap file for your own project, so consider the following (from pypotlib):

[wrap-git]
directory=potlib
url=https://github.com/TheochemUI/potlib.git
revision=5d029b9

[provide]
potlib=git

This can be placed in subprojects/potlib.wrap of the bindings repo and should be committed.

Miscellaneous meson points

Personally I gravitate towards having a few standard default variables defined at the top level:

_args = [] # Extra arguments
_deps = [] # Dependencies
_linkto = [] # All the sub-libraries
_incdirs = [] # All the includes

Along with the dependencies that clang and gcc need on Unix:

# Dependencies
# cppc is assumed to be the C++ compiler object
cppc = meson.get_compiler('cpp')
# libm for Unix systems
m_dep = cppc.find_library('m', required: false)
_deps += m_dep
# For clang
_deps += [declare_dependency(link_args: '-lstdc++')]

This makes it easier to pull variables from subprojects seamlessly. Linking to the library itself is best done by declaring a dependency [2]:

# ---------------------- Library Dependencies
potlib_proj = subproject('potlib', version: '0.1')
# optionally pass this to subproject() as well:
#   default_options: ['default_library=static']
potlib_dep = declare_dependency(link_with: potlib_proj.get_variable('_linkto'),
                                dependencies: potlib_proj.get_variable('_deps'))
_deps += [ potlib_dep ]

Setting up targets is then also pretty straightforward:

blahlib = library('blah',
                  src_files,
                  dependencies: _deps,
                  cpp_args: _args,
                  link_with: _linkto,
                  install: true)

Binding Conventions

  • Always use clang-format.
  • All strings must be explicit std::string literals, i.e. written with the ""s suffix
    • except doc-strings passed to pybind11, which stay plain string literals
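
As a minimal sketch of what the ""s rule buys, consider the snippet below. The helper names (describe, suffixed_length, plain_length) and the strings are made up for illustration and do not come from potlib. The s suffix from std::string_literals yields a std::string instead of a const char array, so the type is explicit at the call site and embedded null characters are kept.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

using namespace std::string_literals; // enables the ""s suffix

// Hypothetical helper, for illustration only: builds a label from a name.
inline std::string describe(const std::string &name) {
    // "..."s is already a std::string, so operator+ needs no conversion.
    return "Potential: "s + name;
}

// A suffixed literal has type std::string; a plain literal does not.
static_assert(std::is_same<decltype("LJ"s), std::string>::value,
              "suffixed literal is a std::string");
static_assert(!std::is_same<decltype("LJ"), std::string>::value,
              "plain literal is a character array");

// The suffixed literal keeps all five characters, including the null.
inline std::size_t suffixed_length() { return "ab\0cd"s.size(); }

// Constructing from a plain literal stops at the first null character.
inline std::size_t plain_length() { return std::string("ab\0cd").size(); }
```

The doc-string exception fits with this: pybind11 takes doc-strings as plain character strings, so the suffix is left off there.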

General

  • Always use namespace py = pybind11;
  • For Blah.cpp the binding will be pybinds/py_blah.cc
  • Most header content must be in pybinds/py_wrapper.hpp
    • The exceptions are base classes like py_potential.hpp since these need to be included by child files like potentials/py_morse.cc
    • If a header is supplied, then the corresponding .cc should only reference the header, which in turn should declare other dependencies.
      • This only applies to files at the same tree/folder level
  • Every method must have:
    • arguments defined
    • documentation string

Example:

.def("setCell", &Matter::setCell, "Sets the cell dimensions", py::arg("AtomMatrix_newCell"))

Naming

Naming is hard. To simplify long filenames and deep hierarchies, the following conventions are established:

  • Within objectivefunctions, ObjectiveFunction can be replaced by objfunc
    • So MatterObjectiveFunction is bound in py_matterobjfunc.cc

Class boilerplate

void py_objectName(py::module_ &m) {
    py::class_<ObjectName>(m, "ObjectName", py::dynamic_attr()) // dynamic incurs a penalty
    /*
    ** Constructors
    */

    /*
    ** Operators
    */

    /*
    ** Methods
    */

    /*
    ** Parameters
    */

    /*
    ** Python helpers
    */

    .def("__repr__", [](const ObjectName &a) { return "<ObjectName object>"; });
}

Function arguments

  • The first argument must be named with py::arg()
  • Subsequent arguments may use the ""_a literal
  • When there is a non-obvious type used in the arguments, document it
    • e.g. AtomMatrix and VectorXd
    • Use _ for spaces, e.g. AtomMatrix_pos for arguments (AtomMatrix pos)
  • When using overloads, always note the actual arguments as comments:

    py::overload_cast<long /*nAtoms*/,
                      AtomMatrix /*positions*/,
                      VectorXi /*atomicNrs*/,
                      double * /*energy*/,
                      Matrix3d /*box*/,
                      int /*nImages*/
                      >(&Potential::force),
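
py::overload_cast selects one overload by its argument types. The same selection can be written in plain C++ as a static_cast to a member-function pointer, a form the pybind11 documentation also shows. Here is a minimal sketch that compiles without pybind11. The Potential struct is a made-up stand-in with two toy force overloads, not the potlib class, and call_one / call_two are hypothetical helpers.

```cpp
// Made-up stand-in with an overloaded method (not the potlib Potential).
struct Potential {
    double force(long nAtoms) const { return static_cast<double>(nAtoms); }
    double force(long nAtoms, double scale) const { return nAtoms * scale; }
};

// Each alias names exactly one overload; the comments record the argument
// names, following the convention above.
using ForceOne = double (Potential::*)(long /*nAtoms*/) const;
using ForceTwo = double (Potential::*)(long /*nAtoms*/,
                                       double /*scale*/) const;

// Hypothetical helper: resolves and calls the one-argument overload.
inline double call_one(const Potential &p, long n) {
    ForceOne f = static_cast<ForceOne>(&Potential::force);
    return (p.*f)(n);
}

// Hypothetical helper: resolves and calls the two-argument overload.
inline double call_two(const Potential &p, long n, double s) {
    ForceTwo f = static_cast<ForceTwo>(&Potential::force);
    return (p.*f)(n, s);
}
```

In real bindings the resolved pointer would be passed to .def(). For const methods like these, py::overload_cast additionally needs the py::const_ tag as its second argument.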

Structuring Bindings

Although a single file can be used for the pybind11 bindings, I prefer a more structured approach. Consider the layout below:

├── pyb11_srcs
│   ├── CuH2
│   │   ├── py_cuh2pot.cc
│   │   └── py_cuh2pot.hpp
│   ├── LennardJones
│   │   ├── py_ljpot.cc
│   │   └── py_ljpot.hpp
│   ├── py_potential.cc
│   ├── py_potential.hpp
│   ├── py_pottypes.cc
│   ├── py_wrapper.cc
│   └── py_wrapper.hpp

This mirrors the library design. Adding a new binding then takes two steps:

  • Add to py_wrapper.hpp

    PYBIND11_MODULE(eonclient, m) {
        ...
        py_newthing(m);
    }

  • Make py_newthing.cc

    // clang-format off
    #include "py_wrapper.hpp"
    // Binding code (unless in py_wrapper)
    #include "../newthing.h"
    // clang-format on

    void py_newthing(py::module_ &m) {
        /* Wrapping details here */
    }
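
The pattern above is one registration function per bound class, all called from a single module entry point. It can be sketched without pybind11 as follows. The Module struct is a made-up stand-in for py::module_, and its add method and the build_module helper are hypothetical. Only the shape of the calls is meant to carry over.

```cpp
#include <string>
#include <vector>

// Made-up stand-in for py::module_, so the sketch compiles without pybind11.
struct Module {
    std::vector<std::string> registered;
    void add(const std::string &name) { registered.push_back(name); }
};

// Would live in the shared wrapper header: one declaration per binding file.
void py_potential(Module &m);
void py_newthing(Module &m);

// Would live in py_potential.cc and py_newthing.cc respectively.
void py_potential(Module &m) { m.add("Potential"); }
void py_newthing(Module &m) { m.add("NewThing"); }

// Plays the role of the PYBIND11_MODULE body: call every binder once, in order.
inline Module build_module() {
    Module m;
    py_potential(m);
    py_newthing(m);
    return m;
}
```

Keeping each binder in its own translation unit means that touching one binding only recompiles that file, and the entry point stays a flat list of calls.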

For a concrete example consider this pypotlib commit.

Conclusions

So far, this post has covered creating a single compiled extension for thin bindings to an existing C++ library. Using PRs and submodules turned out to be insufficient to maintain code quality, and I ended up shifting to a subproject-based workflow. Most of these guidelines evolved during my work on pypotlib and on a newer version of eON, the long timescale dynamics code, which isn’t (as of April 2024) public yet [3]. A follow-up post will cover adding pure Python code, which can be used to generate a more Pythonic interface, and also the requirements for distribution via PyPI.


  1. nanobind is interesting but lacks Eigen support, and most of the codes I interface to / from use Eigen arrays.

  2. This was a bit of a gotcha, since adding variables to the project’s _deps and _linkto didn’t work.

  3. Academia is rather odd about open source development before publication; reviewers almost never bother to even run software, let alone check authorship / provenance.