Written: 2023-06-28 11:27 +0000
Updated: 2024-08-06 00:53 +0000
CPP Library Blueprints for Python Bindings
Design guidelines for thin Python wrappers to C++ libraries
Background
I recently ended up writing and rewriting a series of libraries to get them to a point where I could write bindings to them. Along the way, some thoughts on design evolved into this post.
False starts
Originally, I had the library and bindings in one repository, which was easier
to hack on and had fewer moving parts. I rashly assumed that using branches
would provide a modicum of reproducibility. As I quickly found though, this got
out of hand when the code needed to be refactored to suit the bindings and
wasn’t always backported to main in a timely fashion.
Setup Preliminaries
We will be as general in our suggestions as is humanly possible, but when we
need to reach for specific examples, they will be in the context of the potlib
project (library, bindings), which is hopefully small and innocuous enough to
grok.
Tools
For most new projects, I use pybind11 [1] with meson.
- pdm is used for dependencies / scripting / environments
- mesonpy is the backend
- cibuildwheel is used for actually getting the wheels for pypi
  - This is a bit of a nightmare; I needed to use repology and fiddle around figuring out which version of each OS is used by each specification
  - Any kind of system based package management was exceedingly slow, so sticking to pip and meson managed dependencies is the safest bet
Library Layout
Taking potlib as a concrete example, my current favored layout looks like this:
CppCore
├── examples
│   ├── calling_cuh2.cc
│   └── calling_ljpot.cc
├── gtests
│   └── CuH2PotTest.cc
├── meson.build
└── src
    ├── base_types.cc
    ├── base_types.hpp
    ├── CuH2
    ├── helpers.cc
    ├── helpers.hpp
    ├── LennardJones
    ├── Potential.cc
    ├── Potential.hpp
    └── pot_types.hpp
Linking to the library
Git submodules are generally the first thing to reach for when working with
repositories which depend on each other, but I found meson subprojects to be
less finicky if the library and bindings are all compatible with meson. The
documentation omits setting up a .wrap file for a project, so consider the
following (from pypotlib):
[wrap-git]
directory=potlib
url=https://github.com/TheochemUI/potlib.git
revision=5d029b9

[provide]
potlib=git
This can be placed in subprojects/potlib.wrap of the bindings repo and should be committed.
Miscellaneous meson points
Personally I gravitate towards having a few standard default variables defined at the top level:
_args = [] # Extra arguments
_deps = [] # Dependencies
_linkto = [] # All the sub-libraries
_incdirs = [] # All the includes
Along with the dependencies needed by clang and gcc on Unix:
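# Dependencies
# cppc is assumed to be the C++ compiler object, defined once at the top level
cppc = meson.get_compiler('cpp')
# libm for Unix systems
m_dep = cppc.find_library('m', required: false)
_deps += m_dep
# For clang
_deps += [declare_dependency(link_args: '-lstdc++')]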
This makes it easier to pull variables from subprojects in a seamless manner. Linking to the library itself is best done by declaring a dependency [2]:
# ---------------------- Library Dependencies
potlib_proj = subproject('potlib', version: '0.1')
# optionally use:
  # default_options: ['default_library=static'])
potlib_dep = declare_dependency(link_with: potlib_proj.get_variable('_linkto'),
                                dependencies: potlib_proj.get_variable('_deps'))
_deps += [ potlib_dep ]
Setting up targets is then also pretty straightforward:
blahlib = library('blah',
                  src_files,
                  dependencies: _deps,
                  cpp_args: _args,
                  link_with: _linkto,
                  install: true)
Binding Conventions
- Always use clang-format.
- All strings must be explicitly identified with ""s
  - not for doc-strings in pybind11
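To make the ""s rule concrete, here is a minimal standard-library sketch (no pybind11 involved) of what the suffix changes. The helper describe_potential is hypothetical and exists only for illustration. The suffix comes from std::string_literals and needs C++14 or newer.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

using namespace std::string_literals; // enables the ""s suffix

// Hypothetical helper (illustration only): build a label for a named potential.
std::string describe_potential(const std::string &name) {
  assert(!name.empty() && "a potential needs a name");
  // "Potential: "s is a std::string, so operator+ concatenates as expected.
  return "Potential: "s + name;
}

// With the suffix the literal is a std::string rather than a character array.
static_assert(std::is_same<decltype("CuH2"s), std::string>::value,
              "the s suffix yields std::string");
static_assert(!std::is_same<decltype("CuH2"), std::string>::value,
              "a plain literal is not a std::string");
```

One practical consequence is that "a\0b"s keeps all three characters, whereas a std::string built from the plain literal "a\0b" stops at the embedded null.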
General
- Always use namespace py = pybind11;
- For Blah.cpp the binding will be pybinds/py_blah.cc
- Most header content must be in pybinds/py_wrapper.hpp
  - The exceptions are base classes like py_potential.hpp since these need to be imported by child files like potentials/py_morse.cc
- If a header is supplied, then the corresponding .cc should only reference the header, which in turn should declare other dependencies.
  - This only applies to files at the same tree/folder level
- Every method must have:
  - arguments defined
  - documentation string
Example:
.def("setCell", &Matter::setCell, "Sets the cell dimensions", py::arg("AtomMatrix_newCell"))
Naming
Naming is hard. To simplify long filenames and deep hierarchies, the following conventions are established:
- Within objectivefunctions, ObjectiveFunction can be replaced by objfunc
  - So MatterObjectiveFunction is bound in py_matterobjfunc.cc
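The file naming rules can be written down as a small function, which is handy for checking that a new file follows them. This is only a sketch: binding_file_for is a hypothetical helper, not something from potlib or pybind11. It also assumes the objfunc abbreviation applies to any class name, whereas the convention above is scoped to the objectivefunctions folder.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (illustration only): derive the binding file name
// from a C++ class name, following the conventions listed above.
std::string binding_file_for(const std::string &class_name) {
  using namespace std::string_literals;
  assert(!class_name.empty() && "class name must not be empty");

  // Lower-case the name: "MatterObjectiveFunction" -> "matterobjectivefunction".
  std::string stem = class_name;
  std::transform(stem.begin(), stem.end(), stem.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  // Abbreviate every "objectivefunction" to "objfunc".
  const auto long_form = "objectivefunction"s;
  const auto short_form = "objfunc"s;
  for (auto pos = stem.find(long_form); pos != std::string::npos;
       pos = stem.find(long_form, pos + short_form.size())) {
    stem.replace(pos, long_form.size(), short_form);
  }

  // Prefix and extension used for every binding translation unit.
  return "py_"s + stem + ".cc"s;
}
```

Under these assumptions, MatterObjectiveFunction maps to py_matterobjfunc.cc and Blah maps to py_blah.cc, matching the examples in the lists above.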
Class boilerplate
void py_objectName(py::module_ &m) {
  py::class_<ObjectName>(m, "ObjectName", py::dynamic_attr()) // dynamic incurs a penalty
      /*
      ** Constructors
      */

      /*
      ** Operators
      */

      /*
      ** Methods
      */

      /*
      ** Parameters
      */

      /*
      ** Python helpers
      */

      .def("__repr__", [](const ObjectName &a) { return "<ObjectName object>"; });
}
Function arguments
- First one must be py::arg()
- Subsequent arguments may be ""_a
- When there is a non-obvious type used in the arguments, document it
  - e.g. AtomMatrix and VectorXd
  - Use _ for spaces, e.g. AtomMatrix_pos for arguments (AtomMatrix pos)
- When using overloads, always note the actual arguments as comments
py::overload_cast<long /*nAtoms*/,
                  AtomMatrix /*positions*/,
                  VectorXi /*atomicNrs*/,
                  double * /*energy*/,
                  Matrix3d /*box*/,
                  int /*nImages*/
                  >(&Potential::force),
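For readers unfamiliar with what the overload cast is doing: as far as I understand it, py::overload_cast picks one overload from the listed argument types, and the older alternative is a static_cast to the full member-function-pointer type. Below is a minimal standard-library sketch of that underlying mechanism, with the same comment-the-arguments habit. ToyPotential and its energy overloads are hypothetical and are not part of potlib.

```cpp
#include <cassert>

// Hypothetical class (illustration only) with two overloads of one method.
struct ToyPotential {
  double energy(double r) const {
    assert(r > 0.0 && "distance must be positive");
    return 1.0 / r;
  }
  double energy(double r, double scale) const {
    assert(r > 0.0 && "distance must be positive");
    return scale / r;
  }
};

// Spell out the exact signature of each overload, noting argument names.
using EnergyR = double (ToyPotential::*)(double /*r*/) const;
using EnergyRScale =
    double (ToyPotential::*)(double /*r*/, double /*scale*/) const;

// &ToyPotential::energy alone is ambiguous; the cast selects one overload.
constexpr EnergyR energy_r = static_cast<EnergyR>(&ToyPotential::energy);
constexpr EnergyRScale energy_r_scale =
    static_cast<EnergyRScale>(&ToyPotential::energy);
```

Calling (p.*energy_r)(2.0) on a ToyPotential p dispatches to the one-argument overload, and (p.*energy_r_scale)(2.0, 4.0) to the two-argument one. In pybind11 itself, const overloads additionally need the py::const_ tag as a second argument to py::overload_cast.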
Structuring Bindings
Although a single file can be used for the pybind11 bindings, I prefer a more
structured approach. Consider the layout below:
├── pyb11_srcs
│   ├── CuH2
│   │   ├── py_cuh2pot.cc
│   │   └── py_cuh2pot.hpp
│   ├── LennardJones
│   │   ├── py_ljpot.cc
│   │   └── py_ljpot.hpp
│   ├── py_potential.cc
│   ├── py_potential.hpp
│   ├── py_pottypes.cc
│   ├── py_wrapper.cc
│   └── py_wrapper.hpp
Which mirrors the library design.
- Add to py_wrapper.hpp
PYBIND11_MODULE(eonclient, m) {
  ...
  py_newthing(m);
}
- Make py_newthing.cc
// clang-format off
#include "py_wrapper.hpp"
// Binding code (unless in py_wrapper)
#include "../newthing.h"
// clang-format on

void py_newthing(py::module_ &m) {
  /* Wrapping details here */
}
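The two steps above amount to a simple registration pattern: one free function per bound class, each taking the module by reference, all called from a single entry point. Here is a minimal sketch of that structure that compiles without pybind11. ModuleStub is a stand-in for py::module_, and it, register_all, and the class names are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for py::module_ (illustration only): it records the names of the
// classes registered on it, in order.
struct ModuleStub {
  std::vector<std::string> classes;
  void add_class(const std::string &name) {
    assert(!name.empty() && "a bound class needs a name");
    classes.push_back(name);
  }
};

// One free function per bound class. In the real layout these would be
// declared in py_wrapper.hpp and defined in their own py_<name>.cc files.
void py_potential(ModuleStub &m) { m.add_class("Potential"); }
void py_newthing(ModuleStub &m) { m.add_class("NewThing"); }

// Plays the role of the PYBIND11_MODULE body: the single place where every
// per-class function is called. Base classes go first.
void register_all(ModuleStub &m) {
  py_potential(m);
  py_newthing(m);
}
```

The ordering matters in the real bindings too, since pybind11 expects a base class to be registered before any class that derives from it.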
For a concrete example consider this pypotlib commit.
Conclusions
So far this covered creating a single compiled extension for thin bindings to an
existing C++ library. Using PRs and submodules turned out to be insufficient
to maintain code quality, and I ended up shifting to a subproject based workflow.
Most of these conventions evolved during my work on pypotlib, and a newer
version of eON, the long timescale dynamics code, which isn’t (as of April 2024)
public yet [3]. A follow up post will cover adding pure Python code which
can be used for generating a more Pythonic interface and also the requirements
for distribution via PyPI.
[1] nanobind is interesting but lacks Eigen support, and most of the codes I interface to / from use Eigen arrays ↩︎
[2] This was a bit of a gotcha, since adding variables to the project’s _deps and _linkto didn’t work ↩︎
[3] Academia is rather odd about open source development before publication; reviewers almost never bother to even run software, let alone check authorship / provenance ↩︎