Written: 2022-04-24 19:16 +0000
Updated: 2024-08-06 00:53 +0000
Spack and PyTorch Development Workflows
pytorch
local development environments without tears, root access or spack dev-build
Background
Build systems are a bit of a nightmare. I spend most of my time SSH’ed into more powerful CPU machines, and sometimes on machines with more exotic compute devices. micromamba is typically my poison of choice where nix is not an option. However, micromamba doesn’t allow a whole lot in the way of setting up environments which do not use packages already on conda-forge. Spack fills a nice niche for such situations [1]. Additionally, it can be coerced into use for local development, and the same cannot easily be said of conda based workflows.
Humble Origins
We’ll start from scratch, grabbing spack and bootstrapping clingo for its dependency resolution, as per the documentation.
git clone -c feature.manyFiles=true https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh # Assumes zsh || bash
spack spec zlib # Or anything really
spack --bootstrap find # Should have clingo
Environments and Purity
Unlike nix, spack does not offer strong guarantees of purity. There are primarily two approaches to using environments in spack, somewhat analogous to the conda environment logic.
- Named Environments - These are essentially equivalent to conda environments, and can be activated and deactivated at will. However, the folder structure, along with the dependencies, is localized within $SPACK_HOME/var/spack/environments/$ENV_NAME
- Anonymous Environments - These can be set up inside any directory, and can be activated but not deactivated (despacktivate will not work). These are useful for development environments; a minimal sketch of both kinds follows this list.
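As a rough sketch of the difference (myenv and the project directory are placeholders):

# Named environment: lives under $SPACK_HOME/var/spack/environments/
spack env create myenv
spack env activate myenv # or: spacktivate myenv
spack env deactivate # or: despacktivate
# Anonymous environment: lives wherever its spack.yaml does
mkdir someProject && cd someProject
spack env create -d .
spack env activate .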
We will use both kinds of environments. Additionally, spack supports multiple variants, which can be queried by spack info $PKG_NAME. These are used to support various build configurations while providing a unified build interface through spack install.
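For instance, variants are toggled with + and ~ on the command line; openblas here is purely illustrative, assuming the recipe exposes a shared variant:

spack info openblas # lists variants and their defaults
spack spec openblas ~shared # dry-run the resolved spec
spack install openblas ~shared # build with the variant disabled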
Basic Anonymous Environments
For starters we will need to set up a development environment.
mkdir spackPyTorch
cd spackPyTorch
spack env create -d .
To ensure that the dependencies are resolved consistently, the concretization option needs to be set to together. We will start by adding some packages.
spack -e . add py-ipython
spack -e . config edit # Make changes
For ease of manual customization, it is best to lightly edit the spack.yaml file to have each dependency on its own line. At this point a bare-minimum, near-empty environment is ready.
spack:
  specs:
  - py-ipython
  view: true
  concretization: together
To ensure dependencies are resolved simultaneously, concretization is set to together.
spack -e . concretize # doesn't install
spack -e . install
spack env activate . # or spacktivate .
# Can also use the fully qualified path instead of .
Recall that changes are propagated via:
spack -e . concretize --force
spack -e . install
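To double-check what actually landed in the environment, something like the following should work:

spack -e . find # list the specs installed in this environment
spack -e . spec -I py-ipython # show the concretized tree with install status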
This establishes a baseline environment, but does not by itself enable a development workflow. Most packages can be included as described above, with the exception of compiler toolchains.
Compiler Setups
Setting up compiler toolchains like a fortran compiler (perhaps for openblas) can take a bit more effort. Although this discussion will focus on obtaining a fortran compiler, it is equally applicable to updating a version. For spack, unlike many system package managers, gcc will install the C, C++ and Fortran toolchains. Thus:
# outside the spack environment
spack install gcc@11.2.0 # will take a while
Now we need to let spack register this compiler.
spack compiler find $(spack location -i gcc@11.2.0)
spack compilers # should now list GCC 11.2.0
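Under the hood this registers the toolchain in compilers.yaml; a rough sketch of the resulting entry is below, though the paths and operating_system value will of course differ per machine:

compilers:
- compiler:
    spec: gcc@11.2.0
    paths:
      cc: /path/to/spack/opt/spack/.../gcc-11.2.0-hash/bin/gcc
      cxx: /path/to/spack/opt/spack/.../gcc-11.2.0-hash/bin/g++
      f77: /path/to/spack/opt/spack/.../gcc-11.2.0-hash/bin/gfortran
      fc: /path/to/spack/opt/spack/.../gcc-11.2.0-hash/bin/gfortran
    operating_system: ubuntu20.04
    modules: []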
Finally we can edit our spack.yaml environment file to reflect the new compiler toolchain.
spack:
  definitions:
  - compilers: [gcc@11.2.0]
  specs:
  - py-ipython
  view: true
  concretization: together
One of the nicer parts of spack being an HPC environment management tool is its first-class support for proprietary compiler toolchains; the Intel family of compilers, for example, can be specified with the %intel syntax. Much more fine-tuning is also possible, including registering system compilers if required.
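The compiler for any spec is pinned with the % syntax; for example (openblas is just a stand-in package here):

spack install openblas %gcc@11.2.0 # use the freshly installed gcc
spack spec openblas %intel # dry-run against an Intel toolchain, if registered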
PyTorch Local Development
One of the caveats of local development with spack is that the base URL needs to be updated from within the local copy of spack. This means editing:
vim $SPACK_ROOT/var/spack/repos/builtin/packages/py-torch/package.py # edit
The patch is minimally complicated; for a fork with a working branch like npeye it would look like this diff:
diff --git a/var/spack/repos/builtin/packages/py-torch/package.py b/var/spack/repos/builtin/packages/py-torch/package.py
index 8190e01102..c276009d10 100644
--- a/var/spack/repos/builtin/packages/py-torch/package.py
+++ b/var/spack/repos/builtin/packages/py-torch/package.py
@@ -14,7 +14,7 @@ class PyTorch(PythonPackage, CudaPackage):
     with strong GPU acceleration."""
 
     homepage = "https://pytorch.org/"
-    git = "https://github.com/pytorch/pytorch.git"
+    git = "https://github.com/HaoZeke/pytorch.git"
 
     maintainers = ['adamjstewart']
 
@@ -22,6 +22,7 @@ class PyTorch(PythonPackage, CudaPackage):
     # core libraries to ensure that the package was successfully installed.
     import_modules = ['torch', 'torch.autograd', 'torch.nn', 'torch.utils']
 
+    version('npeye', branch='npeye', submodules=True)
     version('master', branch='master', submodules=True)
     version('1.11.0', tag='v1.11.0', submodules=True)
     version('1.10.2', tag='v1.10.2', submodules=True)
@@ -348,7 +349,8 @@ def enable_or_disable(variant, keyword='USE', var=None, newer=False):
         elif '~onnx_ml' in self.spec:
             env.set('ONNX_ML', 'OFF')
 
-        if not self.spec.satisfies('@master'):
+        if not (self.spec.satisfies('@master') or
+                self.spec.satisfies('@npeye')):
             env.set('PYTORCH_BUILD_VERSION', self.version)
             env.set('PYTORCH_BUILD_NUMBER', 0)
Applying such a patch is straightforward.
cd $SPACK_ROOT
# Assuming it is named pytorchspack.diff
git apply pytorchspack.diff
This modified package can now be used with the appropriate variants (details found with spack info py-torch).
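A quick sanity check that the fork’s branch is now visible to spack (exact output will vary):

spack info py-torch | grep npeye # the new version should be listed
spack spec py-torch@npeye # dry-run concretization against the fork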
However, there is one rather important variant missing for local development, DEBUG. Applying the following patch will rectify this until an upstream PR is merged.
diff --git a/var/spack/repos/builtin/packages/py-torch/package.py b/var/spack/repos/builtin/packages/py-torch/package.py
index 8190e01102..d7b68ae4bd 100644
--- a/var/spack/repos/builtin/packages/py-torch/package.py
+++ b/var/spack/repos/builtin/packages/py-torch/package.py
@@ -49,6 +49,7 @@ class PyTorch(PythonPackage, CudaPackage):
 
     # All options are defined in CMakeLists.txt.
     # Some are listed in setup.py, but not all.
+    variant('debug', default=False, description="Build with debugging support")
     variant('caffe2', default=True, description='Build Caffe2', when='@1.7:')
     variant('test', default=False, description='Build C++ test binaries')
     variant('cuda', default=not is_darwin, description='Use CUDA')
@@ -343,6 +344,12 @@ def enable_or_disable(variant, keyword='USE', var=None, newer=False):
         enable_or_disable('gloo', newer=True)
         enable_or_disable('tensorpipe')
 
+        if '+debug' in self.spec:
+            env.set('DEBUG', 1)
+        elif '~debug' in self.spec:
+            env.set('DEBUG', '0')
+
+
         if '+onnx_ml' in self.spec:
             env.set('ONNX_ML', 'ON')
         elif '~onnx_ml' in self.spec:
All together now, the variants can be used in conjunction with the development branch. To focus on the pytorch workflow, we will unpin python@3.10 and add ipython instead.
CPU
Normally, for a CPU build we would set up something like the following:
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 BUILD_CAFFE2=0 USE_KINETO=0 python setup.py develop
The rationale behind the build flags can be found in the upstream contributing guide. For us, in an appropriately defined environment, this translates to (with the patch added for debug):
spacktivate .
# CPU only, disable cuda
# Also disable a bunch of optionals
spack add py-torch@npeye -cuda -fbgemm -nnpack -mkldnn -test -qnnpack +debug
spack concretize -f
However, we also need to register this package for development (with the branch set up previously), which is accomplished by:
spack develop py-torch@npeye -cuda -fbgemm -nnpack -mkldnn -test -qnnpack +debug
This generates a sub-directory py-torch with the right branch checked out, along with the dependencies needed for the build. Additionally, the python version is also localized to the spack installation spec.
spack install # concretizes and installs
If additional dependencies are required for testing or other purposes, they are easily obtained. After making changes, spack install will rebuild with the appropriate flags.
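The resulting edit-build-test loop then looks roughly like the following; the version print is a stand-in for real tests:

cd py-torch
# hack on aten/, torch/, etc., then return to the environment root
cd ..
spack install # rebuilds py-torch from the local checkout
python -c 'import torch; print(torch.__version__)' # placeholder smoke test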
CUDA Setup
The good news is that updating the build system to use CUDA is very straightforward. While changing variants, it is occasionally necessary to forcibly clear out the cache.
cd py-torch
rm -rf build
python setup.py clean
# More extreme cases
git submodule deinit -f .
git clean -xdf
python setup.py clean
git submodule update --init --recursive --jobs 0
python setup.py develop
We will require the CUDA version to install the appropriate toolchain.
export NVCC_CUDA_VERSION=$(nvidia-smi -q | awk -F': ' '/CUDA Version/ {print $2}')
spack add cuda@$NVCC_CUDA_VERSION
spack concretize --force
spack install
One caveat of CUDA installations (which cannot be dealt with here) is that spack needs to have read/write access to /tmp/cuda-installer.log because of a ridiculous upstream bug.
Finally, we update the spack.yaml to reflect our new changes:
spack:
  definitions:
  - compilers: [gcc@11.2.0]
  # add package specs to the `specs` list
  specs:
  - py-ipython
  - cuda@11.3
  - py-torch@npeye+cuda+debug~fbgemm~mkldnn~nnpack~qnnpack~test
  view: true
  concretization: together
  develop:
    py-torch:
      spec: py-torch@npeye+cuda+debug~fbgemm~mkldnn~nnpack~qnnpack~test
This now works seamlessly with spack install.
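At this point a quick smoke test (assuming a visible GPU) confirms the CUDA wiring:

spack find cuda # should show the pinned toolkit
python -c 'import torch; print(torch.cuda.is_available())'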
Baseline Environment
Finally, to test changes against the main branch upstream, it is useful to define an environment for the same. This can be a named environment, since that allows for better usage semantics with despacktivate.
spack env create pytorchBaseline
spack -e pytorchBaseline add py-torch@master -cuda -fbgemm -nnpack -mkldnn -test -qnnpack
spack -e pytorchBaseline add py-ipython
spack -e pytorchBaseline config edit
# Add compilers, concretization
spacktivate pytorchBaseline
spack concretize --force
spack install
# Do tests, compare
# ...
# Wrap up and deactivate
despacktivate
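For instance, the same snippet can be run against both builds; the version print is a stand-in for whatever comparison is actually of interest:

# Against the upstream baseline
spacktivate pytorchBaseline
python -c 'import torch; print(torch.__version__)'
despacktivate
# Against the local development build
cd spackPyTorch && spacktivate .
python -c 'import torch; print(torch.__version__)'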
Conclusions
Dependency management is always painful. CUDA management doubly so. Better workflows with spack dev-build are available for some packages, like KOKKOS, but spack dev-build doesn’t work yet for pytorch, and also appears to have been removed from the present set of tutorials. Personally, I’d still prefer nix, but where micromamba falls short in terms of source builds, spack is a good alternative, if one has the resources to rebuild everything needed.
[1] It also integrates nicely with typical HPC modular workflows like LMod and has reasonable support for Windows.