Written: 2022-04-24 19:16 +0000
Updated: 2024-08-06 00:53 +0000
Spack and PyTorch Development Workflows
pytorch
local development environments without tears, root access or spack dev-build
Background
Build systems are a bit of a nightmare. I spend most of my time SSH’ed into more powerful CPU machines, and sometimes on machines with more exotic compute devices. micromamba is typically my poison of choice where nix is not an option. However, micromamba doesn’t allow a whole lot in the way of setting up environments which do not use packages already on conda-forge. Spack fills a nice niche for such situations [1]. Additionally, it can be coerced into use for local development, and the same cannot easily be said of conda based workflows.
Humble Origins
We’ll start from scratch, grabbing spack and bootstrapping clingo for its dependency resolution, as per the documentation.
git clone -c feature.manyFiles=true https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh # Assumes zsh || bash
spack spec zlib # Or anything really
spack --bootstrap find # Should have clingo
Environments and Purity
Unlike nix, spack does not offer strong guarantees of purity. There are primarily two approaches to using environments in spack, somewhat analogous to the conda environment logic.
- Named Environments - These are essentially equivalent to conda environments, and can be activated and deactivated at will. However, the folder structure, along with the dependencies, is localized within $SPACK_HOME/var/spack/environments/$ENV_NAME
- Anonymous Environments - These can be set up inside any directory, and can be activated but not deactivated (despacktivate will not work). These are useful for development environments; a minimal sketch of both kinds follows this list.
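As a rough sketch of the difference (myenv and the project directory are placeholders):

# Named environment: lives under $SPACK_HOME/var/spack/environments/
spack env create myenv
spack env activate myenv # or: spacktivate myenv
spack env deactivate # or: despacktivate
# Anonymous environment: lives wherever its spack.yaml does
mkdir someProject && cd someProject
spack env create -d .
spack env activate .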
We will use both kinds of environments. Additionally, spack supports multiple variants, which can be queried by spack info $PKG_NAME. These are used to support various build configurations while providing a unified build interface through spack install.
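For instance, variants are toggled with + and ~ on the command line; openblas here is purely illustrative, assuming the recipe exposes a shared variant:

spack info openblas # lists variants and their defaults
spack spec openblas ~shared # dry-run the resolved spec
spack install openblas ~shared # build with the variant disabled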
Basic Anonymous Environments
For starters we will need to set up a development environment.
mkdir spackPyTorch
cd spackPyTorch
spack env create -d .
To ensure that the dependencies are resolved consistently, the concretization option needs to be set to together. We will start by adding some packages.
spack -e . add py-ipython
spack -e . config edit # Make changes
For ease of manual customization, it is best to lightly edit the spack.yaml file to have each dependency on its own line. At this point a bare-minimum, near-empty environment is ready.
spack:
  specs:
  - py-ipython
  view: true
  concretization: together
To ensure dependencies are resolved simultaneously, concretization is set to together.
spack -e . concretize # doesn't install
spack -e . install
spack env activate . # or spacktivate .
# Can also use the fully qualified path instead of .
Recall that changes are propagated via:
spack -e . concretize --force
spack -e . install
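To double-check what actually landed in the environment, something like the following should work:

spack -e . find # list the specs installed in this environment
spack -e . spec -I py-ipython # show the concretized tree with install status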
This establishes a baseline environment, but does not by itself enable a development workflow. Most packages can be included as described above, with the exception of compiler toolchains.
Compiler Setups
Setting up compiler toolchains like a fortran compiler (perhaps for openblas) can take a bit more effort. Although this discussion will focus on obtaining a fortran compiler, it is equally applicable to updating a version. For spack, unlike many system package managers, gcc will install the C, C++ and Fortran toolchains. Thus:
# outside the spack environment
spack install gcc@11.2.0 # will take a while
Now we need to let spack register this compiler.
spack compiler find $(spack location -i gcc@11.2.0)
spack compilers # should now list GCC 11.2.0
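Under the hood this registers the toolchain in compilers.yaml; a rough sketch of the resulting entry is below, though the paths and operating_system value will of course differ per machine:

compilers:
- compiler:
    spec: gcc@11.2.0
    paths:
      cc: /path/to/spack/opt/spack/.../gcc-11.2.0-hash/bin/gcc
      cxx: /path/to/spack/opt/spack/.../gcc-11.2.0-hash/bin/g++
      f77: /path/to/spack/opt/spack/.../gcc-11.2.0-hash/bin/gfortran
      fc: /path/to/spack/opt/spack/.../gcc-11.2.0-hash/bin/gfortran
    operating_system: ubuntu20.04
    modules: []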
Finally we can edit our spack.yaml environment file to reflect the new compiler toolchain.
spack:
  definitions:
  - compilers: [gcc@11.2.0]
  specs:
  - py-ipython
  view: true
  concretization: together
One of the nicer parts of spack being an HPC environment management tool is its first-class support for proprietary compiler toolchains; the Intel family of compilers, for example, can be specified with the %intel syntax. Much more fine-tuning is also possible, including registering system compilers if required.
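The compiler for any spec is pinned with the % syntax; for example (openblas is just a stand-in package here):

spack install openblas %gcc@11.2.0 # use the freshly installed gcc
spack spec openblas %intel # dry-run against an Intel toolchain, if registered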
PyTorch Local Development
One of the caveats of local development with spack is that the base URL needs to be updated from within the local copy of spack. This means editing:
vim $SPACK_ROOT/var/spack/repos/builtin/packages/py-torch/package.py # edit
The patch is minimally complicated; for a fork with a working branch like npeye it would look like this diff:
diff --git a/var/spack/repos/builtin/packages/py-torch/package.py b/var/spack/repos/builtin/packages/py-torch/package.py
index 8190e01102..c276009d10 100644
--- a/var/spack/repos/builtin/packages/py-torch/package.py
+++ b/var/spack/repos/builtin/packages/py-torch/package.py
@@ -14,7 +14,7 @@ class PyTorch(PythonPackage, CudaPackage):
     with strong GPU acceleration."""
 
     homepage = "https://pytorch.org/"
-    git = "https://github.com/pytorch/pytorch.git"
+    git = "https://github.com/HaoZeke/pytorch.git"
 
     maintainers = ['adamjstewart']
 
@@ -22,6 +22,7 @@ class PyTorch(PythonPackage, CudaPackage):
     # core libraries to ensure that the package was successfully installed.
     import_modules = ['torch', 'torch.autograd', 'torch.nn', 'torch.utils']
 
+    version('npeye', branch='npeye', submodules=True)
     version('master', branch='master', submodules=True)
     version('1.11.0', tag='v1.11.0', submodules=True)
     version('1.10.2', tag='v1.10.2', submodules=True)
@@ -348,7 +349,8 @@ def enable_or_disable(variant, keyword='USE', var=None, newer=False):
         elif '~onnx_ml' in self.spec:
             env.set('ONNX_ML', 'OFF')
 
-        if not self.spec.satisfies('@master'):
+        if not (self.spec.satisfies('@master') or
+                self.spec.satisfies('@npeye')):
             env.set('PYTORCH_BUILD_VERSION', self.version)
             env.set('PYTORCH_BUILD_NUMBER', 0)
Applying such a patch is straightforward.
cd $SPACK_ROOT
# Assuming it is named pytorchspack.diff
git apply pytorchspack.diff
This modified package can now be used with the appropriate variants (details found with spack info py-torch).
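A quick sanity check that the fork’s branch is now visible to spack (exact output will vary):

spack info py-torch | grep npeye # the new version should be listed
spack spec py-torch@npeye # dry-run concretization against the fork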
However, there is one rather important variant missing for local development, DEBUG. Applying the following patch will rectify this until an upstream PR is merged.
diff --git a/var/spack/repos/builtin/packages/py-torch/package.py b/var/spack/repos/builtin/packages/py-torch/package.py
index 8190e01102..d7b68ae4bd 100644
--- a/var/spack/repos/builtin/packages/py-torch/package.py
+++ b/var/spack/repos/builtin/packages/py-torch/package.py
@@ -49,6 +49,7 @@ class PyTorch(PythonPackage, CudaPackage):
 
     # All options are defined in CMakeLists.txt.
     # Some are listed in setup.py, but not all.
+    variant('debug', default=False, description="Build with debugging support")
     variant('caffe2', default=True, description='Build Caffe2', when='@1.7:')
     variant('test', default=False, description='Build C++ test binaries')
     variant('cuda', default=not is_darwin, description='Use CUDA')
@@ -343,6 +344,12 @@ def enable_or_disable(variant, keyword='USE', var=None, newer=False):
         enable_or_disable('gloo', newer=True)
         enable_or_disable('tensorpipe')
 
+        if '+debug' in self.spec:
+            env.set('DEBUG', 1)
+        elif '~debug' in self.spec:
+            env.set('DEBUG', '0')
+
+
         if '+onnx_ml' in self.spec:
             env.set('ONNX_ML', 'ON')
         elif '~onnx_ml' in self.spec:
All together now, the variants can be used in conjunction with the development branch. To focus on the pytorch workflow, we will unpin python@3.10 and add ipython instead.
CPU
Normally, for a CPU build we would set up something like the following:
DEBUG=1 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_CUDA=0 BUILD_TEST=0 USE_FBGEMM=0 USE_NNPACK=0 USE_QNNPACK=0 USE_XNNPACK=0 BUILD_CAFFE2=0 USE_KINETO=0 python setup.py develop
The rationale behind the build flags can be found in the upstream contributing guide. For us, in an appropriately defined environment, this translates to (with the patch added for debug):
spacktivate .
# CPU only, disable cuda
# Also disable a bunch of optionals
spack add py-torch@npeye -cuda -fbgemm -nnpack -mkldnn -test -qnnpack +debug
spack concretize -f
However, we also need to register this package for development (with the branch set up previously), which is accomplished by:
spack develop py-torch@npeye -cuda -fbgemm -nnpack -mkldnn -test -qnnpack +debug
This generates a sub-directory py-torch with the right branch checked out, along with the dependencies needed for the build. Additionally, the python version is also localized to the spack installation spec.
spack install # concretizes and installs
If additional dependencies are required for testing or other purposes, they are easily obtained. After making changes, spack install will rebuild with the appropriate flags.
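The resulting edit-build-test loop then looks roughly like the following; the version print is a stand-in for real tests:

cd py-torch
# hack on aten/, torch/, etc., then return to the environment root
cd ..
spack install # rebuilds py-torch from the local checkout
python -c 'import torch; print(torch.__version__)' # placeholder smoke test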
CUDA Setup
The good news is that updating the build system to use CUDA is very straightforward. While changing variants, it is occasionally necessary to forcibly clear out the cache.
cd py-torch
rm -rf build
python setup.py clean
# More extreme cases
git submodule deinit -f .
git clean -xdf
python setup.py clean
git submodule update --init --recursive --jobs 0
python setup.py develop
We will require the CUDA version to install the appropriate toolchain.
export NVCC_CUDA_VERSION=$(nvidia-smi -q | awk -F': ' '/CUDA Version/ {print $2}')
spack add cuda@$NVCC_CUDA_VERSION
spack concretize --force
spack install
One caveat of CUDA installations (which cannot be dealt with here) is that spack needs to have read/write access to /tmp/cuda-installer.log because of a ridiculous upstream bug.
Finally, we update the spack.yaml to reflect our new changes:
spack:
  definitions:
  - compilers: [gcc@11.2.0]
  # add package specs to the `specs` list
  specs:
  - py-ipython
  - cuda@11.3
  - py-torch@npeye+cuda+debug~fbgemm~mkldnn~nnpack~qnnpack~test
  view: true
  concretization: together
  develop:
    py-torch:
      spec: py-torch@npeye+cuda+debug~fbgemm~mkldnn~nnpack~qnnpack~test
This now works seamlessly with spack install.
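At this point a quick smoke test (assuming a visible GPU) confirms the CUDA wiring:

spack find cuda # should show the pinned toolkit
python -c 'import torch; print(torch.cuda.is_available())'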
Baseline Environment
Finally, to test changes against the main branch upstream, it is useful to define an environment for the same. This can be a named environment, since that allows for better usage semantics with despacktivate.
spack env create pytorchBaseline
spack -e pytorchBaseline add py-torch@master -cuda -fbgemm -nnpack -mkldnn -test -qnnpack
spack -e pytorchBaseline add py-ipython
spack -e pytorchBaseline config edit
# Add compilers, concretization
spacktivate pytorchBaseline
spack concretize --force
spack install
# Do tests, compare
# ...
# Wrap up and deactivate
despacktivate
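For instance, the same snippet can be run against both builds; the version print is a stand-in for whatever comparison is actually of interest:

# Against the upstream baseline
spacktivate pytorchBaseline
python -c 'import torch; print(torch.__version__)'
despacktivate
# Against the local development build
cd spackPyTorch && spacktivate .
python -c 'import torch; print(torch.__version__)'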
Conclusions
Dependency management is always painful. CUDA management doubly so. Better workflows with spack dev-build are available for some packages, like KOKKOS, but spack dev-build doesn’t work yet for pytorch, and also appears to have been removed from the present set of tutorials. Personally, I’d still prefer nix, but where micromamba falls short in terms of source builds, spack is a good alternative, if one has the resources to rebuild everything needed.
[1] It also integrates nicely with typical HPC modular workflows like LMod and has reasonable support for Windows.