This post describes how to build a transparent, automated setup for reproducible R workflows using nixpkgs, niv, and lorri. The running example throughout the post is setting up the rethinking package and running some examples from the excellent second edition of “Statistical Rethinking” by Richard McElreath.

Background

As detailed in an earlier post [1], I had set up Nix to work with non-CRAN packages. If the rest of this section is unclear, please refer back to the earlier post.

Setup

For the remainder of the post, we will set up a basic project structure:

mkdir tryRnix/

Now we will create a shell.nix as follows [2]:

# shell.nix
{ pkgs ? import <nixpkgs> { } }:
with pkgs;
let
  my-r-pkgs = rWrapper.override {
    packages = with rPackages; [
      ggplot2
      tidyverse
      tidybayes
      (buildRPackage {
        name = "rethinking";
        src = fetchFromGitHub {
          owner = "rmcelreath";
          repo = "rethinking";
          rev = "d0978c7f8b6329b94efa2014658d750ae12b1fa2";
          sha256 = "1qip6x3f6j9lmcmck6sjrj50a5azqfl6rfhp4fdj7ddabpb8n0z0";
        };
        propagatedBuildInputs = [ coda MASS mvtnorm loo shape rstan dagitty ];
      })
    ];
  };
in mkShell {
  buildInputs = with pkgs; [ git glibcLocales openssl which openssh curl wget ];
  inputsFrom = [ my-r-pkgs ];
  shellHook = ''
    mkdir -p "$(pwd)/_libs"
    export R_LIBS_USER="$(pwd)/_libs"
  '';
  GIT_SSL_CAINFO = "${cacert}/etc/ssl/certs/ca-bundle.crt";
  LOCALE_ARCHIVE = lib.optionalString stdenv.isLinux
    "${glibcLocales}/lib/locale/locale-archive";
}

So we have:

tree tryRnix
tryRnix
└── shell.nix

0 directories, 1 file

Introspection

At this point:

  • I was able to install packages (system and R) arbitrarily
  • I was able to use project specific folders
  • Unlike npm, pipenv, poetry, conda and friends, my system was not bloated by downloading and setting up the same packages every time I used them in different projects
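
The project-specific folder in the second point comes from the shellHook in shell.nix. The same two lines can be tried outside of Nix; the sketch below runs them in a temporary directory (an assumption made purely for illustration) so that no real project is touched.

```shell
# Work in a throwaway directory so the sketch has no side effects elsewhere.
project_dir="$(mktemp -d)"
cd "$project_dir" || exit 1

# The same two lines as the shellHook in shell.nix.
mkdir -p "$(pwd)/_libs"
export R_LIBS_USER="$(pwd)/_libs"

# R consults R_LIBS_USER when installing and loading user packages.
echo "R_LIBS_USER=$R_LIBS_USER"
```

Anything installed from inside R then lands in _libs within the project rather than in a shared user library.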

However, though this is a major step up from being chained to RStudio and my system package manager, it is still not obvious how this workflow is reproducible. Admittedly, I have defined my packages in a nice functional manner, but someone else might be tracking a different upstream channel, and will thus get different package versions. Indeed, the only packages I could be sure of were the R packages I built from GitHub, since those were tied to a hash. Finally, the setup described for each project is pretty onerous, and it is not immediately clear how to leverage fantastic tools like direnv with it.

Towards Reproducible Environments

The astute reader will have noticed that I mentioned that the R packages were reproducible since they were tied to a hash, and might reasonably argue that the entire Nix ecosystem is about hashing in the first place. Once we realize that, the rest is relatively simple [3].
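
The idea of tying an input to a hash can be illustrated without Nix at all. The sketch below is only an analogy: it assumes sha256sum from GNU coreutils is installed and hashes three throwaway files, whereas Nix uses its own hash encoding and, for fetchFromGitHub, hashes the unpacked source tree rather than a single file.

```shell
# Two throwaway files with identical content, and one that differs.
work_dir="$(mktemp -d)"
printf 'rethinking source\n' > "$work_dir/a.txt"
printf 'rethinking source\n' > "$work_dir/b.txt"
printf 'rethinking source, edited\n' > "$work_dir/c.txt"

# Print only the hex digest of a file.
digest() {
  sha256sum "$1" | cut -d ' ' -f 1
}

hash_a="$(digest "$work_dir/a.txt")"
hash_b="$(digest "$work_dir/b.txt")"
hash_c="$(digest "$work_dir/c.txt")"

# Identical content gives identical digests; any edit changes the digest.
if [ "$hash_a" = "$hash_b" ]; then echo "a and b match"; fi
if [ "$hash_a" != "$hash_c" ]; then echo "c differs"; fi
```

Pinning by hash means that a changed input is detected instead of being silently accepted.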

Niv and Pinning

Niv essentially keeps track of the channel from which all the packages are installed, pinning it to an exact revision and hash in nix/sources.json. Setup is pretty minimal.

cd tryRnix/
nix-env -i niv
niv init

At this point, we have:

tree tryRnix
tryRnix
├── nix
│   ├── sources.json
│   └── sources.nix
└── shell.nix

1 directory, 3 files

We will have to update our shell.nix to use the new sources.

let
  sources = import ./nix/sources.nix;
  pkgs = import sources.nixpkgs { };
  stdenv = pkgs.stdenv;
  my-r-pkgs = pkgs.rWrapper.override {
    packages = with pkgs.rPackages; [
      ggplot2
      tidyverse
      tidybayes
    ];
  };
in pkgs.mkShell {
  buildInputs = with pkgs; [ git glibcLocales openssl which openssh curl wget my-r-pkgs ];
  shellHook = ''
    mkdir -p "$(pwd)/_libs"
    export R_LIBS_USER="$(pwd)/_libs"
  '';
  GIT_SSL_CAINFO = "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt";
  LOCALE_ARCHIVE = pkgs.lib.optionalString stdenv.isLinux
    "${pkgs.glibcLocales}/lib/locale/locale-archive";
}

We could inspect and edit these sources by hand, but it is much more convenient to simply use niv again when we need to update these.

cd tryRnix/
niv update nixpkgs -b nixpkgs-unstable
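
To see which revision is pinned, the rev field of nix/sources.json can be read directly. The sketch below runs against a hand-written stand-in file so that it is self-contained. The field names are assumed to resemble what niv generates, and the revision is a placeholder, not a real commit.

```shell
# A stand-in for nix/sources.json, written by hand for illustration only.
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/nix"
cat > "$work_dir/nix/sources.json" <<'EOF'
{
    "nixpkgs": {
        "branch": "nixpkgs-unstable",
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "0000000000000000000000000000000000000000"
    }
}
EOF

# Pull out the value of the first "rev" field with sed (no jq required).
pinned_rev="$(sed -n 's/.*"rev": *"\([^"]*\)".*/\1/p' "$work_dir/nix/sources.json" | head -n 1)"
echo "nixpkgs is pinned to $pinned_rev"
```

Committing nix/sources.json alongside shell.nix is what lets a collaborator resolve the same package set.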

At this stage we have a reproducible set of packages ready to use. However, it is still pretty annoying to have to type nix-shell every time, and to wait while it rebuilds whenever we change things.

Lorri and Direnv

In the past, I have made my admiration for direnv very clear (especially for python-poetry). However, though direnv does allow us to include arbitrary bash logic into our projects, it would be nice to have something which has some defaults for nix. Thankfully, the folks at TweagIO developed lorri to scratch that itch.

The basic setup is simple:

nix-env -i lorri
cd tryRnix/
lorri init
tree -a tryRnix/
tryRnix/
├── .envrc
├── nix
│   ├── sources.json
│   └── sources.nix
└── shell.nix

1 directory, 4 files

We can and should inspect the environment lorri wants us to load through the direnv file:

cat tryRnix/.envrc
$(lorri direnv)

In and of itself, that is not too descriptive, so we should run lorri direnv on our own first.

EVALUATION_ROOT="$HOME/.cache/lorri/gc_roots/407bd4df60fbda6e3a656c39f81c03c2/gc_root/shell_gc_root"

watch_file "/run/user/1000/lorri/daemon.socket"
watch_file "$EVALUATION_ROOT"

#!/usr/bin/env bash
# ^ shebang is unused as this file is sourced, but present for editor
# integration. Note: Direnv guarantees it *will* be parsed using bash.

function punt () {
    :
}

# move "origPreHook" "preHook" "$@";;
move() {
    srcvarname=$1 # example: varname might contain the string "origPATH"
    # drop off the source variable name
    shift

    destvarname=$1 # example: destvarname might contain the string "PATH"
    # drop off the destination variable name
    shift

    # like: export origPATH="...some-value..."
    export "${@?}";

    # set $original to the contents of the variable $srcvarname
    # refers to
    eval "$destvarname=\"${!srcvarname}\""

    # mark the destvarname as exported so direnv picks it up
    # (shellcheck: we do want to export the content of destvarname!)
    # shellcheck disable=SC2163
    export "$destvarname"

    # remove the export from above, ie: export origPATH...
    unset "$srcvarname"
}

function prepend() {
    varname=$1 # example: varname might contain the string "PATH"

    # drop off the varname
    shift

    separator=$1 # example: separator would usually be the string ":"

    # drop off the separator argument, so the remaining arguments
    # are the arguments to export
    shift

    # set $original to the contents of the the variable $varname
    # refers to
    original="${!varname}"

    # effectfully accept the new variable's contents
    export "${@?}";

    # re-set $varname's variable to the contents of varname's
    # reference, plus the current (updated on the export) contents.
    # however, exclude the ${separator} unless ${original} starts
    # with a value
    eval "$varname=${!varname}${original:+${separator}${original}}"
}

function append() {
    varname=$1 # example: varname might contain the string "PATH"

    # drop off the varname
    shift

    separator=$1 # example: separator would usually be the string ":"
    # drop off the separator argument, so the remaining arguments
    # are the arguments to export
    shift


    # set $original to the contents of the the variable $varname
    # refers to
    original="${!varname:-}"

    # effectfully accept the new variable's contents
    export "${@?}";

    # re-set $varname's variable to the contents of varname's
    # reference, plus the current (updated on the export) contents.
    # however, exclude the ${separator} unless ${original} starts
    # with a value
    eval "$varname=${original:+${original}${separator}}${!varname}"
}

varmap() {
    if [ -f "$EVALUATION_ROOT/varmap-v1" ]; then
        # Capture the name of the variable being set
        IFS="=" read -r -a cur_varname <<< "$1"

        # With IFS='' and the `read` delimiter being '', we achieve
        # splitting on \0 bytes while also preserving leading
        # whitespace:
        #
        #    bash-3.2$ printf ' <- leading space\0bar\0baz\0' \
        #                  | (while IFS='' read -d $'\0' -r x; do echo ">$x<"; done)
        #    > <- leading space<
        #    >bar<
        #    >baz<```
        while IFS='' read -r -d '' map_instruction \
           && IFS='' read -r -d '' map_variable \
           && IFS='' read -r -d '' map_separator; do
            unset IFS

            if [ "$map_variable" == "${cur_varname[0]}" ]; then
                if [ "$map_instruction" == "append" ]; then
                    append "$map_variable" "$map_separator" "$@"
                    return
                fi
            fi
        done < "$EVALUATION_ROOT/varmap-v1"
    fi


    export "${@?}"
}

function declare() {
    if [ "$1" == "-x" ]; then shift; fi

    # Some variables require special handling.
    #
    # - punt:    don't set the variable at all
    # - prepend: take the new value, and put it before the current value.
    case "$1" in
        # vars from: https://github.com/NixOS/nix/blob/92d08c02c84be34ec0df56ed718526c382845d1a/src/nix-build/nix-build.cc#L100
        "HOME="*) punt;;
        "USER="*) punt;;
        "LOGNAME="*) punt;;
        "DISPLAY="*) punt;;
        "PATH="*) prepend "PATH" ":" "$@";;
        "TERM="*) punt;;
        "IN_NIX_SHELL="*) punt;;
        "TZ="*) punt;;
        "PAGER="*) punt;;
        "NIX_BUILD_SHELL="*) punt;;
        "SHLVL="*) punt;;

        # vars from: https://github.com/NixOS/nix/blob/92d08c02c84be34ec0df56ed718526c382845d1a/src/nix-build/nix-build.cc#L385
        "TEMPDIR="*) punt;;
        "TMPDIR="*) punt;;
        "TEMP="*) punt;;
        "TMP="*) punt;;

        # vars from: https://github.com/NixOS/nix/blob/92d08c02c84be34ec0df56ed718526c382845d1a/src/nix-build/nix-build.cc#L421
        "NIX_ENFORCE_PURITY="*) punt;;

        # vars from: https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html (last checked: 2019-09-26)
        # reported in https://github.com/target/lorri/issues/153
        "OLDPWD="*) punt;;
        "PWD="*) punt;;
        "SHELL="*) punt;;

        # https://github.com/target/lorri/issues/97
        "preHook="*) punt;;
        "origPreHook="*) move "origPreHook" "preHook" "$@";;

        *) varmap "$@" ;;
    esac
}

export IN_NIX_SHELL=impure

if [ -f "$EVALUATION_ROOT/bash-export" ]; then
    # shellcheck disable=SC1090
    . "$EVALUATION_ROOT/bash-export"
elif [ -f "$EVALUATION_ROOT" ]; then
    # shellcheck disable=SC1090
    . "$EVALUATION_ROOT"
fi

unset declare

Jun 06 19:02:32.368 INFO lorri has not completed an evaluation for this project yet, expr: $HOME/Git/Github/WebDev/Mine/haozeke.github.io/content-org/tryRnix/shell.nix
Jun 06 19:02:32.368 WARN `lorri direnv` should be executed by direnv from within an `.envrc` file, expr: $HOME/Git/Github/WebDev/Mine/haozeke.github.io/content-org/tryRnix/shell.nix
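
The prepend and append helpers in the output above hinge on the ${original:+...} parameter expansion, which emits the separator only when the original value is non-empty. The sketch below isolates that behaviour in two small functions; prepend_value and append_value are names made up for this illustration and are not part of lorri.

```shell
# Join a new value in front of an existing one, adding the separator
# only when the existing value is non-empty.
prepend_value() {
  new=$1; separator=$2; original=$3
  printf '%s\n' "${new}${original:+${separator}${original}}"
}

# Join a new value behind an existing one, with the same rule.
append_value() {
  new=$1; separator=$2; original=$3
  printf '%s\n' "${original:+${original}${separator}}${new}"
}

with_path="$(prepend_value "/nix/store/example/bin" ":" "/usr/bin")"
empty_path="$(prepend_value "/nix/store/example/bin" ":" "")"
appended="$(append_value "/extra" ":" "/usr/bin")"

echo "$with_path"
echo "$empty_path"
echo "$appended"
```

This is why PATH ends up with the Nix-provided directories in front of the existing entries, without a stray leading or trailing colon when the variable was previously empty.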

Upon inspection, that seems to check out. So now we can enable this.

direnv allow

Additionally, we will need to stick to using a pure environment as much as possible to prevent unexpected situations. So we set:

# .envrc
eval "$(lorri direnv)"
nix-shell --run bash --pure

There’s still a catch though. We need to have the lorri daemon running so that packages are rebuilt automatically, without us having to exit the shell and re-run things. We can turn to the lorri documentation for this. Essentially, we need a user-level systemd socket file and service for lorri.

# ~/.config/systemd/user/lorri.socket
[Unit]
Description=Socket for Lorri Daemon

[Socket]
ListenStream=%t/lorri/daemon.socket
RuntimeDirectory=lorri

[Install]
WantedBy=sockets.target

# ~/.config/systemd/user/lorri.service
[Unit]
Description=Lorri Daemon
Requires=lorri.socket
After=lorri.socket

[Service]
ExecStart=%h/.nix-profile/bin/lorri daemon
PrivateTmp=true
ProtectSystem=strict
ProtectHome=read-only
Restart=on-failure

With that we are finally ready to start working with our auto-managed, reproducible environments.

systemctl --user daemon-reload && \
systemctl --user enable --now lorri.socket
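
Once the socket unit is enabled, a quick sanity check is whether the socket file exists. The sketch below assumes the socket lives under the user runtime directory, matching the %t/lorri/daemon.socket path in the unit file; socket_ready is a helper name invented for this example.

```shell
# Return success when the given path exists and is a socket.
socket_ready() {
  [ -S "$1" ]
}

# %t in a user unit corresponds to the user's runtime directory.
socket_path="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/lorri/daemon.socket"

if socket_ready "$socket_path"; then
  echo "lorri socket found at $socket_path"
else
  echo "no lorri socket at $socket_path yet"
fi
```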

Rethinking

As promised, we will first test the setup to see that everything is working. Now is also a good time to try the tidybayes.rethinking package. In order to use it, we will need to define the rethinking package in such a way that we can pass it to the propagatedBuildInputs of tidybayes.rethinking. We will modify our new shell.nix as follows:

# shell.nix
let
  sources = import ./nix/sources.nix;
  pkgs = import sources.nixpkgs { };
  stdenv = pkgs.stdenv;
  rethinking = with pkgs.rPackages;
    buildRPackage {
      name = "rethinking";
      src = pkgs.fetchFromGitHub {
        owner = "rmcelreath";
        repo = "rethinking";
        rev = "d0978c7f8b6329b94efa2014658d750ae12b1fa2";
        sha256 = "1qip6x3f6j9lmcmck6sjrj50a5azqfl6rfhp4fdj7ddabpb8n0z0";
      };
      propagatedBuildInputs = [ coda MASS mvtnorm loo shape rstan dagitty ];
    };
  tidybayes_rethinking = with pkgs.rPackages;
    buildRPackage {
      name = "tidybayes.rethinking";
      src = pkgs.fetchFromGitHub {
        owner = "mjskay";
        repo = "tidybayes.rethinking";
        rev = "df903c88f4f4320795a47c616eef24a690b433a4";
        sha256 = "1jl3189zdddmwm07z1mk58hcahirqrwx211ms0i1rzbx5y4zak0c";
      };
      propagatedBuildInputs =
        [ dplyr tibble rlang MASS tidybayes rethinking rstan ];
    };
  rEnv = pkgs.rWrapper.override {
    packages = with pkgs.rPackages; [
      ggplot2
      tidyverse
      tidybayes
      devtools
      modelr
      cowplot
      ggrepel
      RColorBrewer
      purrr
      forcats
      rstan
      rethinking
      tidybayes_rethinking
    ];
  };
in pkgs.mkShell {
  buildInputs = with pkgs; [ git glibcLocales which ];
  inputsFrom = [ rEnv ];
  shellHook = ''
    mkdir -p "$(pwd)/_libs"
    export R_LIBS_USER="$(pwd)/_libs"
  '';
  GIT_SSL_CAINFO = "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt";
  LOCALE_ARCHIVE = pkgs.lib.optionalString stdenv.isLinux
    "${pkgs.glibcLocales}/lib/locale/locale-archive";
}

The main thing to note here is that rEnv is passed through inputsFrom and NOT buildInputs. inputsFrom adds the build inputs of the listed derivations to the shell, so R and every package listed in rEnv become available in the environment.

Let us try to get a nice graphic for the conclusion.

library(magrittr)
library(dplyr)
library(purrr)
library(forcats)
library(tidyr)
library(modelr)
library(tidybayes)
library(tidybayes.rethinking)
library(ggplot2)
library(cowplot)
library(rstan)
library(rethinking)
library(ggrepel)
library(RColorBrewer)

theme_set(theme_tidybayes())
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())


set.seed(5)
n = 10
n_condition = 5
ABC =
  tibble(
    condition = factor(rep(c("A","B","C","D","E"), n)),
    response = rnorm(n * 5, c(0,1,2,1,-1), 0.5)
  )

mtcars_clean = mtcars %>%
  mutate(cyl = factor(cyl))

m_cyl = ulam(alist(
    cyl ~ dordlogit(phi, cutpoint),
    phi <- b_mpg*mpg,
    b_mpg ~ student_t(3, 0, 10),
    cutpoint ~ student_t(3, 0, 10)
  ),
  data = mtcars_clean,
  chains = 4,
  cores = parallel::detectCores(),
  iter = 2000
)

cutpoints = m_cyl %>%
  recover_types(mtcars_clean) %>%
  spread_draws(cutpoint[cyl])

# define the last cutpoint
last_cutpoint = tibble(
  .draw = 1:max(cutpoints$.draw),
  cyl = "8",
  cutpoint = Inf
)

cutpoints = bind_rows(cutpoints, last_cutpoint) %>%
  # define the previous cutpoint (cutpoint_{j-1})
  group_by(.draw) %>%
  arrange(cyl) %>%
  mutate(prev_cutpoint = lag(cutpoint, default = -Inf))

fitted_cyl_probs = mtcars_clean %>%
  data_grid(mpg = seq_range(mpg, n = 101)) %>%
  add_fitted_draws(m_cyl) %>%
  inner_join(cutpoints, by = ".draw") %>%
  mutate(`P(cyl | mpg)` =
    # this part is logit^-1(cutpoint_j - beta*x) - logit^-1(cutpoint_{j-1} - beta*x)
    plogis(cutpoint - .value) - plogis(prev_cutpoint - .value)
  )


data_plot = mtcars_clean %>%
  ggplot(aes(x = mpg, y = cyl, color = cyl)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2", name = "cyl")

fit_plot = fitted_cyl_probs %>%
  ggplot(aes(x = mpg, y = `P(cyl | mpg)`, color = cyl)) +
  stat_lineribbon(aes(fill = cyl), alpha = 1/5) +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2")

png(filename="../images/rethinking.png")
plot_grid(ncol = 1, align = "v",
  data_plot,
  fit_plot
)
dev.off()

Finally, we will run this in our environment.

Rscript tesPlot.R

Conclusions

This post was really more of an exploratory follow-up to the previous post, and does not really work in isolation. Then again, at this point everything seems to have worked out well. R with Nix has finally become a truly viable combination for any and every analysis under the sun. Some parts of the workflow are still a bit janky, but will probably resolve themselves over time.

Update: There is a final part detailing automated ways of reloading the system configuration


  1. My motivations were laid out in the aforementioned post, and will not be repeated.

  2. For why this is written the way it is, see the aforementioned post.

  3. Christine Dodrill has a great write-up on using these tools as well.