Apropos of nothing. R1 is the remote machine, L1 the local one; Tailscale connects the two.
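
Before going further, it is worth a quick check from L1 that the tailnet actually reaches R1; a minimal sketch, assuming the Tailscale CLI is installed on L1 and $R1_HOST is R1's Tailscale name:

# L1
tailscale status
tailscale ping $R1_HOST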

# R1, assuming pixi is installed
pixi global install --environment llmlocal ollama
# or with a GPU
pixi global install --environment llmlocal "ollama *cuda*"
export OLLAMA_HOST="0.0.0.0"
ollama serve
# another tab
ollama run gpt-oss:20b
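
Running ollama serve in a foreground tab works, but it dies with the SSH session; a sketch of one way to keep it alive, assuming tmux is available on R1:

# R1: start the server in a detached tmux session
tmux new-session -d -s ollama 'OLLAMA_HOST=0.0.0.0 ollama serve'
# reattach later with: tmux attach -t ollama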

A preliminary check that the server responds (run it on R1, or on L1 once the tunnel below is up):

curl http://localhost:11434/api/tags
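
Listing tags only proves the server is up; a one-off request against the /api/generate endpoint also confirms the model loads and answers, assuming gpt-oss:20b was pulled as above:

curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Reply with a single word.",
  "stream": false
}'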

Followed by some more reasonable usage:

# L1
# the tunnel blocks this tab; add -f to background it, or use another tab
ssh -N -L 11434:127.0.0.1:11434 $R1_HOST
export OLLAMA_API_BASE=http://localhost:11434
pixi global install --environment llmlocal aider-chat gpustat
aider --model ollama/gpt-oss:20b blah.py
# different tab
gpustat -i 1
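
Aider is not the only possible client: Ollama also serves an OpenAI-compatible API under /v1, so anything speaking that protocol can point at the forwarded port. A sketch with curl, assuming the gpt-oss:20b model from above:

# L1, through the tunnel
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "What does an SSH local forward do, in one sentence?"}]
  }'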

Tunneling the API port over SSH, on top of Tailscale, keeps the connection private without exposing the Ollama instance to the open web. The result is that a lightweight client can drive substantial models: the heavy lifting is offloaded to dedicated hardware, and older machines are put to good use.
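
To avoid retyping the forwarding flags, the tunnel can live in ~/.ssh/config on L1; a sketch with a hypothetical host alias and a placeholder Tailscale hostname:

# ~/.ssh/config on L1
Host r1-ollama
    HostName r1.your-tailnet.ts.net    # placeholder: R1's Tailscale hostname
    LocalForward 11434 127.0.0.1:11434

After which ssh -N r1-ollama brings up the same tunnel.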

Also see this post for more details.