DeepSeek via HPC in JupyterLab Setup
In my previous blog post I showed how to start DeepSeek R1 on the High Performance Computing (HPC) cluster of the TUD.
This gave us an interactive prompt, which is not very useful if you want to query DeepSeek automatically and process data with it.
A solution is to connect DeepSeek to JupyterLab. However, starting JupyterLab in the same HPC job is not possible, and the TUD JupyterHub is quite limited (long startup times, restricted environments because Python packages with C dependencies cannot be installed, etc.).
As an alternative, we can use a Jump Host, similar to what I used for Stable Diffusion.
HPC Setup
Let’s first set up the HPC environment.
- Connect to your jump host and open a byobu session:
ssh your-vm-user@141.11.11.1
byobu
F2 # create a new window
- Connect to the HPC alpha cluster from within your byobu session:
ssh -A s1111111@login1.alpha.hpc.tu-dresden.de
- Start a GPU job with 1 GPU and 8 hours runtime:
srun --account=p_llm_mapping \
    --gres=gpu:1 \
    --ntasks=1 --cpus-per-task=2 --nodes=1 \
    --time=8:00:00 --mem-per-cpu=16000 --pty bash -l
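Once the job is granted, you land in an interactive shell on one of the GPU nodes. A quick sanity check (a minimal sketch, assuming nvidia-smi is available on the node, which should be the case on alpha):
nvidia-smi  # verify that exactly 1 GPU is visible inside the job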
Ollama Setup
This needs to be done once. Since ollama models are quite big (you can easily reach 100+ GB with a few models), we need to reserve a workspace. We use the horse filesystem for this.
ws_allocate -F horse -r 7 -m s1111111@tu-dresden.de p_llm_mapping 90
> Info: creating workspace.
> /data/horse/ws/s1111111-p_llm_mapping
> remaining extensions  : 10
> remaining time in days: 90
To later extend the workspace period for another 30 days:
ws_extend -F horse p_llm_mapping 30
> Info: extending workspace.
> /data/horse/ws/s1111111-p_llm_mapping
> remaining extensions  : 10
> remaining time in days: 30
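To check the status of your workspaces at any time, the workspace tools also provide ws_list (a minimal sketch, assuming the standard hpc-workspace tooling used at TUD):
ws_list -F horse  # list workspaces on horse, with remaining time and extensions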
The next step is to save the ollama binary to the workspace and make sure all models are stored in the workspace as well.
Get ollama binary:
cd /data/horse/ws/s1111111-p_llm_mapping
mkdir ollama
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
tar -C ./ollama -xzf ollama-linux-amd64.tgz  # extracts bin/ and lib/ into ./ollama
rm ollama-linux-amd64.tgz
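A quick check that the binary works before continuing (the path matches the extraction step above):
/data/horse/ws/s1111111-p_llm_mapping/ollama/bin/ollama --version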
Create a models folder and symlink into your home folder:
mkdir /data/horse/ws/s1111111-p_llm_mapping/models
mkdir -p ~/.ollama
rm -rf ~/.ollama/models
ln -s /data/horse/ws/s1111111-p_llm_mapping/models ~/.ollama/models
Make workspace accessible for others in the project:
chown -R s1111111:p_llm_mapping /data/horse/ws/s1111111-p_llm_mapping
chmod -R g+rx /data/horse/ws/s1111111-p_llm_mapping
This is necessary if you have colleagues who are working with you on the same HPC project.
Now you can start and connect your HPC job to the Jump Host:
- first we add the ollama binary to PATH
- then we start ollama serve in an SSH session connected to port 11434 of our Jump Host
# make the ollama binary available in the current shell
export PATH=$PATH:/data/horse/ws/s1111111-p_llm_mapping/ollama/bin
# reverse-forward port 11434 to the Jump Host, then start the ollama server
ssh service@141.76.18.72 -o ExitOnForwardFailure=yes \
    -o ServerAliveInterval=20 -f -R :11434:127.0.0.1:11434 -p 22 -N; \
ollama serve
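Before connecting JupyterLab, you can verify the tunnel from the Jump Host and pull a model through it. A minimal sketch; the model tag deepseek-r1:14b is an example, pick a size that fits your GPU memory:
# on the Jump Host: the ollama server should answer through the tunnel
curl http://127.0.0.1:11434
# > Ollama is running
# pull a model through the forwarded port via the REST API
curl http://127.0.0.1:11434/api/pull -d '{"model": "deepseek-r1:14b"}'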
VM Setup
We use Carto-Lab Docker. Follow the Setup guide or use your own JupyterLab Docker container.
To connect to port 11434 on the host from within the Docker container, we need to enable host networking. Edit the Carto-Lab Docker docker-compose.yml:
services:
  jupyterlab:
    image: registry.gitlab.vgiscience.org/lbsn/tools/jupyterlab:${TAG:-latest}
    network_mode: "host"
    # networks:
    #   - lbsn-network

# networks:
#   lbsn-network:
#     name: ${NETWORK_NAME:-lbsn-network}
#     external: true
- add network_mode: "host"
- comment out networks:
Now start Carto-Lab Docker and connect to it. See this Notebook where I show how to connect from JupyterLab to DeepSeek via REST API/requests:
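If you just need the gist: with host networking enabled, the notebook reaches the ollama server on 127.0.0.1:11434. A minimal curl sketch of the same REST call (the model name is an example and must match a model you pulled):
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'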
Clean up stale connections
I have seen students run into stale connections that block the port and prevent new tunnels from being established. For example:
sudo lsof -i :11434
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> sshd 166119 student 10u IPv6 2077667 0t0 TCP localhost:11434 (LISTEN)
> sshd 166119 student 11u IPv4 2077668 0t0 TCP localhost:11434 (LISTEN)
Since I did not give sudo rights to kill processes, I used a workaround.
- Create a script in /usr/local/bin/kill_ssh_tunnel.sh:
#!/bin/bash
PORT="11434"
USER=$(whoami)

# Get all SSHD PIDs owned by the user
PIDS=$(pgrep -u "$USER" sshd)

if [[ -z "$PIDS" ]]; then
    echo "No SSH tunnel processes found."
    exit 1
fi

# Convert port to hex (needed for /proc/net/tcp lookup)
PORT_HEX=$(printf "%04X" "$PORT")

# Find processes with an open connection on the target port
FOUND_PIDS=()
for PID in $PIDS; do
    if grep -q ":$PORT_HEX" /proc/$PID/net/tcp* 2>/dev/null; then
        FOUND_PIDS+=("$PID")
    fi
done

# If no relevant processes found
if [[ ${#FOUND_PIDS[@]} -eq 0 ]]; then
    echo "No active SSH tunnels using port $PORT."
    exit 1
fi

# Kill the identified processes
echo "Killing SSH tunnel processes: ${FOUND_PIDS[*]}"
kill "${FOUND_PIDS[@]}"

# Confirm termination
sleep 1
for PID in "${FOUND_PIDS[@]}"; do
    if kill -0 "$PID" 2>/dev/null; then
        echo "Failed to kill SSH tunnel process $PID."
    else
        echo "SSH tunnel process $PID terminated."
    fi
done
- Make the script executable for all users:
chmod 755 /usr/local/bin/kill_ssh_tunnel.sh
- Now the user can kill their SSH tunnel processes without sudo:
kill_ssh_tunnel.sh
> Killing SSH tunnel processes: 72465 166119
> SSH tunnel process 72465 terminated.
> SSH tunnel process 166119 terminated.