DeepSeek via HPC in JupyterLab Setup
In my previous blog post I showed how to start DeepSeek R1 on the High Performance Computing (HPC) cluster of the TUD.
This gave us an interactive prompt, which is not very useful if you want to query DeepSeek automatically and process data with it.
A solution is to connect DeepSeek to JupyterLab. However, starting JupyterLab in the same HPC job is not possible, and the TUD JupyterHub is quite limited (long startup times, restricted environments because Python packages with C dependencies cannot be installed, etc.).
As an alternative, we can use a Jump Host, similar to what I used for Stable Diffusion.
HPC Setup
Let’s first set up the HPC environment.
- Connect to your jump host and open a byobu session:
ssh your-vm-user@141.11.11.1
byobu
F2 # create a new window
- Connect to the HPC alpha cluster from within your byobu session:
ssh -A s1111111@login1.alpha.hpc.tu-dresden.de
- Start a GPU job with 1 GPU and 8 hours runtime:
srun --account=p_llm_mapping \
    --gres=gpu:1 \
    --ntasks=1 --cpus-per-task=2 --nodes=1 \
    --time=8:00:00 --mem-per-cpu=16000 --pty bash -l
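Once the job is granted, you land in an interactive shell on one of the GPU nodes. A quick sanity check (a minimal sketch, assuming nvidia-smi is available on the node, which should be the case on alpha):
nvidia-smi  # verify that exactly 1 GPU is visible inside the job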
Ollama Setup
This needs to be done once. Since ollama models are quite big (you can easily reach 100+ GB with a few models), we need to reserve a workspace. We use the horse filesystem for this.
ws_allocate -F horse -r 7 -m s1111111@tu-dresden.de p_llm_mapping 90
> Info: creating workspace.
> /data/horse/ws/s1111111-p_llm_mapping
> remaining extensions  : 10
> remaining time in days: 90
To later extend the workspace period for another 30 days:
ws_extend -F horse p_llm_mapping 30
> Info: extending workspace.
> /data/horse/ws/s1111111-p_llm_mapping
> remaining extensions  : 10
> remaining time in days: 30
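To check the status of your workspaces at any time, the workspace tools also provide ws_list (a minimal sketch, assuming the standard hpc-workspace tooling used at TUD):
ws_list -F horse  # list workspaces on horse, with remaining time and extensions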
The next step is to save the ollama binary to the workspace and make sure all models are stored in the workspace as well.
Get ollama binary:
cd /data/horse/ws/s1111111-p_llm_mapping
mkdir ollama
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
tar -C ./ollama -xzf ollama-linux-amd64.tgz  # extracts bin/ and lib/ into ./ollama
rm ollama-linux-amd64.tgz
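A quick check that the binary works before continuing (the path matches the extraction step above):
/data/horse/ws/s1111111-p_llm_mapping/ollama/bin/ollama --version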
Create a models folder and symlink into your home folder:
mkdir /data/horse/ws/s1111111-p_llm_mapping/models
mkdir -p ~/.ollama
rm -rf ~/.ollama/models
ln -s /data/horse/ws/s1111111-p_llm_mapping/models ~/.ollama/models
Make workspace accessible for others in the project:
chown -R s1111111:p_llm_mapping /data/horse/ws/s1111111-p_llm_mapping
chmod -R g+rx /data/horse/ws/s1111111-p_llm_mapping
This is necessary if you have colleagues who are working with you on the same HPC project.
Now you can start and connect your HPC job to the Jump Host:
- first we add the ollama binary to PATH
- then we start ollama serve in an SSH session connected to port 11434 of our Jump Host
# make the ollama binary available in the current shell
export PATH=$PATH:/data/horse/ws/s1111111-p_llm_mapping/ollama/bin
# reverse-forward port 11434 to the Jump Host, then start the ollama server
ssh service@141.76.18.72 -o ExitOnForwardFailure=yes \
    -o ServerAliveInterval=20 -f -R :11434:127.0.0.1:11434 -p 22 -N; \
ollama serve
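Before connecting JupyterLab, you can verify the tunnel from the Jump Host and pull a model through it. A minimal sketch; the model tag deepseek-r1:14b is an example, pick a size that fits your GPU memory:
# on the Jump Host: the ollama server should answer through the tunnel
curl http://127.0.0.1:11434
# > Ollama is running
# pull a model through the forwarded port via the REST API
curl http://127.0.0.1:11434/api/pull -d '{"model": "deepseek-r1:14b"}'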
VM Setup
We use Carto-Lab Docker. Follow the Setup guide or use your own JupyterLab Docker container.
To connect to port 11434 on the host from within the Docker container, we need to enable host networking. Edit the Carto-Lab Docker docker-compose.yml:
services:
  jupyterlab:
    image: registry.gitlab.vgiscience.org/lbsn/tools/jupyterlab:${TAG:-latest}
    network_mode: "host"
    # networks:
    #   - lbsn-network

# networks:
#   lbsn-network:
#     name: ${NETWORK_NAME:-lbsn-network}
#     external: true
- add network_mode: "host"
- comment out networks:
Now start Carto-Lab Docker and connect to it. See this Notebook where I show how to connect from JupyterLab to DeepSeek via REST API/requests:
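If you just need the gist: with host networking enabled, the notebook reaches the ollama server on 127.0.0.1:11434. A minimal curl sketch of the same REST call (the model name is an example and must match a model you pulled):
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'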
Clean up stale connections
I have seen students run into stale connections that block the port and prevent new tunnels from being established. For example:
sudo lsof -i :11434
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> sshd 166119 student 10u IPv6 2077667 0t0 TCP localhost:11434 (LISTEN)
> sshd 166119 student 11u IPv4 2077668 0t0 TCP localhost:11434 (LISTEN)
Since I did not give sudo rights to kill processes, I used a workaround.
- Create a script in /usr/local/bin/kill_ssh_tunnel.sh:
#!/bin/bash
PORT="11434"
USER=$(whoami)

# Get all SSHD PIDs owned by the user
PIDS=$(pgrep -u "$USER" sshd)

if [[ -z "$PIDS" ]]; then
    echo "No SSH tunnel processes found."
    exit 1
fi

# Convert port to hex (needed for /proc/net/tcp lookup)
PORT_HEX=$(printf "%04X" "$PORT")

# Find processes with an open connection on the target port
FOUND_PIDS=()
for PID in $PIDS; do
    if grep -q ":$PORT_HEX" /proc/$PID/net/tcp* 2>/dev/null; then
        FOUND_PIDS+=("$PID")
    fi
done

# If no relevant processes found
if [[ ${#FOUND_PIDS[@]} -eq 0 ]]; then
    echo "No active SSH tunnels using port $PORT."
    exit 1
fi

# Kill the identified processes
echo "Killing SSH tunnel processes: ${FOUND_PIDS[*]}"
kill "${FOUND_PIDS[@]}"

# Confirm termination
sleep 1
for PID in "${FOUND_PIDS[@]}"; do
    if kill -0 "$PID" 2>/dev/null; then
        echo "Failed to kill SSH tunnel process $PID."
    else
        echo "SSH tunnel process $PID terminated."
    fi
done
- Make the script executable for all users:
chmod 755 /usr/local/bin/kill_ssh_tunnel.sh
- Now the user can kill their SSH tunnel processes without sudo:
kill_ssh_tunnel.sh
> Killing SSH tunnel processes: 72465 166119
> SSH tunnel process 72465 terminated.
> SSH tunnel process 166119 terminated.