Running LLMs on DAIC
This guide shows you how to serve and use Large Language Models (LLMs) on DAIC using Ollama, a tool that lets you run models such as Meta's Llama, Mistral, or Hugging Face models for inference.
1. Clone the Template Repository
First, navigate to your project storage space. Then, clone the public REIT Ollama Serving repository. This ensures that all generated files, models, and containers are stored in the correct location, not in your home directory.
cd /tudelft.net/staff-umbrella/<your_project_name> # Replace with your actual project path
git clone https://gitlab.ewi.tudelft.nl/reit/reit-ollama-serving-template.git
tree reit-ollama-serving-template
Next, define and export the following environment variables:
| Variable | Value | Default |
|---|---|---|
| PROJECT_DIR | Path where you want to store models/data for your project | |
| CONTAINER_DIR | Directory where the Ollama container will be stored | ${PROJECT_DIR}/containers |
| OLLAMA_DEBUG | Enable debug logging by setting to 1 | 0 |
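For example, you might export them as follows (the project path is a placeholder; replace it with your own, and the last two exports are optional since they match the defaults):

export PROJECT_DIR=/tudelft.net/staff-umbrella/<your_project_name>  # replace with your actual project path
export CONTAINER_DIR=${PROJECT_DIR}/containers                      # optional: this is the default
export OLLAMA_DEBUG=0                                               # optional: set to 1 for verbose logs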
Tip

- PROJECT_DIR is the main directory for your project. To avoid $HOME quota issues, it is recommended to use a path in your bulk or umbrella storage.
- CONTAINER_DIR is where the Ollama container image will be stored. The default is a containers subdirectory within your PROJECT_DIR. If you want to change this location, make sure to update the CONTAINER_DIR variable accordingly. Alternatively, if you have a pre-built ollama.sif image, you can set OLLAMA_IMG to its path.
- Setting OLLAMA_DEBUG to 1 can help you troubleshoot issues by providing more detailed logs.
2. (Optional) Pull the Ollama Container
For simplicity, we will use the Ollama container image available on Docker Hub. You can pull it using Apptainer.
This step is optional: the ollama-function.sh script will build the image automatically if it is not found in the ${CONTAINER_DIR} folder or at the path set in ${OLLAMA_IMG}.
$ PROJECT_DIR=</path/to/your/project/in/umbrella/or/bulk/storage>
$ mkdir -p ${PROJECT_DIR}/containers
$ apptainer build ${PROJECT_DIR}/containers/ollama.sif docker://ollama/ollama
WARNING: 'nodev' mount option set on /tmp, it could be a source of failure during build process
INFO: Starting build...
Copying blob 6574d8471920 done |
Copying blob 13b7e930469f done |
Copying blob 97ca0261c313 done |
Copying blob e0fa0ad9f5bd done |
Copying config b9d03126ef done |
Writing manifest to image destination
2025/06/24 12:57:55 info unpack layer: sha256:13b7e930469f6d3575a320709035c6acf6f5485a76abcf03d1b92a64c09c2476
2025/06/24 12:57:56 info unpack layer: sha256:97ca0261c3138237b4262306382193974505ab6967eec51bbfeb7908fb12b034
2025/06/24 12:57:57 info unpack layer: sha256:e0fa0ad9f5bdc7d30b05be00c3663e4076d288995657ebe622a4c721031715b6
2025/06/24 12:57:57 info unpack layer: sha256:6574d84719207f59862dad06a34eec2b332afeccf4d51f5aae16de99fd72b8a7
INFO: Creating SIF file...
INFO: Build complete: /tudelft.net/staff-bulk/ewi/insy/PRLab/Staff/aeahmed/ollama_tutorial/containers/ollama.sif
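Optionally, you can confirm the build before moving on by checking the image file and its metadata:

ls -lh ${PROJECT_DIR}/containers/ollama.sif        # check that the image file exists
apptainer inspect ${PROJECT_DIR}/containers/ollama.sif   # show labels and build metadata stored in the SIF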
Tip

For more on using Apptainer, see the Apptainer tutorial.

3. Quick Interactive Test
- Start an interactive GPU session:
$ sinteractive --cpus-per-task=2 --mem=500 --time=00:15:00 --gres=gpu --partition=general
Note: interactive sessions are automatically terminated when they reach their time limit (1 hour)!
srun: job 11642659 queued and waiting for resources
srun: job 11642659 has been allocated resources
13:01:27 up 93 days, 11:16, 0 users, load average: 2,85, 2,60, 1,46
- Once you are allocated resources on a compute node, set your project directory, source the ollama-function.sh script, and run the Ollama server (from the container):
export PROJECT_DIR=</path/to/your/project/in/umbrella/or/bulk/storage> # replace with your actual project path
source ollama-function.sh # Define the `ollama` function
ollama serve # The wrapper picks a free port and prints the server URL
Keep this terminal open to monitor logs and keep the Ollama server running.
Open a second terminal, log in to DAIC, and interact with the server (e.g., from the login node). In the example below, we run the codellama model:
export PROJECT_DIR=</path/to/your/project/in/umbrella/or/bulk/storage> # Ensure this matches the server's PROJECT_DIR
source ollama-function.sh
ollama run codellama # Forwards the command to the running server
You can check the health of the server by running:
$ curl http://$(cat ${PROJECT_DIR}/ollama/host.txt):$(cat ${PROJECT_DIR}/ollama/port.txt)
Ollama is running
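Beyond the interactive ollama run client, you can also send a prompt straight to the server over Ollama's REST API. A minimal sketch, assuming the codellama model has already been pulled on the server:

# Query the server's /api/generate endpoint directly; "stream": false returns a single JSON response
curl http://$(cat ${PROJECT_DIR}/ollama/host.txt):$(cat ${PROJECT_DIR}/ollama/port.txt)/api/generate \
  -d '{"model": "codellama", "prompt": "Write a haiku about GPUs.", "stream": false}'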
- Interact with the model by typing your queries. For example, you can ask it to generate code or answer questions.
>>> who are you?
I am LLaMA, an AI assistant developed by Meta AI that can understand and respond to
human input in a conversational manner. I am trained on a massive dataset of text from
the internet and can answer questions or provide information on a wide range of topics.
>>>
- Stop the server with Ctrl-C in the server terminal. The host.txt and port.txt files will be cleaned up automatically.
4. Production Batch Jobs

The template already provides ready-to-run Slurm scripts. For convenience, a single helper, start-serve-client.sh, submits the server and client jobs in the right order and passes your PROJECT_DIR into both jobs.
To submit your jobs:
bash start-serve-client.sh \
-p </path/to/your/project/in/umbrella/or/bulk/storage> # Specify your project path. Defaults to `$PWD` if omitted.
What happens:
- Sets PROJECT_DIR to the path you pass (or defaults to $PWD if omitted).
- Submits ollama-server.sbatch, requesting GPU resources for serving your model.
- Submits ollama-client.sbatch with --dependency=after:<server-id>, so it starts as soon as the server begins running.
To check progress of these jobs:
squeue -j <server‑job-id>,<client‑job-id>
Once the jobs have run, the typical logs are:
- log-ollama-server-<server-job-id>.out: shows that the server has started and where it is running.
- log-ollama-client-<client-job-id>.log: shows an example workflow of pulling a model (deepseek-r1:7b), sending a prompt to the model, and printing the response.
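To follow a log while its job is still running, you can tail it (substitute the actual job ID):

tail -f log-ollama-server-<server-job-id>.out   # live view of the server log; Ctrl-C stops tailing, not the job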
Client jobs

- As long as the server job is running, you can submit additional client jobs that point to the same PROJECT_DIR (see the example below).
- You can inspect the ollama-client.sbatch file for examples of how to interact with the server (from the command line or within scripts).
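One way to submit such an additional client job, assuming the sbatch script picks up PROJECT_DIR from the submission environment (check ollama-client.sbatch for the exact mechanism the template uses):

# Submit an additional client job against the already running server
export PROJECT_DIR=/path/to/your/project    # must match the server job's PROJECT_DIR
sbatch --export=ALL ollama-client.sbatch    # --export=ALL propagates PROJECT_DIR into the job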
5. Best Practices
While you can run Ollama manually, the wrapper scripts provide several conveniences:
- Always serve on a GPU node. The wrapper prints an error if you try to serve from a login node.
- Client jobs don’t need --nv. The wrapper omits it automatically when no GPU is detected, eliminating noisy warnings.
- Model cache is project-scoped. All model blobs land in $PROJECT_DIR/ollama/models, so they don’t consume $HOME quota.
- Image builds use /tmp. The wrapper builds via a local cache to avoid permission errors.
- Automatic cleanup. The wrapper removes the host.txt and port.txt files after the server stops, so their presence tells you whether a server is up and running (see the quick check below).
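A quick way to check for a running server, based on the host.txt and port.txt files the wrapper manages:

# If both files exist, a server has registered itself; the wrapper deletes them on shutdown
if [ -f ${PROJECT_DIR}/ollama/host.txt ] && [ -f ${PROJECT_DIR}/ollama/port.txt ]; then
  echo "Ollama server registered at $(cat ${PROJECT_DIR}/ollama/host.txt):$(cat ${PROJECT_DIR}/ollama/port.txt)"
else
  echo "No Ollama server currently registered"
fi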
6. Troubleshooting
| Symptom | Fix |
|---|---|
| host.txt / port.txt not found | Start the server first: ollama serve (interactive) or submit ollama-server.sbatch. |
| Could not find any nv files on this host! | Safe to ignore; client ran on CPU. |
| Build fails with operation not permitted | Ensure the wrapper’s /tmp build cache patch is in place, or add --disable-cache. |
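For reference, if you build the image manually, the --disable-cache workaround from the table looks like this:

apptainer build --disable-cache ${PROJECT_DIR}/containers/ollama.sif docker://ollama/ollama   # skip the local build cache entirely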
Acknowledgment
Inspiration for this tutorial comes from the Stanford ollama_helper repository.
The DAIC template adapts many of the same ideas to TU Delft’s Slurm environment.