Quickstart

Log in to DAIC and submit your first SLURM job.

This guide provides the basic steps to get you started with the Delft AI Cluster (DAIC). You’ll learn how to log in and submit your first SLURM job.

General Workflow

The interactive diagram below outlines the general workflow when working with the Delft AI Cluster (DAIC). Each step links to detailed documentation.

---
title: "Cluster Workflow"
---
flowchart TB

    classDef cyan fill:#00A6D6,stroke:#007A99,stroke-width:2px;
    classDef white fill: #FFFFFF,stroke:#000000,stroke-width:2px;
    classDef yellow fill:#FFB81C,stroke:#FF8C00,stroke-width:2px;
    classDef green fill:#6CC24A,stroke:#3F6F21,stroke-width:2px;
    classDef darkgreen fill:#009B77,stroke:#006644,stroke-width:2px;

    Prerequisites:::yellow
    Quickstart:::green
    Resources:::darkgreen
    Reminder:::white
    Workflow:::cyan

    subgraph Practical[" "]
        direction RL

        subgraph Workflow[" "]
            direction TB    
            C[Set up software & dependencies in DAIC]

            D[Transfer data to DAIC]

            E["Test interactively"]

            F[Submit jobs & Monitor progress]
            H[Download results & clean up]

            D --> E
            C --> E
            E --> F --> H 
            
            click C "/docs/manual/software/" "Software setup"
            click D "/docs/manual/data-management/data-transfer" "Data transfer methods"
            click E "/docs/manual/job-submission/job-interactive" "Interactive jobs on compute nodes"
            click F "/docs/manual/job-submission/job-scripts" "Job submission"
            click H "/support/faqs/job-resources#how-do-i-clean-up-tmp-when-a-job-fails" "How do I clean up tmp?"
        end
        subgraph  local["Develop locally, then port code"]
        end
    end  

    subgraph Reminder[" "]
        subgraph Resources
            direction LR
            r0@{ shape: hex, label: "DAIC support resources"}
            r1@{ shape: hex, label: "Handy commands on DAIC"}
            r2@{ shape: hex, label: "Command line basics" }

            click r0 "/support/" "DAIC support resources"
            click r1 "/docs/manual/commands" "List of handy commands"
            click r2 "https://swcarpentry.github.io/shell-novice/" "The software carpentry's Unix shell materials"
        end

        subgraph Quickstart
            direction LR
            q1@{ shape: hex, label: "Login via SSH"}
            %%q2@{ shape: hex, label: "Submit a job to SLURM"}
            click q1 "/docs/manual/connecting/" "Login via SSH"
        end

        subgraph Prerequisites
            direction LR
            p1@{ shape: hex, label: "User Account and Credentials"}
            p2@{ shape: hex, label: "Data Storage on University Network" }

            click p1 "https://tudelft.topdesk.net/tas/public/ssp/content/detail/service?unid=c6d0e44564b946eaa049898ffd4e6938&from=d75e860b-7825-4711-8225-8754895b3507" "Request an account"
            click p2 "https://tudelft.topdesk.net/tas/public/ssp/content/detail/service?unid=f359caaa60264f99b0084941736786ae" "Request storage"
        end
    end
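
As a concrete illustration of the data-transfer step in the diagram, the sketch below copies a local project folder to your DAIC home directory with rsync over SSH. The folder name my_project and the target path are placeholder choices, not prescribed locations; see the data-transfer documentation linked above for the recommended storage shares.

    $ rsync -avz my_project/ <YourNetID>@login.daic.tudelft.nl:~/my_project/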

Login via SSH

  1. Open your terminal and run the following SSH command:
$ ssh <YourNetID>@login.daic.tudelft.nl
  2. You will be prompted for your password:
The HPC cluster is restricted to authorized users only.

YourNetID@login.daic.tudelft.nl's password: 
Last login: Mon Jul 24 18:36:23 2023 from tud262823.ws.tudelft.net
 #########################################################################
 #                                                                       #
 # Welcome to login1, login server of the HPC cluster.                   #
 #                                                                       #
 # By using this cluster you agree to the terms and conditions.          #
 #                                                                       #
 # For information about using the HPC cluster, see:                     #
 # https://login.hpc.tudelft.nl/                                         #
 #                                                                       #
 # The bulk, group and project shares are available under /tudelft.net/, #
 # your windows home share is available under /winhome/$USER/.           #
 #                                                                       #
 #########################################################################
 18:40:16 up 51 days,  6:53,  9 users,  load average: 0,82, 0,36, 0,53
YourNetID@login1:~$ 

Congratulations, you just logged in to the Delft AI Cluster!
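
If you log in frequently, you can shorten the command with an entry in your local ~/.ssh/config file. The host alias daic below is an arbitrary choice, not an official name:

    ~/.ssh/config
    
       Host daic
           HostName login.daic.tudelft.nl
           User <YourNetID>

With this entry in place, running ssh daic is equivalent to the full command above.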

Submit a job to SLURM

To submit a Python script using SLURM:

  1. Create a Python script, e.g., the file below named script.py:

    script.py
    
       import time
       time.sleep(60)  # Simulate some work.
       print("Hello SLURM!")
       

  2. Create a SLURM submission file submit.sh with the following content:

    submit.sh
    
       #!/bin/sh
       #SBATCH --partition=general   # Request partition. Default is 'general'. Select the best partition following the advice on  https://daic.tudelft.nl/docs/manual/job-submission/priorities/#priority-tiers
       #SBATCH --qos=short           # Request Quality of Service. Default is 'short' (maximum run time: 4 hours)
       #SBATCH --time=0:05:00        # Request run time (wall-clock). Default is 1 minute
       #SBATCH --ntasks=1            # Request number of parallel tasks per job. Default is 1
       #SBATCH --cpus-per-task=2     # Request number of CPUs (threads) per task. Default is 1 (note: CPUs are always allocated to jobs in multiples of 2).
       #SBATCH --mem=1GB             # Request memory per node. Default is 1024 MB (1 GB). For multiple tasks, specify --mem-per-cpu instead
       #SBATCH --mail-type=END       # Set mail type to 'END' to receive an email when the job finishes.
       #SBATCH --output=slurm_%j.out # Set name of output log. %j is the Slurm jobId
       #SBATCH --error=slurm_%j.err  # Set name of error log. %j is the Slurm jobId
       
       # Some debugging logs
       which python 1>&2  # Write path to Python binary to standard error
       python --version   # Write Python version to standard error
       
       # Run your script with the `srun` command:
       srun python script.py
       
  3. Submit the job to the queuing system with the sbatch command:

    $ sbatch submit.sh 
    Submitted batch job 9267834
    
  4. Monitor the job’s progress with the squeue command:

    $ squeue -u $USER 
    JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    9267834   general script.s <netid>   R       0:18      1 grs1
    
  5. When your job finishes, you will receive a notification via email. You will then see that two files have been created in the directory from which you submitted the job: slurm_9267834.out and slurm_9267834.err, where the number corresponds to the job ID that SLURM assigned to your job. You can view the content of these files with the cat command:

    $ cat slurm_9267834.err
    /usr/bin/python
    Python 2.7.5
    
    $ cat slurm_9267834.out
    Hello SLURM!
    

You can see that the standard output of your script was written to the file slurm_9267834.out and the standard error to slurm_9267834.err. For more useful commands at your disposal, have a look here.
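As a brief illustration of such commands, the sketch below cancels a job with scancel and, assuming job accounting is enabled on the cluster, reviews the state and elapsed time of a finished job with sacct. The job ID 9267834 is simply the one from this example:

    $ scancel 9267834
    $ sacct -j 9267834 --format=JobID,JobName,State,Elapsed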

For more detailed job submission instructions, see Job Submission.

Next Steps

It is strongly recommended that users read the Containers Tutorial. Using containers simplifies the process of setting up reproducible environments for your workflows, especially when working with different dependencies and software versions.
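
As a taste of what the tutorial covers, the sketch below assumes Apptainer is available on the compute nodes and runs the same script.py inside a container pulled from Docker Hub; the python:3.11 image is only an illustrative choice:

    $ srun --ntasks=1 --mem=1GB --time=0:05:00 apptainer exec docker://python:3.11 python script.py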

Additional resources

  1. DAIC training materials
  2. Unix Shell Basics (Software Carpentry)
  3. DAIC Support Resources