These pages contain basic concepts and details to make optimal use of TU Delft’s DAIC. Alternatively, you might wish to jump to the Quickstart or Tutorials for more thematic content.
Documentation
- 1: Introduction
- 2: Policies
- 3: System specifications
- 4: User manual
- 4.1: Best practices
- 4.2: Handy commands on DAIC
- 4.3: Connecting to DAIC
- 4.4: Data management
- 4.4.1: Data transfer
- 4.5: Software
- 4.5.1: Available software
- 4.5.2: Modules
- 4.5.3: Installing software
- 4.5.4: Containerization
- 4.6: Job submission
- 4.6.1: Priorities and waiting times
- 4.6.2: Quality of Service (QoS)
- 4.6.3: Partitions
- 4.6.4: Interactive jobs
- 4.6.5: Submitting jobs
- 4.6.6: Monitoring jobs
- 4.6.7: Cancelling jobs
- 4.6.8: Using graphic cards
- 4.6.9: Job arrays
- 4.6.10: Job chains
- 4.6.11: Reservations
- 4.6.12: Kerberos
1 - Introduction
What is an HPC cluster?
A High Performance Computing (HPC) cluster is a collection of (large) computing resources, such as processors (CPUs), graphics processors (GPUs), memory and storage, that are shared among a group of users. Pooling multiple computers in this way makes it possible to perform lengthy and resource-intensive computations beyond the capabilities of a single computer, and is especially useful for modern scientific computing applications, where datasets are typically large, models have many parameters and high complexity, and computations need specialized hardware (like GPUs and FPGAs).
What is DAIC?
The Delft AI Cluster (DAIC), formerly known as INSY-HPC or simply HPC, is a TU Delft High Performance Computing (HPC) cluster consisting of Linux compute nodes (i.e., servers) with substantial processing power and memory for running large, long or GPU-enabled jobs.
Starting as a CS-only cluster in 2015, DAIC has grown over time to serve researchers across many TU Delft departments, while keeping the needs of CS and AI central in each expansion phase. Today, DAIC nodes are organized as partitions that correspond to the groups contributing these resources (see Contributing departments and TU Delft clusters comparison).
1.1 - Contributors and funding
The Delft AI Cluster (DAIC), formerly known as INSY-HPC or simply HPC, was initiated within the INSY department in 2015. Later, resources were joined with ST, collectively called CS@Delft, and with other departments across faculties in subsequent expansion cycles.
Joining DAIC?
If you are interested in joining DAIC as a contributor, please contact us via this TopDesk DAIC Contact Us form.
Contributing departments
The cluster is available (only) to users from participating departments, and access can be arranged through the department’s contact persons (see Access and accounts).
# | DAIC partition | Contributor | Faculty | Faculty abbreviation (English/Dutch) |
---|---|---|---|---|
1 | 3dgi | 3D Geoinformation | Faculty of Architecture and the Built Environment | ABE/BK |
2 | asm | Aerospace Structures and Materials | Faculty of Aerospace Engineering | AE/LR |
3 | imphys | Imaging Physics | Faculty of Applied Sciences | AS/TNW |
4 | cor | Cognitive Robotics | Faculty of Mechanical Engineering | ME |
5 | grs | Geoscience & Remote Sensing | Faculty of Civil Engineering and Geosciences | CEG/CiTG |
6 | influence | Intelligent Systems | Faculty of Electrical Engineering, Mathematics & Computer Science | EEMCS/EWI |
7 | insy | Intelligent Systems | Faculty of Electrical Engineering, Mathematics & Computer Science | EEMCS/EWI |
8 | st | Software Technology | Faculty of Electrical Engineering, Mathematics & Computer Science | EEMCS/EWI |
Funding sources
In addition to funding received from departmental sources, DAIC has also been financially supported by the following projects and granting sources:
1.2 - Advisors and Impact
Advisory board
- Pattern Recognition and Bioinformatics group, Department of Intelligent Systems
- Interactive Intelligence group, Department of Intelligent Systems
- Web Informatics group, Software Technology Department
Citation and Acknowledgement
To help demonstrate the impact of DAIC, we ask that you both cite and acknowledge DAIC in your scientific publications. Please use the following formats:
Delft AI Cluster (DAIC). (2024). The Delft AI Cluster (DAIC), RRID:SCR_025091. https://doi.org/10.4233/rrid:scr_025091
@misc{DAIC,
author = {{Delft AI Cluster (DAIC)}},
title = {The Delft AI Cluster (DAIC), RRID:SCR_025091},
year = {2024},
doi = {10.4233/rrid:scr_025091},
url = {https://doc.daic.tudelft.nl/}
}
TY - DATA
T1 - The Delft AI Cluster (DAIC), RRID:SCR_025091
UR - https://doi.org/10.4233/rrid:scr_025091
PB - TU Delft
PY - 2024
Research reported in this work was partially or completely facilitated by computational resources and support of the Delft AI Cluster (DAIC) at TU Delft (RRID: SCR_025091), but remains the sole responsibility of the authors, not the DAIC team.
Scientific impact in numbers
Since 2015, DAIC has facilitated more than 2000 scientific outputs from the various DAIC-participating departments:
 | Article | Conference/Meeting contribution | Book/Book chapter/Book editing | Dissertation (TU Delft) | Abstract | Other | Editorial | Patent | Grand Total |
---|---|---|---|---|---|---|---|---|---|
Grand Total | 1067 | 854 | 123 | 99 | 69 | 32 | 29 | 8 | 2281 |
These outputs span a wide range of application areas, with titles reflecting an emphasis on data analysis and machine learning:
Reference
The table and wordcloud provided here are based on retrospective retrieval of all DAIC users’ scientific outputs between 2015-2023 from TU Delft’s Pure database. The data has been generated by the Strategic Development – Data Insights team.
Publications using DAIC
Note
The compilation of the following list is done retrospectively by the Data Insights team and/or is based on self-reporting by individual researchers. As a result, it may be neither exhaustive nor complete. If your publication is missing, please let us know by posting it to the ScientificOutput MatterMost channel.
1.3 - TU Delft clusters comparison
Cluster comparison
TU Delft clusters
DAIC is one of several clusters accessible to TU Delft CS researchers (and their collaborators). The table below gives a comparison between these in terms of use case, eligible users, and other characteristics.
DAIC | DelftBlue | DAS | |
---|---|---|---|
Primary use cases | Research, especially in AI | Research & Education | Distributed systems research, streaming applications, edge and fog computing, in-network processing, and complex security and trust policies, Machine learning research, ... |
Contributors | Certain groups within TU Delft (see Contributing departments) | All TU Delft faculties | Multiple universities & SURF |
Eligible users | | All TU Delft affiliates | |
Website | DAIC documentation | DelftBlue Documentation | DAS Documentation |
Contact info | DAIC community | DHPC team | DAS admin |
Request account | Access and accounts | Get an account | Email DAS admin with details like user's affiliation and the planned purpose of the account. |
Getting started | Quickstart | Crash course | |
Hardware | System specifications | DHPC hardware | Head node + … |
Software stack | Software | DHPC modules | Base OS: Rocky Linux, OpenHPC, Slurm Workload Manager |
Data storage | Storage | Storage | Storage: 128 TB (RAID6) |
Access to TU Delft Network storage | ✓ | Only in login nodes | Not supported |
Sharing data in collaboration | ✓ | ✗ | |
Has GPUs? | ✓ | ✓ | ✓ |
Cost of use | Contribution towards hardware purchase | - |
SURF clusters
SURF, the collaborative organization for IT in Dutch education and research, has installed and currently operates the Dutch national supercomputer, Snellius, which as of Q3 2021 houses 144 A100 (40 GB) GPUs (36 gcn nodes × 4 A100 GPUs per node), with other specs detailed in the Snellius hardware and file systems wiki.
SURF also operates other clusters like Spider for processing large structured data sets, and ODISSEI Secure Supercomputer (OSSC) for large-scale analyses of highly-sensitive data. For an overview of SURF clusters, see the SURF wiki.
TU Delft researchers in TBM and CITG already have direct and easy access to the compute power and data services of SURF, while members of other faculties need to apply for access as detailed in SURF’s guide to Apply for access to compute services.
TU Delft cloud resources
For both education and research activities, TU Delft has established the Cloud4Research program. Cloud4Research aims to facilitate the use of public cloud resources, primarily Amazon AWS. At the administrative level, Cloud4Research provides AWS accounts with an initial budget; subsequent billing can be charged to a project code, instead of a personal credit card. At the technical level, the ICT innovation team provides intake meetings to facilitate getting started. Please refer to the Policies and FAQ pages for more details.
2 - Policies
User agreement
This user agreement is intended to establish the expectations between all users and administrators of the cluster with respect to fair-use and fair-share of cluster resources. By using the DAIC cluster you agree to these terms and conditions.
General information about the DAIC cluster
- Cluster structure: The DAIC cluster is made up of shared resources contributed by different labs and groups. The pooling of resources from different groups is beneficial for everyone: it enables larger, parallelized computations and more efficient use of resources with less idle time.
- Basic principles: Regardless of the specific details, cluster use is always based on basic principles of fair-use and fair-share (through priority) of resources, and all users are expected to take care at all times that their cluster use is not hindering other users.
- Policies: Cluster policies are decided by the user board and enforced by various automated and non-automated actions, for example by the job scheduler based on QoS limits and the administrators for ensuring the stability and performance of the cluster.
- Support:
- Cluster administrators offer, during office hours, different levels of support, which include (in order of priority): ensuring the stability and performance of the cluster, providing generic software, helping with cluster-specific questions and problems, and providing information (via e-mails and during the board meeting) about cluster updates.
- Contact persons from participating groups add and manage users at the level of their respective groups, communicate needs and updates between their groups and system administrators, and may help with cluster-specific questions and problems.
- HPC Engineers, in CS@Delft, provide support to (CS) students, researchers and staff members to efficiently use DAIC resources. This includes: maintaining updated documentation resources, running onboarding and advanced training courses on cluster usage, organizing workshops to assess compute needs, plan infrastructure upgrades, and may collaborate with researchers on individual projects as fits.
- More information: Please see the General cluster usage and What to do in case of problems sections on where to find more information about cluster use.
- Cluster workflow:
- The typical steps for running a job on the cluster are: Test → Determine resources → Submit → Monitor job → Repeat until results are obtained. See Quickstart
- You can use the login nodes for testing your code, determining the required resources and submitting jobs (see Computing on login nodes).
- For testing jobs which require larger resources (more than 4 CPUs and/or more than 4 GB of memory and/or one or more powerful GPUs), start an interactive job (see Interactive jobs).
- For determining resources of larger jobs, you can submit a single (short) test job (see Submitting jobs)
- QoS:
- A Quality of Service (QoS) is a set of limits that controls what resources a job can use and determines the priority level of a job. DAIC adopts multiple QoSs to optimize the throughput of job scheduling and to reduce the waiting times in the cluster (see Quality of Service).
- The DAIC QoS limits are set by the DAIC user board, and the scheduler strictly enforces these limits. Thus, no user can use more resources than the amount that was set by the user board.
- Any (perceived) imbalance in the use of resources by a certain QoS or user should not be held against a user or the scheduler, but should be discussed in the user board.
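To inspect the QoS levels defined on the cluster and their limits yourself, you can query Slurm’s accounting database (a minimal sketch; the exact field names may vary with the Slurm version):
sacctmgr show qos format=Name,Priority,MaxWall,MaxTRESPU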
General cluster usage
- You may use cluster resources for your research within the QoS restrictions of your domain user and user group. Depending on your user group, you might be eligible to use specific partitions, giving higher priorities on certain nodes. See Priority tiers, and please check this with your lab.
- Depending on your user group, you might be eligible to get priorities on certain nodes. For example, you might have access to a specialized partition or a limited-time node reservation for your group or department (for example before a conference deadline). Please check this with your lab and try to use these in your *.sbatch file; your jobs should then start faster! See Resources Reservations for more information.
- In general, you will be informed about standard administrative actions on the cluster. All official DAIC cluster e-mails are sent to your official TU Delft mailbox, so it is advised to check it regularly.
- You will receive e-mails about downtimes relating to scheduled maintenance.
- You, or your supervisor, will receive e-mails about scheduled cluster user board meetings where any updates and changes to the cluster structure, software, or hardware will be announced. Please check with your lab or feel free to join the cluster board meetings if you want to be up-to-date about any changes.
- You will receive automated e-mails regarding the efficiency of your jobs. The cluster monitors the use of resources of all jobs. When certain specific inefficiencies are detected for a significant number of jobs in the same day, an automated efficiency mail is sent to inform you about these problems with your resource use, to help you optimize your jobs. These mails will not lead to automatic cancellations or bans. To avoid spamming, limited inefficient use will not trigger a mail.
- You will receive an e-mail when your jobs are canceled or you receive a cluster ban (see the Expectations from cluster users and Regulations sections). You will be informed about why your jobs were canceled or why you were banned from the cluster (often before the bans take place). If the problem is still not clear to you from the e-mails you already received, please follow the steps detailed in the What to do in case of problems section.
- You are not entitled to receive personalized help on how to debug your code via e-mail. It is your responsibility to solve technical problems stemming from your code. Please first consult with your lab for a solution to a technical problem (see What to do in case of problems). However, admins might offer help, advice and solutions along with information regarding a job cancellation or ban. Please listen to such advice, it might help you solve your problem and improve fair use of the cluster.
- You may join cluster user board meetings. In the meetings you will be informed of any new developments, hardware and software updates and can suggest changes and improvements. These meetings take place roughly every 3 months and will be announced by e-mail and on the MatterMost channel.
Expectations from cluster users
- You are responsible for your jobs not interfering with other users’ cluster usage. Please try to always keep in mind that cluster resources are limited and shared between all users, and that fair use benefits everyone.
- You are not allowed to use the cluster for reasons unrelated to your studies and research.
- If your jobs are destructive to other users’ jobs or are threatening cluster integrity, your jobs might be canceled. You have the responsibility at all times to avoid behavior which interferes negatively with other users’ cluster usage. See Regulations.
- If the destructive behavior of your jobs does not change over time or you are unresponsive to e-mails from system admins requesting information or requiring immediate action regarding your cluster use, you might receive a ban from the cluster. See Regulations.
Regulations
- Your jobs might be canceled if:
- The node your jobs are running on becomes unresponsive and the node is automatically restarted.
- The job is overloading the node (for example overloading the network communication of the node).
- The job is adversely affecting the execution of other jobs (jobs that are not using all requested resources (effectively) and thus unfairly block waiting jobs from running may also be canceled).
- The jobs ignore the directions from the administrators (for example if a job is (still) affected by the same problem that the administrators informed you about before, and asked you to fix and test before resubmitting).
- The job is showing clear signs of a problem (like hanging, or being idle, or using only 1 CPU of the multiple CPUs requested, or not using a GPU that was requested).
- You might receive a cluster access ban for:
- Disallowed use of the cluster, including disallowed use of computing time, purposefully ignoring directions, guidelines, fair-use principles and/or (trying to gain) unauthorized access and/or causing disruptions to the cluster or parts thereof (even if unintentional).
- Unresponsiveness to e-mails from system admins requesting information or requiring immediate action regarding your cluster use.
- Repeated problems caused by your cluster use which go unsolved even after attempts to resolve the issue.
- Your cluster use privileges will be returned when all parties are confident that you understand the problem and it won’t reoccur.
- Your jobs won’t be canceled for:
- Scheduled maintenance. This is planned in advance and jobs that would run during scheduled maintenance times won’t start until the end of maintenance.
What to do in case of problems?
When you encounter problems, please follow the subsequent steps, in the indicated order:
- First, please contact your colleagues and fellow cluster users in your lab, concerning problems with your code, job performance and efficiency. They may be running similar jobs and potentially have solutions for your problem.
- You can also ask questions to fellow users on the MatterMost channel.
- For prolonged problems, your initial contact point is your supervisor/PI.
- As a final step, you can contact the cluster administrators for technical sysadmin problems or persistent efficiency problems, or for more information if you are not sure why you are banned from the cluster. You can do this by reporting your question, through the Self Service Portal , to the Service Desk. In your question, refer to the ‘DAIC cluster’.
- For severe recurring problems, complaints and suggestions for policy changes, or issues affecting multiple users, you can contact the DAIC advisory board to bring it up as an agenda point in the next user board meeting.
Responsible cluster usage
You are responsible for ensuring that your jobs run efficiently:
- Please keep an eye on your jobs and the automated efficiency e-mails to check for unexpected behavior.
- Sometimes many jobs from the same user, or from student groups, will be running on many nodes at the same time. While this may seem like one user, or user group, is blocking the cluster for everyone else, please keep in mind that the scheduler operates on a set of predetermined rules based on the QoS and priority settings. We do not want idle resources. Therefore, at the time that those jobs were started, the resources were idle, no higher priority jobs were in the queue and the jobs did not exceed the QoS limits. If you repeatedly observe pending jobs, please bring it up in the user board meeting.
- Short job efficiency: If you are running many (hundreds or thousands of) very short jobs (a few minutes each), keep in mind that starting each job and individually loading the same modules every time creates overhead. When reasonably possible, it might save computation time to instead group some jobs together, as sketched below. The grouped jobs can still be submitted to the short queue if the runtime is less than 4 hours.
- GPU job efficiency: If you are running multi-GPU jobs (for example due to GPU memory limitations), keep in mind that the communication between the GPUs and other CPU processes (for example data loaders) may create overhead. It might be useful to run jobs on fewer GPUs with more GPU memory each, or to take advantage of specialized libraries optimized for multi-GPU computing in your code.
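A minimal sketch of grouping several short tasks into one job script (the module name, input pattern and my_analysis.py script are placeholders for illustration):
#!/bin/bash
#SBATCH --job-name=grouped-short-tasks
#SBATCH --qos=short
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G

# Load the shared modules once, instead of once per short job
module load python   # placeholder module name

# Run several short tasks back-to-back inside one allocation
for input in data/input_*.txt; do
    python my_analysis.py "$input"   # placeholder script and inputs
done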
Citing and acknowledging DAIC
Please cite and acknowledge DAIC in your scientific publications using the format specified in the Citation and Acknowledgement section.
Reporting of scientific outputs
Please remember to post any scientific output based on work performed on DAIC to the ScientificOutput MatterMost channel.
Access and accounts
- DAIC is a cluster dedicated to TU Delft researchers (e.g., PhD students, postdocs, etc.) from participating groups (see Contributing departments).
- To access DAIC resources, eligible candidates from these groups can request an account via the DAIC request Access form.
- Additionally, requests for resource reservations can also be accommodated (see General cluster usage).
3 - System specifications
At present, DAIC and DelftBlue have different software stacks. This pertains to the operating system (CentOS 7 vs Red Hat Enterprise Linux 8, respectively) and, consequently, the available software. Please refer to the respective DelftBlue modules and Software sections before commencing your experiments.
Operating System
DAIC runs the Red Hat Enterprise Linux 7 distribution, which provides the general Linux software. Most common software, including programming languages, libraries and development files for compiling your own software, is installed on the nodes (see Available software). However, a less common program that you need might not be installed. Similarly, if your research requires a state-of-the-art program that is not (yet) available as a package for Red Hat 7, it will not be available on the nodes by default. See Installing software for more information.
Login Nodes
The login nodes are the gateway to the DAIC HPC cluster and are specifically designed for lightweight tasks such as job submission, file management, and compiling code (on certain nodes). These nodes are not intended for running resource-intensive jobs, which should be submitted to the Compute Nodes.
Specifications and usage notes
Hostname | CPU (Sockets x Model) | Total Cores | Total RAM | Operating System | GPU Type | GPU Count | Usage Notes |
---|---|---|---|---|---|---|---|
login1 | 1 x Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 8 | 15.39 GB | OpenShift Enterprise | Quadro K2200 | 1 | For file transfers, job submission, and lightweight tasks. |
login2 | 1 x Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz | 1 | 3.70 GB | OpenShift Enterprise | N/A | N/A | Virtual server, for non-intensive tasks. No compilation. |
login3 | 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz | 32 | 503.60 GB | RHEV | Quadro K2200 | 1 | For large compilation and interactive sessions. |
Compute Nodes
DAIC compute nodes are all multi CPU servers, with large memories, and some with GPUs. The nodes in the cluster are heterogeneous, i.e. they have different types of hardware (processors, memory, GPUs), different functionality (some more advanced than others) and different performance characteristics. If a program requires specific features, you need to specifically request those for that job (see Submitting jobs).
Note
All compute nodes have Advanced Vector Extensions 1 and 2 (AVX, AVX2) support, and hyper-threading (ht) processors (two CPUs per core, always allocated in pairs).
Note
You can use Slurm’s sinfo command to get various information about cluster nodes. For example, to get an overview of compute nodes on DAIC, you can use the command:
$ sinfo --all --format="%P %N %c %m %G %b" --hide -S P,N -a | grep -v "general" | awk 'NR==1 {print; next} {match($5, /gpu:[^,]+:[0-9]+/); if (RSTART) print $1, $2, $3, $4, substr($5, RSTART, RLENGTH), $6; else print $1, $2, $3, $4, "-", $6 }'
Check out Slurm’s sinfo page and Wikipedia’s awk page for more info on these commands.
List of all nodes
The following table gives an overview of current nodes and their characteristics:
Hostname | CPU (Sockets x Model) | Cores per Socket | Total Cores | CPU Speed (MHz) | Total RAM | GPU Type | GPU Count |
---|---|---|---|---|---|---|---|
100plus | 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz | 16 | 32 | 2097.488 | 755.585 GB | ||
3dgi1 | 1 x AMD EPYC 7502P 32-Core Processor | 32 | 32 | 2500 | 251.41 GB | ||
3dgi2 | 1 x AMD EPYC 7502P 32-Core Processor | 32 | 32 | 2500 | 251.41 GB | ||
awi01 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 2996.569 | 376.384 GB | Tesla V100 PCIe 32GB | 1 |
awi02 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2900.683 | 503.619 GB | Tesla V100 SXM2 16GB | 2 |
awi03 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | ||
awi04 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 3231.884 | 503.625 GB | ||
awi05 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 3258.984 | 503.625 GB | ||
awi07 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | ||
awi08 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | ||
awi09 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | ||
awi10 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | ||
awi11 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | ||
awi12 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 503.625 GB | ||
awi19 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 251.641 GB | ||
awi20 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 251.641 GB | ||
awi21 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 251.641 GB | ||
awi22 | 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz | 14 | 28 | 2899.951 | 251.641 GB | ||
awi23 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 3221.038 | 376.385 GB | ||
awi24 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 2580.2 | 376.385 GB | ||
awi25 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 3399.884 | 376.385 GB | ||
awi26 | 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz | 18 | 36 | 3442.7 | 376.385 GB | ||
cor1 | 2 x Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz | 16 | 64 | 3599.975 | 1510.33 GB | Tesla V100 SXM2 32GB | 8 |
gpu01 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu02 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu03 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu04 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu05 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu06 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu07 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu08 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu09 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu10 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu11 | 2 x AMD EPYC 7413 24-Core Processor | 24 | 48 | 2650 | 503.402 GB | NVIDIA A40 | 3 |
gpu14 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.613 | 503.275 GB | NVIDIA A40 | 3 |
gpu15 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.938 | 503.275 GB | NVIDIA A40 | 3 |
gpu16 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.604 | 503.275 GB | NVIDIA A40 | 3 |
gpu17 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.878 | 503.275 GB | NVIDIA A40 | 3 |
gpu18 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.57 | 503.275 GB | NVIDIA A40 | 3 |
gpu19 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.682 | 503.275 GB | NVIDIA A40 | 3 |
gpu20 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.651 | 1007.24 GB | NVIDIA A40 | 3 |
gpu21 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.646 | 1007.24 GB | NVIDIA A40 | 3 |
gpu22 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.963 | 1007.24 GB | NVIDIA A40 | 3 |
gpu23 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.658 | 1007.24 GB | NVIDIA A40 | 3 |
gpu24 | 2 x AMD EPYC 7543 32-Core Processor | 32 | 64 | 2794.664 | 1007.24 GB | NVIDIA A40 | 3 |
grs1 | 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 8 | 16 | 3499.804 | 251.633 GB | ||
grs2 | 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 8 | 16 | 3577.734 | 251.633 GB | ||
grs3 | 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 8 | 16 | 3499.804 | 251.633 GB | ||
grs4 | 2 x Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz | 8 | 16 | 3499.804 | 251.633 GB | ||
influ1 | 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz | 16 | 32 | 2955.816 | 376.391 GB | GeForce RTX 2080 Ti | 8 |
influ2 | 2 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz | 16 | 32 | 2300 | 187.232 GB | GeForce RTX 2080 Ti | 4 |
influ3 | 2 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz | 16 | 32 | 2300 | 187.232 GB | GeForce RTX 2080 Ti | 4 |
influ4 | 2 x AMD EPYC 7452 32-Core Processor | 32 | 64 | 1500 | 251.626 GB | ||
influ5 | 2 x AMD EPYC 7452 32-Core Processor | 32 | 64 | 2350 | 503.611 GB | ||
influ6 | 2 x AMD EPYC 7452 32-Core Processor | 32 | 64 | 1500 | 503.61 GB | ||
insy15 | 2 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz | 16 | 32 | 2300 | 754.33 GB | GeForce RTX 2080 Ti Rev. A | 4 |
insy16 | 2 x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz | 16 | 32 | 2300 | 754.33 GB | GeForce RTX 2080 Ti Rev. A | 4 |
Total | 1206 | 2380 | 28 TB | 101 | |||
CPUs
All nodes have multiple Central Processing Units (CPUs) that perform the operations. Each CPU can process one thread (i.e. a separate string of computer code) at a time. A computer program consists of one or multiple threads, and thus needs one or multiple CPUs simultaneously to do its computations (see wikipedia's CPU page ).
Note
Most programs use a fixed number of threads. Requesting more CPUs for a program than its number of threads will not make it any faster, because it won’t know how to use the extra CPUs. When a program has fewer CPUs available than its number of threads, the threads will have to time-share the available CPUs (i.e., each thread only gets part-time use of a CPU), and, as a result, the program will run slower (and even slower because of the added overhead of switching between threads). So it is always necessary to match the number of CPUs to the number of threads, or the other way around. See Submitting jobs for setting resources for batch jobs.
The number of threads running simultaneously determines the load of a server. If the number of running threads is equal to the number of available CPUs, the server is loaded 100% (or 1.00). When the number of threads that want to run exceeds the number of available CPUs, the load rises above 100%.
The CPU functionality is provided by the hardware cores in the processor chips in the machines. Traditionally, one physical core contained one logical CPU, thus the CPUs operated completely independent. Most current chips feature hyper-threading: one core contains two (or more) logical CPUs. These CPUs share parts of the core and the cache, so one CPU may have to wait when a shared resource is in use by the other CPU. Therefore these CPUs are always allocated in pairs by the job scheduler.
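For example, a common way to match the number of threads to the number of allocated CPUs in a job script is shown below (a minimal sketch, assuming your program honours the OMP_NUM_THREADS convention; the program name is a placeholder):
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Use exactly as many threads as CPUs were allocated by Slurm
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./my_multithreaded_program   # placeholder executable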
GPUs
A few types of GPUs are available in some of the DAIC nodes, as shown in table 1. The total numbers of these GPUs per type and their technical specifications are shown in table 2. See Using graphic cards for requesting GPUs for a computational job.
GPU (slurm) type | Count | Model | Architecture | Compute Capability | CUDA cores | Memory |
---|---|---|---|---|---|---|
a40 | 66 | NVIDIA A40 | Ampere | 8.6 | 10752 | 46068 MiB |
turing | 24 | NVIDIA GeForce RTX 2080 Ti | Turing | 7.5 | 4352 | 11264 MiB |
v100 | 11 | Tesla V100-SXM2-32GB | Volta | 7.0 | 5120 | 32768 MiB |
In table 2, the headers denote:
Model: The official product name of the GPU.
Architecture: The hardware design used, and thus the hardware specifications and performance characteristics of the GPU. Each new architecture brings forward a new generation of GPUs.
Compute capability: Determines the general functionality, available features and CUDA support of the GPU. A GPU with a higher capability supports more advanced functionality.
CUDA cores: The number of cores that perform the computations: the more cores, the more work can be done in parallel (provided that the algorithm can make use of higher parallelization).
Memory: Total installed GPU memory. The GPUs provide their own internal (fixed-size) memory for storing data for GPU computations. All required data needs to fit in the internal memory or your computations will suffer a big performance penalty.
Note
To inspect a given GPU and obtain the data of table 2, you can run the following commands in an interactive session or an sbatch script (see Jobs on GPU resources). The apptainer image used in this code snippet was built as demonstrated in the Apptainer tutorial.
$ sinteractive --cpus-per-task=2 --mem=500 --time=00:02:00 --gres=gpu
Note: interactive sessions are automatically terminated when they reach their time limit (1 hour)!
srun: job 8607783 queued and waiting for resources
srun: job 8607783 has been allocated resources
15:50:29 up 51 days, 3:26, 0 users, load average: 60,33, 59,72, 54,65
SomeNetID@influ1:~$ nvidia-smi --format=csv,noheader --query-gpu=name
NVIDIA GeForce RTX 2080 Ti
SomeNetID@influ1:~$ nvidia-smi -q | grep Architecture
Product Architecture : Turing
SomeNetID@influ1:~$ nvidia-smi --query-gpu=compute_cap --format=csv,noheader
7.5
SomeNetID@influ1:~$ apptainer run --nv cuda_based_image.sif | grep "CUDA Cores" # using the apptainer image of the tutorial
(068) Multiprocessors, (064) CUDA Cores/MP: 4352 CUDA Cores
SomeNetID@influ1:~$ nvidia-smi --format=csv,noheader --query-gpu=memory.total
11264 MiB
SomeNetID@influ1:~$ exit
Memory
All machines have large main memories for performing computations on big data sets. A job cannot use more than its allocated amount of memory; if it needs more, it will fail or be killed. It is not possible to combine the memory from multiple nodes for a single task. 32-bit programs can only address (use) up to 3 GB (gigabytes) of memory. See Submitting jobs for setting resources for batch jobs.
Storage
DAIC compute nodes have direct access to the TU Delft home, group and project storage. You can use your TU Delft installed machine or an SCP or SFTP client to transfer files to and from these storage areas and others (see Data transfer), as is demonstrated throughout this page.
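For example, a minimal sketch of copying a file from your local machine into a project storage folder with scp (the file name and project path are placeholders):
scp results.tar.gz <YourNetID>@login.daic.tudelft.nl:/tudelft.net/staff-umbrella/<project>/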
File System Overview
Unlike TU Delft’s DelftBlue, DAIC does not have a dedicated storage filesystem. This means there is no /scratch space for storing temporary files (see DelftBlue’s Storage description and Disk quota and scratch space). Instead, DAIC relies on a direct connection to the TU Delft network storage filesystem (see Overview data storage) from all its nodes, and offers the following types of storage areas:
Personal storage (aka home folder)
The Personal Storage is private and is meant to store personal files (program settings, bookmarks). A backup service protects your home files from both hardware failures and user error (you can restore previous versions of files from up to two weeks ago). The available space is limited by a quota limit (since this space is not meant to be used for research data).
You have two (separate) home folders: one for Linux and one for Windows (because Linux and Windows store program settings differently). You can access these home folders from a machine (running Linux or Windows OS) using a command line interface or a browser via TU Delft’s webdata. For example, the Windows home has a My Documents folder. My Documents can be found on a Linux machine under /winhome/<YourNetID>/My Documents.
Home directory | Access from | Storage location |
---|---|---|
Linux home folder | ||
Linux | /home/nfs/<YourNetID> | |
Windows | only accessible using an scp/sftp client (see SSH access) | |
webdata | not available | |
Windows home folder | ||
Linux | /winhome/<YourNetID> | |
Windows | H: or \\tudelft.net\staff-homes\[a-z]\<YourNetID> | |
webdata | https://webdata.tudelft.nl/staff-homes/[a-z]/<YourNetID> |
It’s possible to access the backups yourself. In Linux the backups are located under the (hidden, read-only) ~/.snapshot/ folder. In Windows you can right-click the H: drive and choose Restore previous versions.
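For example, to list the available snapshots of your Linux home folder and copy a file back from one of them (a minimal sketch; the snapshot and file names are placeholders):
ls ~/.snapshot/
cp ~/.snapshot/<snapshot_name>/myfile.txt ~/myfile.txt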
Note
To see your disk usage, run something like:
du -h '</path/to/folder>' | sort -h | tail
Group storage
The Group Storage is meant to share files (documents, educational and research data) with department/group members. The whole department or group has access to this storage, so this is not for confidential or project data. There is a backup service to protect the files, with previous versions up to two weeks ago. There is a Fair-Use policy for the used space.
Destination | Access from | Storage location |
---|---|---|
Group Storage | ||
Linux | /tudelft.net/staff-groups/<faculty>/<department>/<group> or | |
/tudelft.net/staff-bulk/<faculty>/<department>/<group>/<NetID> | ||
Windows | M: or \\tudelft.net\staff-groups\<faculty>\<department>\<group> or | |
L: or \\tudelft.net\staff-bulk\ewi\insy\<group>\<NetID> | ||
webdata | https://webdata.tudelft.nl/staff-groups/<faculty>/<department>/<group>/ |
Project Storage
The Project Storage is meant for storing (research) data (datasets, generated results, download files and programs, …) for projects. Only the project members (including external persons) can access the data, so this is suitable for confidential data (but you may want to use encryption for highly sensitive confidential data). There is a backup service and a Fair-Use policy for the used space.
Project leaders (or supervisors) can request a Project Storage location via the Self-Service Portal or the Service Desk .
Destination | Access from | Storage location |
---|---|---|
Project Storage | ||
Linux | /tudelft.net/staff-umbrella/<project> | |
Windows | U: or \\tudelft.net\staff-umbrella\<project> | |
webdata | https://webdata.tudelft.nl/staff-umbrella/<project> or … |
Tip
Data deleted from project storage, staff-umbrella, remains in a hidden .snapshot folder. If accidentally deleted, you can recover such data by copying it from the (hidden) .snapshot folder in your storage.
Local Storage
Local storage is meant for temporary storage of (large amounts of) data with fast access on a single computer. You can create your own personal folder inside the local storage. Unlike the network storage above, local storage is only accessible on that computer, not on other computers or through network file servers or webdata. There is no backup service nor quota. The available space is large but fixed, so leave enough space for other users. Files under /tmp that have not been accessed for 10 days are automatically removed.
Destination | Access from | Storage location |
---|---|---|
Local storage | ||
Linux | /tmp/<NetID> | |
Windows | not available | |
webdata | not available |
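For example, a minimal sketch of staging data into your personal folder in local storage on a compute node (the project path and file names are placeholders):
mkdir -p /tmp/$USER
cp /tudelft.net/staff-umbrella/<project>/dataset.tar /tmp/$USER/
# ... run your computations against the local copy, then clean up ...
rm /tmp/$USER/dataset.tar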
Memory Storage
Memory storage is meant for short-term storage of limited amounts of data with very fast access on a single computer. You can create your own personal folder inside the memory storage location. Memory storage is only accessible on that computer, and there is no backup service nor quota. The available space is limited and shared with programs, so leave enough space (the computer will likely crash when you don’t!). Files that have not been accessed for 1 day are automatically removed.
Destination | Access from | Storage location |
---|---|---|
Memory storage | ||
Linux | /dev/shm/<NetID> | |
Windows | not available | |
webdata | not available |
Warning
Use this only when using other storage makes your job or the whole computer slow.
Workload scheduler
DAIC uses the Slurm scheduler to efficiently manage workloads. All jobs for the cluster have to be submitted as batch jobs into a queue. The scheduler then manages and prioritizes the jobs in the queue, allocates resources (CPUs, memory) for the jobs, executes the jobs and enforces the resource allocations. See the job submission pages for more information.
A Slurm-based cluster is composed of a set of login nodes that are used to access the cluster and submit computational jobs. A central manager orchestrates computational demands across a set of compute nodes. These nodes are organized logically into groups called partitions, which define job limits or access rights. The central manager provides fault-tolerant hierarchical communications to ensure optimal and fair use of available compute resources by eligible users, and to make it easier to run and schedule complex jobs across compute resources (multiple nodes).
4 - User manual
4.1 - Best practices
The available processing power and memory in DAIC are large, but still limited. You should use the available resources efficiently and fairly. This page lays out a few general principles and guidelines for considerate use of DAIC.
Using shared resources
The computing nodes within DAIC are primarily meant to run large, long (non-interactive) jobs. You share these resources with other users across departments. Thus, you need to be cautious of your usage so you do not hinder other users.
To help protect the active jobs and resources, when a login node becomes overloaded, new logins to this node are automatically disabled. This means that you will sometimes have to wait for other jobs to finish and at other times ICT may have to kill a job to create space for other users.
One rule: Respect your fellow users.
Implication: we reserve the right to terminate any job or process that we feel is clearly interfering with the ability of others to complete work, regardless of technical measures or its resource usage.
Best practices
- Connect only directly from the bastion server to the login nodes (See Connecting to DAIC)
- Always choose the login node with the lowest use (most importantly system load and memory usage), by checking the Current resource usage page or the servers command for information.
- Each login node displays a message at login. Make sure you understand it before proceeding. This message includes the current load of the node, so look at it at every login.
- Only use the storage best suited to your files (See Storage).
- Do interactive code development, debugging and testing on your local machine, as much as possible. In the cluster, try to organize your code as scripts, instead of working interactively in the command line.
- If you need to test and debug in the cluster, for example in a GPU node, request an interactive session and do not work on the login node itself (See Interactive jobs on compute nodes).
- Save results frequently: your job can crash, the compute node can become overloaded, or the network shares can become unavailable.
- Write your code in a modular way, so that you can continue the job from the point where it last crashed.
- Actively monitor the status of your jobs:
  - Make sure your job runs normally and is not hindering other jobs. Check the following at the start of a job and thereafter at least twice a day:
    - If your job is not working correctly (or halted) because of a programming error, terminate it immediately; debug and fix the problem instead of just trying again (the result will almost certainly be exactly the same).
    - If your screen’s Kerberos ticket has expired, renew it so your job can successfully save its results.
    - Use the top program to monitor the CPU (%CPU) and memory (%MEM) usage of your code, as sketched after this list. If either is too high, kill your code so it doesn’t cause problems for other users.
    - Don’t leave top running unless you are continuously watching it; press q to quit.
    - Watch the current resource usage (see the Current resource usage page or use the servers command), and if the node is running close to its limits (higher than 90% load or memory, swap or disk usage), consider moving your job to a less busy node.
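For example, to watch only your own processes on the node you are logged in to, or to take a one-shot, non-interactive snapshot (a minimal sketch using standard top options):
top -u $USER                       # interactive view of your own processes; press q to quit
top -b -n 1 -u $USER | head -n 20  # one-shot snapshot, e.g. for logging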
Computing on login nodes
You can use login nodes for basic tasks like compiling software, preparing submission scripts for the batch queue, submitting and monitoring jobs in the batch queue, analyzing results, and moving data or managing files.
Small-scale interactive work may be acceptable on login nodes if your resource requirements are minimal.
- Please do not run production research computations on the login nodes. If you need more resources, request an interactive session on a compute node instead (See Interactive jobs on compute nodes).
Note
Most multi-threaded applications (such as Java and Matlab) will automatically use all CPU cores of a node, and thus take away processing power from other jobs. If you can specify the number of threads, set it to at most 25% (¼) of the cores in that node (for a node with 16 cores, use at most 4; this leaves enough processing capacity for other users). Also see How do I request CPUs for a multithreaded program?
4.2 - Handy commands on DAIC
BASH commands
BASH (Bourne Again SHell) is an open-source Unix shell and command language. It is the default shell on many Linux distributions and macOS, and it’s available on Windows via the Windows Subsystem for Linux, Git BASH, and other emulators. BASH is widely used for scripting and automating tasks in a computing environment. Below are some fundamental BASH commands with examples and brief explanations, aiding users in effective navigation and task execution. Remember to use these commands carefully, especially those that can modify or delete files and directories. They are fundamental tools for interacting with BASH and managing your tasks effectively.
man
The man
command is a tool for displaying the manual pages (documentation) of various commands and utilities available on Unix-like operating systems. It is an essential resource for users seeking detailed information about a specific command, program, or configuration file.
Basic Usage
Display the manual page for a command:
man <command>
This displays the manual page for the specified command.
Examples
Show the manual page for the ls
command:
man ls
Show the manual page for the man
command:
man man
echo
Used for displaying a line of text/string that is passed as an argument. This is a fundamental command for displaying output in shell scripts.
Example: Display “Hello, World!”.
echo "Hello, World!"
cd
Changes the current directory to another directory. It’s a basic command to navigate through the filesystem.
Example: Change to the home directory.
cd ~
ls
Lists the contents of a directory. It’s a key command to view files and directories.
Example: List all files and directories in the current directory, including hidden files.
ls -a
tree
The tree
command is a utility that displays the directory structure of a path in a tree-like format. It provides a visual representation of the hierarchy of files and directories, making it easier to understand the organization of a file system.
Basic Usage
Display the directory tree structure:
tree [path]
This command displays the directory structure starting from the specified path or the current directory if no path is specified.
Options
- -a: Display all files and directories, including hidden ones (those starting with a dot).
- -d: Display only directories, omitting files.
- -L level: Limit the depth of the tree to the specified level.
- --noreport: Suppress the file and directory count summary at the end of the output.
- -H baseHREF: Create an HTML output starting with the specified base URL.
- -o filename: Output the tree structure to a file with the specified name.
- --charset encoding: Use the specified character encoding (e.g., UTF-8).
- -P pattern: Only display files matching the specified pattern (e.g., *.txt).
- -I pattern: Exclude files and directories matching the specified pattern (e.g., *.bak).
Examples
Display the directory tree structure starting from the current directory:
tree
Display the directory tree structure from a specific path:
tree /path/to/start
Display only directories in the tree structure:
tree -d
Display the tree structure and limit the depth to 2 levels:
tree -L 2
Display the tree structure and output it to a file:
tree -o output.txt
Display all files and directories, including hidden ones:
tree -a
The tree
command is a helpful tool for quickly understanding the layout of a directory and its contents. It is especially useful for navigating complex file systems and identifying the location of files and directories within a hierarchy.
which
The which
command shows the full path of a command’s executable file by searching the directories listed in the PATH
environment variable.
Basic Usage
Find the path of a command:
which command
This displays the full path of the specified command’s executable file.
Examples
Find the path of the ls command:
which ls
Find the path of the python command:
which python
whereis
The whereis
command locates not only the executable file but also the source and manual page files of a command, if available.
Basic Usage
Locate a command:
whereis command
This displays the paths to the executable, source, and manual page files of the specified command, if they exist.
Options
- -b: Search only for binaries (executable files).
- -m: Search only for manual pages.
- -s: Search only for source files.
- -u: Search for any missing information (binaries, source, or manual) and report it.
- -B path: Add a directory to the search path for binaries.
- -M path: Add a directory to the search path for manual pages.
- -S path: Add a directory to the search path for source files.
Examples
Locate the ls command:
whereis ls
Locate only the source files of the gcc command:
whereis -s gcc
cat
Concatenates and displays file contents. It’s commonly used to view the contents of a file.
Example: Display the contents of a file named example.txt.
cat example.txt
grep
Searches for patterns in files. It’s a powerful tool for searching text using patterns.
Example: Search for the word “example” in file.txt.
grep "example" file.txt
find
Searches for files in a directory hierarchy. This command is essential for locating files and directories.
Example: Find all .txt files in the current directory.
find . -name "*.txt"
mkdir
Creates a new directory.
Example: Create a directory named new_directory.
mkdir new_directory
rm
Removes files or directories. It’s a critical command for file management.
Example 1: Remove a file named example.txt.
rm example.txt
Example 2: Remove a directory and its contents (recursively).
rm -r directory_name
Warning: Be extremely cautious with rm -r, especially when used with . (current directory) or .. (parent directory), as this can lead to irreversible deletion of files. Never use rm -r . in a directory unless you are absolutely sure about deleting all its contents.
cp
Copies files and directories.
Example: Copy file1.txt to file2.txt.
cp file1.txt file2.txt
mv
Moves or renames files and directories.
Example: Rename oldname.txt to newname.txt.
mv oldname.txt newname.txt
for, do, done
A for loop in Bash allows you to iterate over a list of items, such as an array, a set of files, or even a range of numbers. Below are a few examples of how you can use a for loop in Bash.
Iterating over a list of strings
In this example, the for loop iterates over a list of strings and prints each one:
# List of items
items=("apple" "banana" "cherry")
# Loop through each item
for item in "${items[@]}"; do
echo "Item: $item"
done
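As a further illustration (a minimal sketch; the file pattern is a placeholder), a loop over a range of numbers and a loop over a set of files:
# Loop over a range of numbers
for i in {1..5}; do
    echo "Number: $i"
done

# Loop over all .txt files in the current directory
for file in *.txt; do
    echo "Processing $file"
done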
if, (else), then
The if
statement in Bash scripting is used to execute a block of code conditionally based on whether an expression evaluates to true or false. Below are examples of how you can use an if statement in Bash:
filepath="/path/to/file.txt"
if [ -f "$filepath" ]; then
echo "The file exists."
else
echo "The file does not exist."
fi
alias
In Bash, an alias
is a shortcut for a command. You can define an alias to simplify the execution of commonly used commands or to add default options to commands you frequently use. Here are some examples of how to create and use aliases in Bash:
Creating a simple alias
You can create an alias by using the alias command followed by the alias name and the command it represents. Here’s an example of a simple alias:
alias ll="ls -l"
Another commonly used alias is md as a shortcut for mkdir:
alias md="mkdir"
You can add these instructions to your .bashrc file in order to load them when logging in to the cluster.
Slurm commands
SLURM (Simple Linux Utility for Resource Management) is an open-source job scheduler used on many of the world’s supercomputers and compute clusters. It allows users to efficiently manage computing resources and queue their computational jobs for execution. Below are some essential SLURM commands with examples and brief explanations, helping users navigate and utilize these resources effectively. Remember to replace <jobid>
with your specific job ID where necessary. These commands are vital tools for interacting with SLURM and managing your compute tasks effectively.
sinteractive
For requesting an interactive node, typically during testing phases. Compute resources such as memory, time, and GPUs are specified as part of the command, similar to sbatch
directives.
Example: Request a 10-minute GPU node session.
sinteractive --time=00:10:00 --gres=gpu
sbatch
Used for submitting a script to SLURM for queuing in batch mode. The script includes directives at the top to specify required resources.
Example: Submit a job using a script named script.sh.
sbatch script.sh
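For illustration, a minimal sketch of what script.sh could look like (the QoS, resource values, module and program names are placeholders to adapt to your own job; see Submitting jobs):
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --qos=short
#SBATCH --time=00:30:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --output=slurm-%j.out

# Load required software (placeholder module name)
module load python

# Run the actual computation (placeholder script)
srun python my_script.py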
squeue
Checks the status of jobs in the SLURM queue. Useful for tracking your job’s status and understanding the queue’s state, and to find a specific jobid
of a particular job.
Example: Check the status of all your queued jobs.
squeue -u $USER
scancel
Cancels a job or all jobs of a user. Vital for managing jobs that are no longer needed or were submitted in error.
Example 1: Cancel a specific job with job ID <jobid>.
scancel <jobid>
Example 2: Cancel all jobs for the current user.
scancel -u $USER
slurmtop
A DAIC-specific command to view the top jobs in the queues and their resource usage.
Example:
slurmtop
scontrol
Shows detailed information and resources allocated to the job with the specified SLURM job ID.
Example: Show details of a job with job ID <jobid>.
scontrol show job <jobid>
sinfo
Displays information about SLURM nodes and partitions. Key command for understanding the state of the cluster.
Example: Display information about all nodes and partitions.
sinfo
sacct
Displays accounting data for all jobs and job steps. Useful for tracking resource usage and performance metrics. Example: Display accounting data for all jobs.
sacct --format=JobID,JobName%30,State,Elapsed,Timelimit,AllocNodes,Priority,Start,NodeList
Other
module
In the context of Unix-like operating systems, the module
command is part of the environment modules system, a tool that provides a dynamic approach to managing the user environment. This system allows users to load and unload different software packages or environments on demand.
Basic Usage
Load a module:
module load module-name
This command loads the specified module, setting up the environment variables and paths needed for the software package.
Unload a module:
module unload module-name
This command unloads the specified module, removing any environment variables and paths associated with it.
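Besides loading and unloading, the environment modules system also lets you list modules (a minimal sketch):
module avail   # list all modules available on the system
module list    # list the modules currently loaded in your session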
For a more detailed description of module, see Modules.
4.3 - Connecting to DAIC
SSH access
If you have a valid DAIC account (see Access and accounts), you can access DAIC resources using an SSH client. SSH (Secure SHell) is a protocol that allows you to connect to a remote computer via a secure network connection. SSH supports remote command-line login and remote command execution. SCP (Secure CoPy) and SFTP (Secure File Transfer Protocol) are file transfer protocols based on SSH (see wikipedia's ssh page ).
SSH clients
Most modern operating systems like Linux, macOS, and Windows 10 include SSH, SCP, and SFTP clients (part of the OpenSSH package) by default. If not, you can install third-party programs like:
MobaXterm, PuTTY, or FileZilla.
Access from the TU Delft Network
To connect to DAIC from within the TU Delft network (i.e., via eduroam or a wired connection), open a command-line interface (prompt, or terminal, see Wikipedia's CLI page), and run the following command:
$ ssh <YourNetID>@login.daic.tudelft.nl # Or
$ ssh login.daic.tudelft.nl # If your username matches your NetID
<YourNetID>
is your TU Delft NetID. If the username on the machine you are connecting from matches your NetID, you can omit the [<YourNetID>@] part, as in the second command above.
This will log you in to DAIC's login1.daic.tudelft.nl
node for now. Note that this setup might change in the future as the system undergoes migration, potentially reducing the number of login nodes.
Note
Currently DAIC has 3 login nodes: login1.daic.tudelft.nl, login2.daic.tudelft.nl, and login3.daic.tudelft.nl. You can connect to any of these nodes directly as per your needs. For more on the choice of login nodes, see DAIC login nodes.
Note
Upon first connection to an SSH server, you will be prompted to confirm the server’s identity, with a message similar to:
The authenticity of host 'login.daic.tudelft.nl (131.180.183.244)' can't be established.
ED25519 key fingerprint is SHA256:MURg8IQL8oG5o2KsUwx1nXXgCJmDwHbttCJ9ljC9bFM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'login.daic.tudelft.nl' (ED25519) to the list of known hosts.
A distinct fingerprint will be shown for each login node, as below:
SHA256:MURg8IQL8oG5o2KsUwx1nXXgCJmDwHbttCJ9ljC9bFM
SHA256:MURg8IQL8oG5o2KsUwx1nXXgCJmDwHbttCJ9ljC9bFM
SHA256:O3AjQQjCfcrwJQ4Ix4dyGaUoYiIv/U+isMT5+sfeA5Q
Once the server's identity is confirmed, enter your password when prompted (nothing will be printed as you type your password):
The HPC cluster is restricted to authorized users only.
YourNetID@login.daic.tudelft.nl's password:
Next, a welcome message will be shown:
Last login: Mon Jul 24 18:36:23 2023 from tud262823.ws.tudelft.net
#########################################################################
# #
# Welcome to login1, login server of the HPC cluster. #
# #
# By using this cluster you agree to the terms and conditions. #
# #
# For information about using the HPC cluster, see: #
# https://login.hpc.tudelft.nl/ #
# #
# The bulk, group and project shares are available under /tudelft.net/, #
# your windows home share is available under /winhome/$USER/. #
# #
#########################################################################
18:40:16 up 51 days, 6:53, 9 users, load average: 0,82, 0,36, 0,53
Now you can verify your environment with basic commands:
YourNetID@login1:~$ hostname # show the current hostname
login1.hpc.tudelft.nl
YourNetID@login1:~$ echo $HOME # show the path to your home directory
/home/nfs/YourNetID
YourNetID@login1:~$ pwd # show current path
/home/nfs/YourNetID
YourNetID@login1:~$ exit # exit current connection
logout
Connection to login.daic.tudelft.nl closed.
In this example, the user, YourNetID
, is logged in via the login node login1.hpc.tudelft.nl
as can be seen from the hostname
output. The user has landed in the $HOME
directory, as can be seen by printing its value, and checked by the pwd
command. Finally, the exit
command is used to exit the cluster.
Graphical applications
We discourage running graphical applications (via ssh -X) on DAIC login nodes, as GUI applications are not supported on the HPC systems.
Access from outside the university network
Direct access to DAIC from outside the university network is blocked by a firewall. To access DAIC, you have two options:
1. Using the Linux Bastion Server
To connect to DAIC via the Linux Bastion Server:
SSH into the bastion server. The bastion server acts as a gateway to the DAIC cluster.
- If you are an employee or guest, use
linux-bastion.tudelft.nl
. - If you are a student (BSc or MSc) use
student-linux.tudelft.nl
.
$ ssh <YourNetID>@linux-bastion.tudelft.nl # Or
$ ssh linux-bastion.tudelft.nl # If your username matches your NetID
As with DAIC login nodes, the first time you attempt to login to the bastion, you will be asked to confirm the server’s identity. Upon confirmation and entering your password, a welcome screen will be shown:
The authenticity of host 'linux-bastion.tudelft.nl (131.180.123.195)' can't be established.
ED25519 key fingerprint is SHA256:VJUFsQkIebODETsXwczkInnRrpdYYqAZDbsoKP1we+A.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'linux-bastion.tudelft.nl' (ED25519) to the list of known hosts.
YourNetID@linux-bastion.tudelft.nl's password:
[ASCII-art welcome banner]
YourNetID@srv227:~$
Once on the bastion server, SSH into DAIC as shown in SSH access.
YourNetID@srv227:~$ ssh login.daic.tudelft.nl # Or any other login node
Tip
To simplify this procedure, use SSH’s proxy jump feature to access DAIC via the bastion server:
$ ssh -J [<YourNetID>@]linux-bastion.tudelft.nl [<YourNetID>@]login.daic.tudelft.nl
2. Using a VPN
You can also use TU Delft’s EduVPN or OpenVPN (See TU Delft’s Access via VPN recommendations ) to access DAIC directly. Once connected to the VPN, you can ssh to DAIC directly, as in Access from the TU Delft Network.
VPN access trouble?
If you are having trouble accessing DAIC via the VPN, please report an issue via this Self-Service link.
Simplifying SSH with Configuration Files
To simplify SSH connections, you can store configurations in a file on your local machine. The SSH configuration file can be created (or found, if it already exists) in ~/.ssh/config
on Linux/Mac systems, or in C:\Users\<YourUserName>\.ssh\config
on Windows.
For example, on a Linux system, you can have the following lines in the configuration file:
~/.ssh/config
Host daic
HostName login.daic.tudelft.nl # Or any other login node
User <YourNetID>
Host bastion
Hostname linux-bastion.tudelft.nl # If employee/guest. Else, use: student-linux.tudelft.nl instead
User <YourNetID>
PreferredAuthentications password
where:
- The Host keyword starts an SSH configuration block and specifies the name (or pattern of names, like daic in this example) to which the configuration entries will apply.
- The HostName is the actual hostname to log into. Numeric IP addresses are also permitted (both on the command line and in HostName specifications).
- The User is the login username. This is especially important when the username differs between your machine and the remote server/cluster.
You can then connect to DAIC from inside the TU Delft network by just typing the following command:
$ ssh daic
Or, if outside the university network, you can connect via the bastion server:
$ ssh bastion
And, similarly, you can create/modify the configuration file on the bastion server (in ~/.ssh/config) by adding a Host configuration block for DAIC as above, to simplify the connection to DAIC from there.
ssh proxy jump feature
To connect directly from your machine to a DAIC login node (when outside the university network), use the ssh Jump Host option to jump via the bastion server as follows:
$ ssh -J YourNetID@linux-bastion.tudelft.nl YourNetID@login.daic.tudelft.nl # use `student-linux.tudelft.nl` instead if you are a student
For convenience, you can also edit your ssh configuration file, ~/.ssh/config
, on your local computer as follows:
Host daic
Hostname login.daic.tudelft.nl
User <YourNetID>
ProxyJump linux-bastion.tudelft.nl # For employees and guests. If you are a student, use: student-linux.tudelft.nl instead
Where:
- ProxyJump: Specifies the jump server, the bastion in this case.
You can then simply use ssh daic
to log in.
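Since scp and sftp read the same configuration file, the daic alias also simplifies file transfers from outside the university network; for example (the paths are illustrative):
$ scp mylocalfile daic:~/destination_path_on_DAIC/ # copy a file through the configured ProxyJump
$ sftp daic # open an interactive SFTP session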
Note
When using the ProxyJump feature, you will be prompted for your password twice: once for the bastion server, and then for DAIC.
Efficient SSH Connections with SSH Multiplexing
SSH multiplexing allows you to reuse an existing connection for multiple SSH sessions, reducing the time spent entering your password for every new connection. After the first connection is established, subsequent connections will be much faster since the existing control connection is reused.
To enable SSH multiplexing, add the following lines to your SSH configuration file. Assuming a Linux/Mac system, you can add the following lines to ~/.ssh/config
:
~/.ssh/config
Host *
ControlMaster auto
ControlPath /tmp/ssh-%r@%h:%p
where:
- The ControlPath specifies where to store the “control socket” for the multiplexed connections. %r refers to the remote login name, %h refers to the target host name, and %p refers to the destination port. This ensures that SSH uses separate control sockets for different connections.
- The ControlMaster setting activates multiplexing. With the auto setting, SSH will use an existing master connection if available or create a new one when necessary.
This setup will speed up connections after the first one and reduce the need to repeatedly enter your password for each new SSH session.
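Assuming the daic host alias from the configuration above, you can manage the shared control connection explicitly. ControlPersist is an optional extra setting, not part of the configuration shown here:
# Optionally keep the master connection alive for 10 minutes after the last
# session closes, by adding this line to the same Host * block:
#   ControlPersist 10m
$ ssh -O check daic # check whether a master connection to 'daic' is active
$ ssh -O exit daic  # close the master connection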
Note
On Windows you may need to adjust the ControlPath
to match a valid path for your operating system. For example, instead of /tmp/
, you might use a path like C:/Users/<YourUserName>/AppData/Local/Temp/.
Important
SSH public key logins (passwordless login) are not supported on DAIC, because Kerberos authentication is required to access your home directory. You will need to enter your password for each session.
4.4 - Data management
Data Management Guidelines
There are different use cases and quota limits for the different TU Delft network drives. For example, Umbrella
(project storage) is for everybody and everything, while bulk
needs to be cleaned up, migrated and phased out. Always check the TU Delft
Overview data storage
for guidelines on using network drives and quota limits.
4.4.1 - Data transfer
Your Windows Personal Storage and the Project and Group Storage are available on all TU Delft installed machines including the DAIC compute nodes. If possible use one of these for files that you want to access on both your personal computer and the compute nodes. Your Windows Personal Storage and the Project and Group Storage are also accessible off-campus through the TU Delft webdata service
. See the
webdata page
for manuals on using the service with your personal computer.
Mounting folders
Besides the commands below, there are multiple ways to upload and download code and data to and from the central storage. The officially advised way is either a direct mount or sftp.tudelft.nl. Find more information here.
SCP
Both your Linux and Windows Personal Storage and the Project and Group Storage are also available world-wide via an SCP/SFTP client. This is the simplest transfer method via the scp
command, which has the following basic syntax:
$ scp <source_file> <target_destination> # for files
$ scp -r <source_folder> <target_destination> # for folders
For example, to transfer a file from your computer to DAIC:
$ scp mylocalfile [<netid>@]login.daic.tudelft.nl:~/destination_path_on_DAIC/
To transfer a folder (recursively) from your computer to DAIC:
$ scp -r mylocalfolder [<netid>@]login.daic.tudelft.nl:~/destination_path_on_DAIC/
To transfer a file from DAIC to your computer:
$ scp [<netid>@]login.daic.tudelft.nl:~/origin_path_on_DAIC/remotefile ./
To transfer a folder from DAIC to your computer:
$ scp -r [<netid>@]login.daic.tudelft.nl:~/origin_path_on_DAIC/remotefolder ./
The above commands will work from either the university network, or when using EduVPN. If a “jump” via linux-bastion
is needed (see Access from outside university network), modify the above commands by replacing scp with scp -J <netid>@linux-bastion.tudelft.nl
and keep the rest of the command as before:
$ scp -J <netid>@linux-bastion.tudelft.nl <local_file> [<netid>@]login.daic.tudelft.nl:<remote_destination>
$ scp -r -J <netid>@linux-bastion.tudelft.nl <local_folder> [<netid>@]login.daic.tudelft.nl:<remote_destination>
$ scp -J <netid>@linux-bastion.tudelft.nl [<netid>@]login.daic.tudelft.nl:<remote_file> <local_destination>
$ scp -r -J <netid>@linux-bastion.tudelft.nl [<netid>@]login.daic.tudelft.nl:<remote_folder> <local_destination>
$ sftp -J <netid>@linux-bastion.tudelft.nl [<netid>@]login.daic.tudelft.nl
Where:
- Case is important.
- Items between < > brackets are user-supplied values (so replace with your own NetID, file or folder name).
- Items between [ ] brackets are optional: when your username on your local computer is the same as your NetID username, you don’t have to specify it.
- When you specify your NetID username, don’t forget the @ character between the username and the computer name.
Note for students
Please use student-linux.tudelft.nl
instead of linux-bastion.tudelft.nl
as an intermediate server!
Hint
Use quotes when file or folder names contain spaces or special characters.
rsync
rsync
is a robust file copying and synchronization tool commonly used in Unix-like operating systems. It allows you to transfer files and directories efficiently, both locally and remotely. rsync
supports options that enable compression, preserve file attributes, and allow for incremental updates.
Basic Usage
Copy files locally:
rsync [options] source destination
This command copies files and directories from the source to the destination.
Copy files remotely:
rsync [options] source user@remote_host:destination
This command transfers files from a local source to a remote destination.
Options in rsync
Commonly used options in rsync
with DAIC are:
-a
for recursion and to preserve almost everything-z
compress file data during the transfer-v
verbose mode to display information while copying--progress
show progress of files during transfer--no-perms
don’t preserve file permissions
In addition to the commonly used options, rsync
provides several other options for more advanced control and customization during file transfers:
--dry-run
: Perform a trial run without making any changes. This option allows you to see what would be done without actually doing it.--checksum
: Use checksums instead of file size and modification time to determine if files should be transferred. This is more precise but slower.--partial
: Keep partially transferred files and resume them later. This is useful in case of an interrupted transfer.--partial-dir=DIR
: Specify a directory to hold partial transfers. This option works well with--partial
.--bwlimit=KBPS
: Limit the bandwidth used by the transfer to the specified rate in kilobytes per second. Useful for managing network load.--timeout=SECONDS
: Set a maximum wait time in seconds for receiving data. If the timeout is exceeded,rsync
will exit.--no-implied-dirs
: When transferring a directory, this option prevents the creation of implied directories on the destination side that exist in the source but not explicitly specified in the transfer.--files-from=FILE
: Read a list of source files from the specified FILE. This can be useful when you want to transfer specific files.--update
: Skip files that are newer on the destination than the source. This is useful for incremental backups.--ignore-existing
: Skip files that already exist on the destination. Useful when you want to avoid overwriting existing files.--inplace
: Update files in place instead of creating temporary files and renaming them later. This can save disk space and improve speed.--append
: Append data to files instead of replacing them if they already exist on the destination.--append-verify
: Append data and verify it with checksums to ensure integrity.--backup
: Make backups of files that are overwritten or deleted during the transfer. By default, a~
is appended to the backup filename.--backup-dir=DIR
: Specify a directory to store backup files.--suffix=SUFFIX
: Specify a suffix to append to backup files instead of the default~
.--progress
: Displays the progress of the transfer, including the speed and the number of bytes transferred. This is useful for monitoring long transfers and seeing how much data has been copied so far.
These options, along with others, provide additional flexibility and control over your rsync
transfers, allowing you to fine-tune the synchronization process to meet your specific needs.
Examples
Copy data to project drives: use the
--no-perms
option:rsync -av --no-perms </path/to/local/dir> user@login.daic.tudelft.nl:/tudelft.net/staff-umbrella/<project-id>
This command copies files and directories from a local source to a remote destination, preserving file attributes except for permissions.
Synchronize a local directory with a remote directory:
rsync -avz /path/to/local/dir user@remote_host:/path/to/remote/dir
This synchronizes a local directory with a remote directory, using archive mode (
-a
) to preserve file attributes, verbose mode (-v
) for detailed output, and compression (-z
) for efficient transfer.Synchronize a remote directory with a local directory:
rsync -avz user@remote_host:/path/to/remote/dir /path/to/local/dir
This transfers files from a remote directory to a local directory, using the same options as the previous example.
Delete files in the destination that are not present in the source:
rsync -av --delete /path/to/source/dir /path/to/destination/dir
This synchronizes the source and destination directories and deletes files in the destination that are not in the source.
Exclude certain files or directories during transfer:
rsync -av --exclude='*.tmp' /path/to/source/dir /path/to/destination/dir
This synchronizes the source and destination directories, excluding files with the
.tmp
extension.
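Resume a large or interrupted transfer: combine some of the options above, for example (the paths and bandwidth value are placeholders):
rsync -av --partial --progress --bwlimit=5000 /path/to/local/dir user@login.daic.tudelft.nl:/path/to/remote/dir
This keeps partially transferred files so an interrupted transfer can be resumed later, shows progress during the transfer, and limits the bandwidth to about 5 MB/s.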
4.5 - Software
4.5.1 - Available software
General software
Most common general software, like programming languages and libraries, is installed on the DAIC nodes. To check if the program that you need is pre-installed, you can simply try to start it:
$ python
Python 2.7.5 (default, Jun 28 2022, 15:30:04)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
To find out which binary is used exactly you can use which
command:
$ which python
/usr/bin/python
Alternatively, you can try to locate the program or library using the whereis
command:
$ whereis python
python: /usr/bin/python3.4m-config /usr/bin/python3.6m-x86_64-config /usr/bin/python2.7 /usr/bin/python3.6-config /usr/bin/python3.4m-x86_64-config /usr/bin/python3.6m-config /usr/bin/python3.4 /usr/bin/python3.4m /usr/bin/python2.7-config /usr/bin/python3.6 /usr/bin/python3.4-config /usr/bin/python /usr/bin/python3.6m /usr/lib/python2.7 /usr/lib/python3.4 /usr/lib/python3.6 /usr/lib64/python2.7 /usr/lib64/python3.4 /usr/lib64/python3.6 /etc/python /usr/include/python2.7 /usr/include/python3.4m /usr/include/python3.6m /usr/share/man/man1/python.1.gz
Or, you can check if the package is installed using the rpm -q
command as follows:
$ rpm -q python
python-2.7.5-94.el7_9.x86_64
$ rpm -q python4
package python4 is not installed
You can also search with wildcards:
$ rpm -qa 'python*'
python2-wheel-0.29.0-2.el7.noarch
python2-cryptography-1.7.2-2.el7.x86_64
python34-virtualenv-15.1.0-5.el7.noarch
python-networkx-1.8.1-12.el7.noarch
python-gobject-3.22.0-1.el7_4.1.x86_64
python-gofer-2.12.5-3.el7.noarch
python-iniparse-0.4-9.el7.noarch
python-lxml-3.2.1-4.el7.x86_64
python34-3.4.10-8.el7.x86_64
python36-numpy-f2py-1.12.1-3.el7.x86_64
...
Useful commands on DAIC
For a list of handy commands on DAIC have a look here.
4.5.2 - Modules
In the context of Unix-like operating systems, the module
command is part of the environment modules system, a tool that provides a dynamic approach to managing the user environment. This system allows users to load and unload different software packages or environments on demand. Some often used third-party software (e.g., CUDA, cuDNN, MATLAB) is pre-installed on the cluster as
environment modules
.
Usage
To see or use the available modules, first, enable the software collection:
$ module use /opt/insy/modulefiles
Now, to see all available packages and versions:
$ module avail
---------------------------------------------------------------------------------------------- /opt/insy/modulefiles ----------------------------------------------------------------------------------------------
albacore/2.2.7-Python-3.4 cuda/11.8 cudnn/11.5-8.3.0.98 devtoolset/6 devtoolset/10 intel/oneapi (D) matlab/R2021b (D) miniconda/3.9 (D)
comsol/5.5 cuda/12.0 cudnn/12-8.9.1.23 (D) devtoolset/7 devtoolset/11 (D) intel/2017u4 miniconda/2.7 nccl/11.5-2.11.4
comsol/5.6 (D) cuda/12.1 (D) cwp-su/43R8 devtoolset/8 diplib/3.2 matlab/R2020a miniconda/3.7 openmpi/4.0.1
cuda/11.5 cudnn/11-8.6.0.163 cwp-su/44R1 (D) devtoolset/9 :
...
- D is a label for the default module in case multiple versions are available. E.g.
module load cuda
will loadcuda/12.1
- L means a module is currently loaded
To check the description of a specific module:
$ module whatis cudnn
cudnn/12-8.9.1.23 : cuDNN 8.9.1.23 for CUDA 12
cudnn/12-8.9.1.23 : NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.
And to use the module or package, load it as follows:
$ module load cuda/11.2 cudnn/11.2-8.1.1.33 # load the module
$ module list # check the loaded modules
Currently Loaded Modules:
1) cuda/11.2 2) cudnn/11.2-8.1.1.33
Note
For more information about using the module system, runmodule help
.Compilers and Development Tools
The cluster provides several compilers and development tools. The following table lists the available compilers and development tools. These are available in the devtoolset
module:
$ module use /opt/insy/modulefiles
$ module avail devtoolset
---------------------------------------------------------------------------------------------- /opt/insy/modulefiles ----------------------------------------------------------------------------------------------
devtoolset/6 devtoolset/7 devtoolset/8 devtoolset/9 devtoolset/10 devtoolset/11 (L,D)
Where:
L: Module is loaded
D: Default Module
If the avail list is too long consider trying:
"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
$ module whatis devtoolset
devtoolset/11 : Developer Toolset 11 Software Collection
devtoolset/11 : GNU Compiler Collection, GNU Debugger, and other development, debugging, and performance monitoring tools.
$ module load devtoolset/11
$ gcc --version
gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
4.5.3 - Installing software
Basic principles
On a cluster, it’s important that software is available and identical on all nodes, both login and compute nodes (see Workload scheduler). For self-installed software, it’s easier to install the software in one shared location than installing and maintaining the same software separately on every single node. You should therefore install your software on one of the network shares (eg, your $HOME folder or an umbrella or bulk folder) that are accessible from all nodes (see Storage).
As a regular Linux user you don’t have administrator rights. Yet, you can do your normal work, including installing software in a personal folder, without needing administrator rights. Consequently, you don’t need (nor are you allowed) to use the sudo or su commands that are often shown in manuals.
DAIC provides only 8GB of storage in the /home directories, and the project spaces (/tudelft.net/...) are Windows-based, leading to problems installing packages with pip due to file permission errors. However, /tudelft.net/... locations are mounted on all nodes. Therefore, the recommended way of using your own software and environments is to use containerization and to store your containers under /tudelft.net/staff-umbrella/.... Check out the Apptainer tutorial for guidance.
Stop!
Although both Linux flavors Red Hat Enterprise Linux (RHEL, CentOS, Scientific Linux, Fedora) and Debian (Ubuntu) can run the same Linux software, they use completely different package systems for installing software. The available software, packages’ names and package versions might differ, and the package formats and package management tools are incompatible. This means:
- It is not possible to install Ubuntu or Debian .deb packages in CentOS or use apt-get to install software in DAIC. So when installing software, use a manual for CentOS, Red Hat or Fedora.
- If you can only find a manual for Ubuntu, you have to substitute the CentOS versions for any Ubuntu-specific packages or commands.
Managing environments
Conda/Mamba
Conda and Mamba are both package management and environment management tools used primarily in the data science and programming communities. Conda, developed by Anaconda, Inc., allows users to manage packages and create isolated environments for different projects, supporting multiple languages like Python and R. Mamba is a more recent alternative to Conda that offers faster performance and improved dependency solving using the same package repositories as Conda. Both tools help avoid dependency conflicts and simplify the management of software packages and environments.
Use module load miniconda
Miniconda is available as a module and can be loaded as follows:
$ module use /opt/insy/modulefiles # If not already
$ module load miniconda
$ which conda
/opt/insy/miniconda/3.9/bin/conda
Creating a conda environment
To create a new environment you can run conda create
:
$ conda create -n env
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.10.1
latest version: 24.3.0
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /home/nfs/username/.conda/envs/env
Creating a conda environment from a YAML file
Conda allows you to create environments from a YAML file that specifies the packages and their versions for the desired environment. This feature makes it easier to reproduce environments across different machines and share environment configurations with others.
$ conda env create -f environment.yml (-n new-name)
For how to create an environment.yml
file, see Exporting environments
Environment variables
You can set environment variables to install packages and environments in other locations:
CONDA_PREFIX
: This variable points to the active conda environment’s root directory. When an environment is active,CONDA_PREFIX
contains the path to that environment’s root directory.CONDA_ENVS_DIRS
: This variable specifies the directories where conda environments are stored. You can set it to a list of directories (separated by colons on Unix-like systems and semicolons on Windows). Conda will search for and store environments in these directories.CONDA_PKGS_DIRS
: This variable specifies the directories where conda stores downloaded packages. LikeCONDA_ENVS_DIRS
, you can set it to a list of directories. Conda uses these directories as cache locations for package downloads and installations.
Examples
- Set conda environments directory:
$ export CONDA_ENVS_DIRS="/tudelft.net/staff-umbrella/my-project/"
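Similarly, the package cache can be relocated alongside the environments (the path is a placeholder):
$ export CONDA_PKGS_DIRS="/tudelft.net/staff-umbrella/my-project/pkgs" # cache downloaded packages outside the 8GB home quota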
A caveat is that the /tudelft.net
mounts are Windows-based and therefore have compatibility issues with pip
. When you create your conda environments there, you will not be able to use pip
to install packages. It is therefore recommended to keep the conda environments minimal and in your home directory, and to use containerization for larger environments.
List existing environments
You can list environments with
$ conda env list
Activating environments
You can activate an existing environment with conda activate
, for example to install more packages:
$ conda activate env # Activate the newly created environment
Modifying environments
Sometimes you need to add/remove/change packages and libraries in existing environments. First, activate the environment you want to change with conda activate
and then run conda install package-name
or conda remove package-name
. You can also use pip
to install packages inside a conda environment, but for that pip
has to be installed inside the environment. To make sure pip
is installed in your environment, run conda install pip
first.
(env) $ conda install pandas # Add a new package to the active environment
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.10.1
latest version: 24.3.0
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /home/nfs/sdrwacker/.conda/envs/test
added / updated specs:
- pandas
The following packages will be downloaded:
package | build
---------------------------|-----------------
blas-1.0 | mkl 6 KB
bottleneck-1.3.7 | py312ha883a20_0 140 KB
bzip2-1.0.8 | h5eee18b_5 262 KB
expat-2.6.2 | h6a678d5_0 177 KB
intel-openmp-2023.1.0 | hdb19cb5_46306 17.2 MB
ld_impl_linux-64-2.38 | h1181459_1 654 KB
libffi-3.4.4 | h6a678d5_0 142 KB
libuuid-1.41.5 | h5eee18b_0 27 KB
mkl-2023.1.0 | h213fc3f_46344 171.5 MB
mkl-service-2.4.0 | py312h5eee18b_1 66 KB
mkl_fft-1.3.8 | py312h5eee18b_0 204 KB
mkl_random-1.2.4 | py312hdb19cb5_0 284 KB
ncurses-6.4 | h6a678d5_0 914 KB
numexpr-2.8.7 | py312hf827012_0 149 KB
numpy-1.26.4 | py312hc5e2394_0 11 KB
numpy-base-1.26.4 | py312h0da6c21_0 7.7 MB
openssl-3.0.13 | h7f8727e_0 5.2 MB
pandas-2.2.1 | py312h526ad5a_0 15.4 MB
pip-23.3.1 | py312h06a4308_0 2.8 MB
python-3.12.3 | h996f2a0_0 34.8 MB
pytz-2023.3.post1 | py312h06a4308_0 197 KB
readline-8.2 | h5eee18b_0 357 KB
setuptools-68.2.2 | py312h06a4308_0 1.2 MB
six-1.16.0 | pyhd3eb1b0_1 18 KB
sqlite-3.41.2 | h5eee18b_0 1.2 MB
tbb-2021.8.0 | hdb19cb5_0 1.6 MB
tk-8.6.12 | h1ccaba5_0 3.0 MB
tzdata-2024a | h04d1e81_0 116 KB
wheel-0.41.2 | py312h06a4308_0 131 KB
xz-5.4.6 | h5eee18b_0 651 KB
zlib-1.2.13 | h5eee18b_0 103 KB
------------------------------------------------------------
Total: 266.1 MB
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
blas pkgs/main/linux-64::blas-1.0-mkl
bottleneck pkgs/main/linux-64::bottleneck-1.3.7-py312ha883a20_0
bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_5
ca-certificates pkgs/main/linux-64::ca-certificates-2024.3.11-h06a4308_0
expat pkgs/main/linux-64::expat-2.6.2-h6a678d5_0
intel-openmp pkgs/main/linux-64::intel-openmp-2023.1.0-hdb19cb5_46306
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1
libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_0
libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0
mkl pkgs/main/linux-64::mkl-2023.1.0-h213fc3f_46344
mkl-service pkgs/main/linux-64::mkl-service-2.4.0-py312h5eee18b_1
mkl_fft pkgs/main/linux-64::mkl_fft-1.3.8-py312h5eee18b_0
mkl_random pkgs/main/linux-64::mkl_random-1.2.4-py312hdb19cb5_0
ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
numexpr pkgs/main/linux-64::numexpr-2.8.7-py312hf827012_0
numpy pkgs/main/linux-64::numpy-1.26.4-py312hc5e2394_0
numpy-base pkgs/main/linux-64::numpy-base-1.26.4-py312h0da6c21_0
openssl pkgs/main/linux-64::openssl-3.0.13-h7f8727e_0
pandas pkgs/main/linux-64::pandas-2.2.1-py312h526ad5a_0
pip pkgs/main/linux-64::pip-23.3.1-py312h06a4308_0
python pkgs/main/linux-64::python-3.12.3-h996f2a0_0
python-dateutil pkgs/main/noarch::python-dateutil-2.8.2-pyhd3eb1b0_0
python-tzdata pkgs/main/noarch::python-tzdata-2023.3-pyhd3eb1b0_0
pytz pkgs/main/linux-64::pytz-2023.3.post1-py312h06a4308_0
readline pkgs/main/linux-64::readline-8.2-h5eee18b_0
setuptools pkgs/main/linux-64::setuptools-68.2.2-py312h06a4308_0
six pkgs/main/noarch::six-1.16.0-pyhd3eb1b0_1
sqlite pkgs/main/linux-64::sqlite-3.41.2-h5eee18b_0
tbb pkgs/main/linux-64::tbb-2021.8.0-hdb19cb5_0
tk pkgs/main/linux-64::tk-8.6.12-h1ccaba5_0
tzdata pkgs/main/noarch::tzdata-2024a-h04d1e81_0
wheel pkgs/main/linux-64::wheel-0.41.2-py312h06a4308_0
xz pkgs/main/linux-64::xz-5.4.6-h5eee18b_0
zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_0
Proceed ([y]/n)? y
....
Exporting environments
You can export versions of all installed packages and libraries inside a conda environment with conda env export
.
It is good practice to keep track of all versions that you have used for a particular experiment by exporting them into a YAML file, typically called environment.yml
:
$ conda env export --no-builds > environment.yml
Install your own mamba/conda
Sometimes the versions provided by module
are outdated and users need their own installation of conda
or mamba
.
A minimal version can be installed as demonstrated in the following:
$ alias install-miniforge='
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh \
&& bash Miniforge3-Linux-x86_64.sh -b \
&& rm -f Miniforge3-Linux-x86_64.sh \
&& eval "$($HOME/miniforge3/bin/conda shell.bash hook)" \
&& conda init \
&& conda install -n base -c conda-forge mamba'
$ cd ~ && install-miniforge
(base) $ # This shows that the 'base' environment is active.
(base) $ which python
~/miniforge3/bin/python
This will already occupy around 500MB of your home directory totalling ~20k files.
$ du -h miniforge3 --max-depth=0
486M miniforge3
$ find miniforge3 -type f | wc -l
20719
Now, you can install your own versions of libraries and programs, or create entire environments as described above.
Stop!
You are limited to 8GB of data in your home directory. Installing a full development environment for PyTorch can easily exceed 12 GB; therefore, it is recommended to install only tools and libraries that you really need on the login nodes via this route. Instead, use Apptainer
to create container files containing all dependencies.
Using binaries
Some programs come as precompiled binaries or are written in a scripting language such as Perl, PHP, Python or shell script. Most of these programs don’t actually need to be “installed” since you can simply run these programs directly. In certain scenarios, you may need to make the program executable first using chmod +x
:
$ ./my-executable # attempting to run the binary `my-executable`
-bash: ./my-executable: Permission denied
$ chmod +x my-executable # making `my-executable` executable, since it fails due to permissions
$ ./my-executable # checking `my-executable` works!
Hello world!
Installing from source
When a pre-made binary of your software is not available, you’ll have to install the software yourself from the source. You may need to set up your Installation environment before following this Installation recipe.
Installation environment
When you are installing software for the very first time, you need to set up your environment. If you have already done this before , you can skip this section and go directly to the Installation recipe section.
To set up your environment, first, add the following lines to your ~/.bash_profile
or, alternatively, download this (bash_profile.txt) as shown in the subsequent commands:
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup settings
export PREFIX="$HOME/.local"
export ACLOCAL_PATH="$PREFIX/share/aclocal${ACLOCAL_PATH:+:$ACLOCAL_PATH}"
export CPATH="$PREFIX/include${CPATH:+:$CPATH}"
export LD_LIBRARY_PATH="$PREFIX/lib64:$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LIBRARY_PATH="$PREFIX/lib64:$PREFIX/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
export MANPATH="$PREFIX/share/man${MANPATH:+:$MANPATH}"
export PATH="$HOME/bin:$PREFIX/bin:$PATH"
export PERL5LIB="$PREFIX/lib64/perl5:$PREFIX/share/perl5${PERL5LIB:+:$PERL5LIB}"
export PKG_CONFIG_PATH="$PREFIX/lib64/pkgconfig:$PREFIX/share/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
export PYTHONPATH="$PREFIX/lib/python2.7/site-packages${PYTHONPATH:+:$PYTHONPATH}"
Note!
- if you already have some of these settings in your
~/.bash_profile
(or elsewhere), you should combine them so they don’t duplicate the paths. - if you want to use
python3.6
instead ofpython2.7
, you need to set thePYTHONPATH
topython3.6
.
$ cp ~/.bash_profile ~/.bash_profile.bak # back up your file
$ curl -s https://wiki.tudelft.nl/pub/Research/InsyCluster/InstallingSoftware/bash_profile.txt >> ~/.bash_profile # download and append the lines above
Then, clean up any duplicate settings, and:
$ source ~/.bash_profile
$ mkdir -p "$PREFIX"
The line export PREFIX="$HOME/.local"
sets your software installation directory to /home/nfs/<YourNetID>/.local
(which is the default and accessible on all nodes). This is in your personal home directory where you have a space quota of 8GB. However, for software for your research project, you should instead use a project share, for example:
export PREFIX="/tudelft.net/staff-umbrella/project/software"
The other variables will let you use your self-installed programs. You are now ready to install your software!
Installation recipe
Software installation usually just requires you to follow the general installation recipe described below, but you always need to consult the documentation for your software.
- Place the source of the software in a folder under
/tmp
:
$ mkdir /tmp/$USER
$ cd /tmp/$USER
You can sometimes download the software directly from the internet:
$ wget http://host/path/software.tar.gz
$ tar -xzf software.tar.gz
Or, clone the software from a git repository:
$ git clone https://github.com/software
Then:
$ cd software
Note
Note:.tgz
is the same as .tar.gz
, for .tar.bz2
files use tar -xjf software.tar.bz2
.- If the software provides a
configure
script, run it:
$ ./configure --prefix="$PREFIX"
If configure
complains about missing software, you’ll either have to install that software, tell configure
where it is (--with-feature _path_=
) or disable the feature (--disable-feature
).
If your software provides a CMakeLists.txt
file, run cmake
(note: the trailing two dots on the last line are needed exactly as shown):
$ mkdir -p build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX="$PREFIX" ..
Again, if cmake
complains about missing software, you’ll either have to install that software or tell cmake
where it is (-DCMAKE_SYSTEM_PREFIX_PATH="/usr/local;/usr;$PREFIX;path"
).
If neither is provided, consult the documentation for dependencies and configuration (specifically for the installation directory).
There is no point in continuing until all reported problems have been fixed.
- Compile the software:
$ make
If compilation is aborted due to an error, Google the error for possible solutions. Again, there is no point in continuing until all reported problems have been fixed.
- Install the software. When you used configure or cmake, you can simply run:
$ make install
When you used neither, you need to use:
$ make prefix="$PREFIX" install
- Your software should now be ready to use, so check it:
$ cd
$ _program_
- When the program works, clean up
/tmp/netid
:
$ rm -r /tmp/$USER
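Putting the recipe together, a typical configure-based installation looks roughly like this from start to finish (the URL and package name are placeholders):
$ mkdir -p /tmp/$USER && cd /tmp/$USER    # build in a temporary folder
$ wget http://host/path/software.tar.gz   # placeholder URL
$ tar -xzf software.tar.gz && cd software
$ ./configure --prefix="$PREFIX"          # configure the installation directory
$ make                                    # compile
$ make install                            # install into $PREFIX
$ cd && rm -r /tmp/$USER                  # clean up the build folder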
4.5.4 - Containerization
Apptainer
Apptainer is a container platform. It allows you to create and run containers that package up pieces of software in a way that is portable and reproducible. You can build a container using Apptainer on your laptop, and then run it on an HPC cluster. Apptainer was created to run complex applications on HPC clusters in a simple, portable, and reproducible way. The template repository linked below contains a template for building an Apptainer (formerly Singularity) container using miniforge
and mamba
(similar to conda). Its examples directory also contains examples for other setups.
Apptainer features
- Verifiable reproducibility and security, using cryptographic signatures, an immutable container image format, and in-memory decryption.
- Integration over isolation by default. Easily make use of GPUs, high speed networks, parallel filesystems on a cluster or server by default.
- Mobility of compute. The single file SIF container format is easy to transport and share.
- A simple, effective security model. You are the same user inside a container as outside, and cannot gain additional privilege on the host system by default. Read more about Security in Apptainer.
Template
The Apptainer template repository maintained by the Research Engineering and Infrastructure Team is a good starting point to create your own apptainers.
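As a minimal sketch (the file names are placeholders), you can build an image from a definition file on a machine where you have build rights, and test it before copying the resulting .sif file to DAIC storage:
$ apptainer build my-container.sif my-container.def # build a SIF image from a definition file
$ apptainer exec my-container.sif python --version  # quick sanity check inside the container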
How to use Apptainer on the cluster with SLURM?
Here is an example how to use the container in a SLURM script.
#!/bin/sh
#SBATCH --job-name="apptainer-job"
#SBATCH --account="my-account"
#SBATCH --partition="general" # Request partition.
#SBATCH --time=01:00:00 # Request run time (wall-clock). Default is 1 minute
#SBATCH --nodes=1                # Request 1 node
#SBATCH --tasks-per-node=1 # Set one task per node
#SBATCH --cpus-per-task=4 # Request number of CPUs (threads) per task.
#SBATCH --gres=gpu:1 # Request 1 GPU
#SBATCH --mem=4GB # Request 4 GB of RAM in total
#SBATCH --mail-type=END # Set mail type to 'END' to receive a mail when the job finishes.
#SBATCH --output=slurm-%x-%j.out # Set name of output log. %j is the Slurm jobId
#SBATCH --error=slurm-%x-%j.err # Set name of error log. %j is the Slurm jobId
export APPTAINER_ROOT="/path/to/container/folder"
export APPTAINER_NAME="my-container.sif"
# If you use GPUs
module use /opt/insy/modulefiles
module load cuda/12.1
# Run the script inside the container:
#   --nv       binds NVIDIA libraries from the host
#   --env-file sources additional environment variables (optional)
#   -B         mounts host file systems inside the container (different for each cluster)
#   The last two lines give the path to the container and the command to execute inside it.
srun apptainer exec \
  --nv \
  --env-file ~/.env \
  -B /home/$USER:/home/$USER \
  -B /tudelft.net/:/tudelft.net/ \
  $APPTAINER_ROOT/$APPTAINER_NAME \
  python script.py
Tutorial
See the Apptainer tutorial.
4.6 - Job submission
Slurm job’s terminology: job, job step, task and CPUs
A slurm job (submitted via sbatch
) can consist of multiple steps in series. Each step (specified via srun
) can run multiple tasks (ie programs) in parallel. Each task gets its own set of CPUs. As an example, consider the workflow and corresponding breakdown shown in fig 2.
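In script form, such a job might look like the following minimal sketch (resource values are illustrative):
#!/bin/sh
#SBATCH --ntasks=2          # up to two tasks can run in parallel within a step
#SBATCH --cpus-per-task=2   # CPUs are allocated in multiples of 2
#SBATCH --mem-per-cpu=1G    # with multiple tasks, request memory per CPU
srun --ntasks=1 echo 'step 1: a single task'
srun --ntasks=2 echo 'step 2: two tasks in parallel'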
In this example, note:
- When you explicitly request 1 CPU per task (
--cpus-per-task=1
), you should also explicitly specify the number of tasks (--ntasks
). Otherwise,srun
may start the task twice in parallel (because CPUs are allocated in multiples of 2) - The default slurm allocation is a single task and single CPU (ie
--ntasks=1 --cpus-per-task=1
). Thus, it is not necessary to explicitly request these to run a single task on a single CPU. - When using multiple tasks, specify
--mem-per-cpu
.
Note
DAIC nodes are dual-threaded, which means that CPUs are automatically allocated in multiples of 2. Thus, use (a multiple of) 2 threads in your job.
4.6.1 - Priorities and waiting times
Slurm’s job scheduling and waiting times
When slurm is not configured for FIFO scheduling, jobs are prioritized in the following order:
- Jobs that can preempt: Not enabled in DAIC
- Jobs with an advanced reservation: See Slurm's Advanced Resource Reservation Guide
- Partition PriorityTier: See Priority tiers
- Job priority: See Priority calculations and QoS priority
- Job ID
Priority tiers
DAIC partitions are tiered:
- The
general
partition is in the lowest priority tier, - Department partitions (eg,
insy
,st
) are in the middle priority tier, and - Partitions for specific groups (eg,
influence
,mmll
) are in the highest priority tier. Those partitions correspond to resources contributed by the respective groups or departments (see Contributing departments).
When resources become available, the scheduler will first look for jobs in the highest priority partition that those resources are in, and start the highest (user) priority jobs that fit within the resources (if any). When resources remain, the scheduler will check the next lower priority tier, and so on. Finally, the scheduler will try to backfill lower (user) priority jobs that fit (if any).
The partition priorities have no impact on resources that are in use, so jobs have to wait until the resources become available.
Partition selection
The purpose of this tiering is to let you submit your jobs to multiple partitions (e.g., --partition=mml,insy,general
), allowing the scheduler to determine where the job can start the soonest. This ensures your job has the highest possible priority across different partitions in the cluster, without negatively impacting your or others’ resource access.
Keep in mind that:
- Resources of all partitions (eg,
st
) are also part of thegeneral
partition (see Fig 1). Thus:- Submitting to the
general
partition allows jobs to use all nodes - Submitting to group-specific partitions alone results in longer waiting times, since the
general
partition has much more resources than any of them (The bigger the resource pool, the more chances a job has to be scheduled or back-filled) - The optimal strategy is to submit to both
general
and group-specific partitions when accessible. This is to skip over higher-priority jobs that would otherwise get started first on resources that are also in the specific partition.
- Submitting to the
- You should only submit jobs to partitions that your account has access to. Submitting jobs to unauthorized partitions (e.g., using
--partition=insy,st
when your submitting account does not have access to both of these) will result in the job remaining in a pending state and generate excessive logging, potentially overloading the Slurm controller nodes.
Warning
Always ensure you are submitting jobs to partitions accessible by your account. You can check your account and partition permissions with the following commands- example output for a user is shown below:
$ sacctmgr show user "$USER" withassoc Format='DefaultAccount,Account' --parsable # Check your account(s)
Def Acct|Account|
ewi-insy-prb|ewi-st|
ewi-insy-prb|ewi-insy-prb|
$ echo "Partition AllowAccounts"; scontrol show partition -a | \
> awk '
> /PartitionName=/ {
> split($1, a, "=");
> partition = a[2]
> }
> /AllowAccounts=/ {
> split($2, b, "=");
> print partition, b[2]
> }
> ' | \
> grep -E 'ALL|ewi-insy-prb' # Check partitions accessible to your *default* account
Partition AllowAccounts
general ALL
insy ewi-insy,ewi-insy-cgv,ewi-insy-cys,ewi-insy-ii,ewi-insy-ii-influence,ewi-insy-mmc,ewi-insy-prb,ewi-insy-prb-dbl,ewi-insy-prb-prlab,ewi-insy-prb-spclab,ewi-insy-prb-visionlab,ewi-insy-reit,ewi-insy-sdm,ewi-insy-sup
This shows that the user can use the ewi-insy-prb
or the ewi-st
accounts.
The second command shows that all accounts can submit to the general
partition and several accounts can submit to the insy
partition.
Replace the ewi-insy-prb
in the grep line above to get the partition details for your specific account.
For the example above, note the following correct and incorrect examples (the labels follow from the account and partition access shown above):

# Correct: ewi-insy-prb has access to both the insy and general partitions
#SBATCH --account=ewi-insy-prb
#SBATCH --partition=insy,general

# Correct: without --account, the default account (ewi-insy-prb) is used, which has access to both partitions
#SBATCH --partition=insy,general

# Incorrect: ewi-insy-prb does not have access to the st partition
#SBATCH --account=ewi-insy-prb
#SBATCH --partition=insy,st

# Incorrect: ewi-st does not have access to the insy partition
#SBATCH --account=ewi-st
#SBATCH --partition=insy
Priority calculations
Slurm continually calculates job priorities and schedules the execution of jobs based on its configurations. A few configuration parameters affect priority computations:
SchedulerType
: The type of scheduling used based on available resources, requested resources, and job priorities. On DAIC, slurm is used withbackfill
scheduling mechanism. This mechanism allows low priority jobs to backfill idle resources if doing so does not delay the expected start time of any high priority job (based on resource availability).
Tip
With sched/backfill
, jobs can only be started when the resources that they request fit within the available idle resources. Thus:
- The fewer resources a job requests, the higher the chance that it will fit within the available idle resources.
- The more resources a job requests, the longer it will have to wait before enough resources become available to start. To check how the cluster is configured, you may run:
$ scontrol show config | grep SchedulerType
SchedulerType = sched/backfill
More details are available in Slurm's SchedulerType
PriorityType
: The way priority is computed. On DAIC, amultifactor
computation is applied, where job priority at any given time is a weighted sum of the following factors:- Fairshare: a measure of the amount of resources that a group (ie
account
in slurm terminology) has contributed, and the historical usage of the group and the user. - QOS: the quality of service associated with the job, which is specified with the slurm
--qos
directive (see QoS priority).
- Fairshare: a measure of the amount of resources that a group (ie
Info
The whole idea behind FairShare scheduling in DAIC is to share all the available resources fairly and efficiently with all users (instead of imposing strict limits on how many resources or which hardware users can compute on). The resources in the cluster are contributed in different amounts by different groups (see Contributing departments), and the scheduler makes sure that each group can use a share of the resources relative to what the group contributed. To check how the cluster is configured, you may run:
$ scontrol show config | grep PriorityType
PriorityType = priority/multifactor
$ sprio --weights
JOBID PARTITION PRIORITY SITE FAIRSHARE QOS
Weights 1 20000000 40000000
The following commands are useful for checking prioritization of your own jobs:
Command | Purpose |
---|---|
sprio -j <YourJobID> | Determine the priority of your job |
squeue -j <YourJobID> --start | Request your job’s estimated start time |
sshare -u <YourNetID> | Determine your current fairshare value |
Info
To get more complete priority configurations of a cluster, run the command:
$ scontrol show config | grep ^Priority
PriorityParameters = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife = 2-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = No
PriorityFlags =
PriorityMaxAge = 7-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType = priority/multifactor
PriorityWeightAge = 0
PriorityWeightAssoc = 0
PriorityWeightFairShare = 20000000
PriorityWeightJobSize = 0
PriorityWeightPartition = 0
PriorityWeightQOS = 40000000
PriorityWeightTRES = (null)
QoS priority
The purpose of the (multiple) QoSs in DAIC is to optimize the throughput of the cluster and to reduce the waiting times for jobs:
- Long jobs block resources for a long time, thus leading to long waiting times and fragmentation of resources.
- Short jobs block resources only for short times, and can more easily fill in the gaps in the scheduling of resources (thus start sooner), and are therefore better for throughput and waiting times.
Thus, DAIC has the following policy:
To stimulate short jobs, the
short
QoS has a higher priority, and allows you to use a larger part of all resources, than themedium
andlong
QoS.To prevent long jobs from blocking all resources in the cluster for long times (thus causing long waiting times), only a certain part of all cluster resources is available to all running
long
QoS jobs (of all users) combined.All running
medium
QoS jobs together can use a somewhat larger part of all resources in the cluster, and all runningshort
QoS jobs combined are allowed to fill the biggest part of the cluster.- These limits are called the QoS group limits.
- When this limit is reached, no new jobs with this QoS can be started, until some of the running jobs with this QoS finish and release some resources.
- The scheduler will indicate this with the reason
QoS Group CPU/memory/GRES limit
.
To prevent one user from single-handedly using all available resources in a certain QoS, there are also limits for the total resources that all running jobs of one user in a specific QoS can use.
- These are called the QoS per-user limits.
- When this limit is reached, no new jobs of this user with this QoS can be started, until some of the running jobs of this user and with this QoS finish and release some resources.
- The scheduler will indicate this with the reason
QoS User CPU/memory/GRES limit
.
These per-group and per-user limits are set by the DAIC user board, and the scheduler strictly enforces these limits. Thus, no user can use more resources than the amount that was set by the user board. Any (perceived) imbalance in the use of resources by a certain QoS or user should not be held against a user or the scheduler, but should be discussed in the user board.
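To see why one of your jobs is still pending, you can print the reason column of squeue; the format string below is only an example, and the reason appears in Slurm's abbreviated form (eg, QOSGrpCpuLimit):
$ squeue -u $USER --format='%.10i %.9P %.8q %.2t %.20R' # %R shows the pending reason, or the node list once running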
4.6.2 - Quality of Service (QoS)
When you submit a job in a slurm-based system, it enters a queue waiting for resources. The partition and Quality of Service (QoS) are the two job parameters slurm uses to assign resources for a job:
- The partition is a set of compute nodes on which a job can be scheduled. In DAIC, the nodes contributed or funded by a certain group are lumped into a corresponding partition (see Contributing departments).
All nodes in DAIC are part of the
general
partition, but other partitions exist for prioritization purposes on select nodes (see Priority tiers). - The Quality of Service is a set of limits that controls what resources a job can use and, therefore, determines the priority level of a job. This includes the run time, CPU, GPU and memory limits on the given partition. Jobs that exceed these limits are automatically terminated (see QoS priority).
For DAIC, Table 1 shows the QoS limits on the general
partition.
Partition | QoS | Priority | Max run time | Jobs per user | CPU limit per QoS | CPU limit per user | GPU limit per QoS | GPU limit per user | Memory limit per QoS | Memory limit per user |
---|---|---|---|---|---|---|---|---|---|---|
general | interactive | high | 1 hour | 1 running | - | 2 | - | 2 | - | 16G |
general | short | normal | 4 hours | 10000 | 3672 (85%) | 2160 (50%) | 109 (85%) | 64 (50%) | 23159G (85%) | 13623G (50%) |
general | medium | medium | 1 ½ day | 2000 | 3456 (80%) | 1512 (35%) | 103 (80%) | 45 (35%) | 21796G (80%) | 9536G (35%) |
general | long | low | 7 days | 1000 | 3240 (75%) | 864 (20%) | 96 (75%) | 25 (20%) | 20434G (75%) | 5449G (20%) |
general | infinite* | none | infinite | 1 running | 32 | - | 2 | - | 250G | - |

*infinite QoS jobs will be killed when compute nodes go down, eg, during maintenance. It is not recommended to submit jobs with this QoS.
Note
The priority of a job is a function of both QoS and previous usage (less is better). Read Priority and waiting times for more information.
See Quality of Service definitions
On DAIC you can check the QoS policies with the sacctmgr
command:
$ sacctmgr list qos
Name Priority GraceTime Preempt PreemptExemptTime PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
---------- ---------- ---------- ---------- ------------------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- -------------
normal 0 00:00:00 cluster DenyOnLimit 1.000000 cpu=1
short 50 00:00:00 cluster DenyOnLimit 1.000000 cpu=3562,gre+ 65536 04:00:00 cpu=2096,gre+ 10000 cpu=1,mem=1M
long 25 00:00:00 cluster DenyOnLimit 1.000000 cpu=3144,gre+ 65536 7-00:00:00 cpu=838,gres+ 1000 cpu=1,mem=1M
infinite 0 00:00:00 cluster DenyOnLimit 1.000000 cpu=32,gres/+ 65536 1 100 cpu=1,mem=1M
interacti+ 100 00:00:00 cluster DenyOnLimit 2.000000 65536 01:00:00 cpu=2,gres/g+ 1 1 cpu=1,mem=1M
student 10 00:00:00 cluster DenyOnLimit 1.000000 cpu=192,gres+ 65536 04:00:00 cpu=2,gres/g+ 1 100 cpu=1,mem=1M
reservati+ 100 00:00:00 cluster DenyOnLimit,RequiresReservation 1.000000 65536 10000 cpu=1,mem=1M
influence 100 00:00:00 cluster DenyOnLimit 1.000000 65536 10000 cpu=1,mem=1M
guest-sho+ 10 00:00:00 cluster DenyOnLimit 1.000000 cpu=200,gres+ 65536 04:00:00 cpu=128,gres+ 100 cpu=1,mem=1M
guest-long 0 00:00:00 cluster DenyOnLimit 1.000000 cpu=200,gres+ 65536 7-00:00:00 cpu=128,gres+ 1 10 cpu=1,mem=1M
medium 35 00:00:00 cluster DenyOnLimit 1.000000 cpu=3352,gre+ 65536 1-12:00:00 cpu=1466,gre+ 2000 cpu=1,mem=1M
How to use QoS in your sbatch
scripts?
In your sbatch.slurm
script you can specify the QoS with #SBATCH --qos=...
option.
Example:
#!/bin/bash
#SBATCH --job-name=hello-world
#SBATCH --partition=general
#SBATCH --account=ewi-insy-reit
#SBATCH --qos=short # This is how you specify QoS
#SBATCH --time=0:01:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=1GB
#SBATCH --output=slurm-%n-%j.out
#SBATCH --error=slurm-%n-%j.err
srun echo 'Hi, from Slurm!'
sleep 30 # Wait for 30 seconds before exiting.
QoS for reservations
If you have a reservation, you need to specify --qos=reservation and --reservation=<name of your reservation> (see Reservations for details).
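For example (the reservation and partition names here are illustrative, taken from the Reservations section):
$ sbatch --qos=reservation --reservation=icra_iv --partition=cor jobscript.sbatch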
4.6.3 - Partitions
In SLURM, a partition is a scheduling construct that groups nodes or resources based on certain characteristics or policies. Partitions are used to organize and manage resources within a cluster, and they allow system administrators to control how jobs are allocated and executed on different nodes.
See partition definitions
On DAIC, the scontrol command only shows you the general partition. More partitions are available.
$ scontrol show partition
PartitionName=general
AllowGroups=ALL AllowAccounts=ALL DenyQos=influence
AllocNodes=login[1-3],oodtest Default=YES QoS=N/A
DefaultTime=00:01:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=3dgi[1-2],100plus,awi[01-26],cor1,gpu[01-11],grs[1-4],influ[1-6],insy[11-16],tbm5,wis1
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=4064 TotalNodes=59 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=1024 MaxMemPerNode=UNLIMITED
TRESBillingWeights=CPU=0.5,Mem=0.083333333G,GRES/gpu=16.0
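To get an overview of partitions beyond general, the standard sinfo command can be used; with --all it also lists partitions that are hidden or not accessible to your group (a generic Slurm sketch; which partitions you may actually submit to still depends on your account):
$ sinfo --all --summarize     # one summary line per partition, including hidden ones
$ sinfo --partition=general   # nodes and their state in the general partition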
4.6.4 - Interactive jobs
Interactive jobs on compute nodes
To work interactively on a node, e.g., to debug running code or test on a GPU, start an interactive session using sinteractive <compute requirements>. If no parameters are provided, the defaults are applied. <compute requirements> can be specified the same way as sbatch directives within an sbatch script (see Submitting jobs), as in the examples below:
$ hostname # check you are in one of the login nodes
login1.daic.tudelft.nl
$ sinteractive
16:07:20 up 12 days, 4:09, 2 users, load average: 7.06, 7.04, 7.12
$ hostname # check you are in a compute node
insy15
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 general bash SomeNetI R 1:23 1 insy15
$ logout # exit the interactive job
To request a node with certain compute requirements:
$ sinteractive --ntasks=1 --cpus-per-task=2 --mem=4096
16:07:20 up 12 days, 4:09, 2 users, load average: 7.06, 7.04, 7.12
Warning
When you log out from an interactive session, all running processes will be terminated.

Note

Requesting an interactive session is subject to the same resource availability constraints as submitting an sbatch script: you may need to wait until resources are available, just as when you submit an sbatch script.

4.6.5 - Submitting jobs
Job scripts are text files in which the header is a set of directives that specify compute resources, and the remainder is the code that needs to run. All resources and scheduling are specified in the header as #SBATCH directives (see man sbatch for more information). The code can be a set of steps to run in series, or parallel tasks within these steps (see Slurm job's terminology).
The code snippet below is a template script that can be customized to run jobs on DAIC. A useful tool that can be used to streamline the debugging of such scripts is ShellCheck .
#!/bin/sh
#SBATCH --partition=general # Request partition. Default is 'general'
#SBATCH --qos=short # Request Quality of Service. Default is 'short' (maximum run time: 4 hours)
#SBATCH --time=0:01:00 # Request run time (wall-clock). Default is 1 minute
#SBATCH --ntasks=1 # Request number of parallel tasks per job. Default is 1
#SBATCH --cpus-per-task=2 # Request number of CPUs (threads) per task. Default is 1 (note: CPUs are always allocated to jobs per 2).
#SBATCH --mem=1024 # Request memory (MB) per node. Default is 1024MB (1GB). For multiple tasks, specify --mem-per-cpu instead
#SBATCH --mail-type=END # Set mail type to 'END' to receive a mail when the job finishes.
#SBATCH --output=slurm_%j.out # Set name of output log. %j is the Slurm jobId
#SBATCH --error=slurm_%j.err # Set name of error log. %j is the Slurm jobId
/usr/bin/scontrol show job -d "$SLURM_JOB_ID" # check sbatch directives are working
# Remaining job commands go below here. For example, to run a Matlab script named "matlab_script.m", uncomment:
#module use /opt/insy/modulefiles # Use DAIC INSY software collection
#module load matlab/R2020b # Load Matlab 2020b version
#srun matlab < matlab_script.m # Computations should be started with 'srun'.
Note
- DAIC nodes are dual-threaded: CPUs are automatically allocated in multiples of 2, so request (a multiple of) 2 threads in your job.
- Do not enable mails when submitting large numbers (>20) of jobs at once
Job submission
To submit a job script jobscript.sbatch
, log in to DAIC, and:
- To only test:
$ sbatch --test-only jobscript.sbatch
Job 1 to start at 2015-06-30T14:00:00 using 2 processors on nodes insy15 in partition general
- To actually submit the job and do the computations:
$ sbatch jobscript.sbatch
Submitted batch job 2
4.6.6 - Monitoring jobs
- To check your job has actually been submitted:
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 general jobscip SomeNetI R 0:01 1 insy15
- To check the log of your job, use an editor or viewer of your choice (eg, vi, nano, or simply cat):
$ cat slurm-2.out
JobId=2 JobName=jobscript.sbatch
UserId=SomeNetId(123) GroupId=domain users(100513) MCS_label=N/A
Priority=23909774 Nice=0 Account=ewi-insy QOS=short
JobState=RUNNING Reason=None Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
SubmitTime=2015-06-30T14:00:00 EligibleTime=2015-06-30T14:00:00
AccrueTime=2015-06-30T14:00:00
StartTime=2015-06-30T14:00:01 EndTime=2015-06-30T14:01:01 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2015-06-30T14:01:01 Scheduler=Main
Partition=general AllocNode:Sid=login1:2220
ReqNodeList=(null) ExcNodeList=(null)
NodeList=insy15
BatchHost=insy15
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=2 ReqB:S:C:T=0:0:*:*
TRES=cpu=2,mem=1G,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
JOB_GRES=(null)
Nodes=insy15 CPU_IDs=26-27 Mem=1024 GRES=
MinCPUsNode=2 MinMemoryNode=1G MinTmpDiskNode=50M
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/nfs/SomeNetId/jobscript.sbatch
WorkDir=/home/nfs/SomeNetId
StdErr=/home/nfs/SomeNetId/slurm_2.err
StdIn=/dev/null
StdOut=/home/nfs/SomeNetId/slurm_2.out
Power=
MailUser=SomeNetId@tudelft.nl MailType=END
Checking slurm jobs
Sometimes, it may be desirable to inspect slurm jobs beyond their status in the queue. For example, to check which script was submitted, or how the resources were requested and allocated. Below are a few useful commands for this purpose:
- See job definition
$ scontrol show job 8580148
JobId=8580148 JobName=jobscript.sbatch
UserId=SomeNetID(123) GroupId=domain users(100513) MCS_label=N/A
Priority=23721804 Nice=0 Account=ewi-insy QOS=short
JobState=RUNNING Reason=None Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:12 TimeLimit=00:01:00 TimeMin=N/A
SubmitTime=2023-07-10T06:41:57 EligibleTime=2023-07-10T06:41:57
AccrueTime=2023-07-10T06:41:57
StartTime=2023-07-10T06:41:58 EndTime=2023-07-10T06:42:58 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-07-10T06:41:58 Scheduler=Main
Partition=general AllocNode:Sid=login1:19162
ReqNodeList=(null) ExcNodeList=(null)
NodeList=awi18
BatchHost=awi18
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=2 ReqB:S:C:T=0:0:*:*
TRES=cpu=2,mem=1G,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=2 MinMemoryNode=1G MinTmpDiskNode=50M
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/nfs/SomeNetID/jobscript.sbatch
WorkDir=/home/nfs/SomeNetID
StdErr=/home/nfs/SomeNetID/slurm_8580148.err
StdIn=/dev/null
StdOut=/home/nfs/SomeNetID/slurm_8580148.out
Power=
MailUser=SomeNetId@tudelft.nl MailType=END
- See statistics of a running job
$ sstat 1
JobID AveRSS AveCPU NTasks AveDiskRead AveDiskWrite
------- ------- ------- ------- ------------ ------------
1.0 426K 00:00.0 1 0.52M 0.01M
- See accounting information of a finished job (also see the --long option)
$ sacct -j 8580148
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
8580148 jobscript+ general ewi-insy 2 COMPLETED 0:0
8580148.bat+ batch ewi-insy 2 COMPLETED 0:0
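If the default columns are not enough, specific fields can be selected with sacct's --format option (a sketch using standard sacct field names; adjust the list to what you need):
$ sacct -j 8580148 --format=JobID,JobName,Partition,Elapsed,MaxRSS,State,ExitCode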
- See overall job efficiency of a finished job
$ seff 8580148
Job ID: 8580148
Cluster: insy
User/Group: SomeNetID/domain users
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:01:00 core-walltime
Job Wall-clock time: 00:00:30
Memory Utilized: 340.00 KB
Memory Efficiency: 0.03% of 1.00 GB
4.6.7 - Cancelling jobs
- To cancel a given job:
$ scancel <jobID>
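scancel also accepts standard filters to cancel several jobs at once, for example by user or by job state (a generic Slurm sketch):
$ scancel -u SomeNetID                  # Cancel all your jobs (replace SomeNetID with your NetID)
$ scancel -u SomeNetID --state=PENDING  # Cancel only your pending jobs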
Note
It is possible to specify the sbatch
directives, like --mem
, --ntasks
, … etc in the command line as in:
$ sbatch --time=00:02:00 jobscript.sbatch
This specification is generally not recommended for production, as it is less reproducible than specifying within the job script itself.
4.6.8 - Using graphic cards
Jobs on GPU resources
Some DAIC nodes have GPUs of different types that can be used for various compute purposes (see GPUs).
To request a gpu for a job, use the sbatch directive --gres=gpu[:type][:number]
, where the optional [:type]
and [:number]
specify the type and number of the GPUs requested, as in the examples below:
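For instance (the GPU type name is illustrative; see System specifications for the types actually available):
#SBATCH --gres=gpu          # request any single GPU
#SBATCH --gres=gpu:2        # request any two GPUs
#SBATCH --gres=gpu:v100:1   # request one V100 GPU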
Note
For CUDA programs, first load the needed modules (CUDA, cuDNN) before running your code (see Available software).

An example batch script with GPU resources
#!/bin/sh
#SBATCH --partition=general # Request partition. Default is 'general'
#SBATCH --qos=short # Request Quality of Service. Default is 'short' (maximum run time: 4 hours)
#SBATCH --time=0:01:00 # Request run time (wall-clock). Default is 1 minute
#SBATCH --ntasks=1 # Request number of parallel tasks per job. Default is 1
#SBATCH --cpus-per-task=2 # Request number of CPUs (threads) per task. Default is 1 (note: CPUs are always allocated to jobs per 2).
#SBATCH --mem=1024 # Request memory (MB) per node. Default is 1024MB (1GB). For multiple tasks, specify --mem-per-cpu instead
#SBATCH --mail-type=END # Set mail type to 'END' to receive a mail when the job finishes.
#SBATCH --output=slurm_%j.out # Set name of output log. %j is the Slurm jobId
#SBATCH --error=slurm_%j.err # Set name of error log. %j is the Slurm jobId
#SBATCH --gres=gpu:1 # Request 1 GPU
# Measure GPU usage of your job (initialization)
previous=$(/usr/bin/nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/tail -n '+2')
/usr/bin/nvidia-smi # Check sbatch settings are working (it should show the GPU that you requested)
# Remaining job commands go below here. For example, to run python code that makes use of GPU resources:
# Uncomment these lines and adapt them to load the software that your job requires
#module use /opt/insy/modulefiles # Use DAIC INSY software collection
#module load cuda/11.2 cudnn/11.2-8.1.1.33 # Load certain versions of cuda and cudnn
#srun python my_program.py # Computations should be started with 'srun'. For example:
# Measure GPU usage of your job (result)
/usr/bin/nvidia-smi --query-accounted-apps='gpu_utilization,mem_utilization,max_memory_usage,time' --format='csv' | /usr/bin/grep -v -F "$previous"
Similarly, to interactively work in a GPU node:
$ hostname # check you are in one of the login nodes
login1.daic.tudelft.nl
$
$ sinteractive --cpus-per-task=1 --mem=500 --time=00:01:00 --gres=gpu:v100:1
Note: interactive sessions are automatically terminated when they reach their time limit (1 hour)!
srun: job 8607665 queued and waiting for resources
srun: job 8607665 has been allocated resources
15:27:18 up 51 days, 3:04, 0 users, load average: 62,09, 59,43, 44,04
SomeNetID@insy11:~$
SomeNetID@insy11:~$ hostname # check you are in one of the compute nodes
insy11.daic.tudelft.nl
SomeNetID@insy11:~$
SomeNetID@insy11:~$ nvidia-smi # check characteristics of GPU
Mon Jul 24 15:37:01 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-32GB On | 00000000:88:00.0 Off | 0 |
| N/A 32C P0 40W / 300W| 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
SomeNetID@insy11:~$
SomeNetID@insy11:~$ exit # exit the interactive session
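To check which GPU types are available on which nodes before picking a [:type], a sinfo query on the generic resources (GRES) column can help (a sketch using standard sinfo format specifiers; the exact GRES names depend on the cluster configuration):
$ sinfo --partition=general --format="%N %G"   # node names and their GPUs (GRES)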
4.6.9 - Job arrays
Parallelizing jobs with Job Arrays
There can be scenarios, eg in simulations or benchmarking, where a job script needs to run many times with a different parameter set each time. If done manually, keeping track of the parameter values and corresponding jobIds is cumbersome. Job Arrays are a convenient mechanism for submitting and managing such jobs.
A job array is created by adding the --array=<indexes>
directive to an sbatch script (or in the command line), where <indexes>
can be either a comma separated list of integers, or a range with optional step size, eg, 1-10:2
. The minimum index value is 0, and the maximum is a Slurm configuration parameter (MaxArraySize - 1
).
Within a job array, all jobs have the same SLURM_ARRAY_JOB_ID
, but each job will have its own environment variable SLURM_ARRAY_TASK_ID
that corresponds to the array index value. Additionally, all jobs in the array inherit the same compute resources requirements. In the following examples, arrays of size 2 are created, but with different indexes:
$ sbatch --array=1,4 jobscript.sbatch # Indexes specified as a list, and have values 1 and 4
Submitted batch job 8580151
$
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8580151_1 general jobscrip SomeNetID R 0:01 1 grs4
8580151_4 general jobscrip SomeNetID R 0:01 1 awi18
$ sbatch --array=1-2 jobscript.sbatch # Range specified with default step size = 1. Index have values 1 and 2
Submitted batch job 8580149
$
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8580149_1 general jobscrip SomeNetID R 0:21 1 grs4
8580149_2 general jobscrip SomeNetID R 0:21 1 awi18
Note
To limit the maximum number of simultaneously running jobs in an array, use the % separator, eg --array=1-15%3 to run only 3 tasks at a time.

JobId and environment variables
As shown in the previous section, Parallelizing jobs with job arrays, jobs within an array are assigned special slurm variables. These variables can be exploited for various computational objectives. Among these, SLURM_ARRAY_TASK_ID
is the index of an individual task within the array, and SLURM_ARRAY_JOB_ID
is the slurm jobId of the entire array job.
In the simplest case, you can use the ${SLURM_ARRAY_TASK_ID}
directly in a script to assign parameter values. For example, to run a workflow across a set of images image_1.png
… image_5.png
, you can simply create an array using the sbatch directive --array=1-5
, and then, within your sbatch script, use image_${SLURM_ARRAY_TASK_ID}.png
to indicate the corresponding image.
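A minimal sketch of this pattern (the image files and the process_image.py script are hypothetical placeholders):
#!/bin/bash
#SBATCH --job-name=image-array
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --array=1-5                   # One task per image
#SBATCH --output=slurm-%A_%a.out      # %A is SLURM_ARRAY_JOB_ID, %a is SLURM_ARRAY_TASK_ID

srun python process_image.py image_${SLURM_ARRAY_TASK_ID}.png   # eg, task 3 processes image_3.png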
In more complex scenarios, eg, when the parameters of interest are not mappable to indexes (of a job array), you can use a config file to map the parameters to the job array indexes. For example, let’s assume the following parameters:
$ cat jobarray.config
i Flower Color Origin
1 Rose Red Worldwide
2 Jasmine White Asia
3 Tulip Various Persia&Turkey
4 Orchid Various Worldwide
5 Lily Various Worldwide
Now, you can use these parameters inside a job script as follows:
$ cat jobarray.sbatch
#!/bin/bash
#SBATCH --job-name=JobArrayExample
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --array=1-5 # Array with 5 tasks
#SBATCH --output=slurm-%A_%a.out # Set name of output log. %A is SLURM_ARRAY_JOB_ID and %a is SLURM_ARRAY_TASK_ID
#SBATCH --error=slurm-%A_%a.err # Set name of error log. %A is SLURM_ARRAY_JOB_ID and %a is SLURM_ARRAY_TASK_ID
config=jobarray.config # Path to config file
# Obtain parameters from config file:
flower=$(awk -v ArrayTaskID=$SLURM_ARRAY_TASK_ID '$1==ArrayTaskID {print $2}' $config)
color=$(awk -v ArrayTaskID=$SLURM_ARRAY_TASK_ID '$1==ArrayTaskID {print $3}' $config)
origin=$(awk -v ArrayTaskID=$SLURM_ARRAY_TASK_ID '$1==ArrayTaskID {print $4}' $config)
# Use the parameters, eg, print the index and parameter values to a file:
echo "Array task: ${SLURM_ARRAY_TASK_ID}, Flower: ${flower}, color: ${color}, origin: ${origin}" >> output.txt
$
$ sbatch jobarray.sbatch
Submitted batch job 8580317
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8580317_[1-5] general JobArray SomeNetID PD 0:00 1 (Priority)
In this example, slurm created 5 jobs in a job array, each using the same settings (the name JobArrayExample
, the general
partition, short
QoS, 00:01:00
time, 1
task with 1
CPU and 1G
memory, and an output and error file with both array job Id and task id). Each task looks up certain parameter values from a config file leveraging its index via the awk
command.
Note
The command:
flower=$(awk -v ArrayTaskID=$SLURM_ARRAY_TASK_ID '$1==ArrayTaskID {print $2}' $config)
assigns a value to the variable flower
by reading a configuration file ($config
), and printing the value in the second column ({print $2}
) where the first column matches the value of the ArrayTaskID
variable ($1==ArrayTaskID
). The ArrayTaskID
is an awk variable set to the value of the SLURM environment variable SLURM_ARRAY_TASK_ID
.
For more on the awk
utility, see this awk tutorial.
Jobs within a task array run in parallel, and hence, there is no guarantee about their order of execution. This is evident from the output file of this example:
$ cat output.txt
Array task: 2, Flower: Jasmine, color: White, origin: Asia
Array task: 3, Flower: Tulip, color: Various, origin: Persia&Turkey
Array task: 1, Flower: Rose, color: Red, origin: Worldwide
Array task: 5, Flower: Lily, color: Various, origin: Worldwide
Array task: 4, Flower: Orchid, color: Various, origin: Worldwide
Other slurm variables that are set inside a job array are shown in the following table, with values based on the preceding example:
Slurm Environment Variable | Description | Value in example |
---|---|---|
SLURM_ARRAY_JOB_ID | The first job ID of the array. | 8580317 |
SLURM_ARRAY_TASK_ID | The job array index value. | A value in range 1-5 |
SLURM_ARRAY_TASK_COUNT | The number of tasks in the job array. | 5 |
SLURM_ARRAY_TASK_MAX | The highest job array index value. | 5 |
SLURM_ARRAY_TASK_MIN | The lowest job array index value | 1 |
Slurm commands and job arrays
The squeue command reports all jobs currently in the queue. By default, squeue
reports all of the tasks associated with a job array in one line and uses a regular expression to indicate the SLURM_ARRAY_TASK_ID
values. To explicitly print one job array element per line, use the --array
or -r
flag. The following examples highlight the difference, using the same jobarray.sbatch
file from the JobId and environment variables section:
$ sbatch jobarray.sbatch
Submitted batch job 8593299
$
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8593299_[1-5] general JobArray SomeNetID PD 0:00 1 (Priority)
$
$ squeue -r -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8593299_1 general JobArray SomeNetID PD 0:00 1 (Priority)
8593299_2 general JobArray SomeNetID PD 0:00 1 (Priority)
8593299_3 general JobArray SomeNetID PD 0:00 1 (Priority)
8593299_4 general JobArray SomeNetID PD 0:00 1 (Priority)
8593299_5 general JobArray SomeNetID PD 0:00 1 (Priority)
scancel
, on the other hand, can be used to cancel an entire job array by specifying its SLURM_ARRAY_JOB_ID
. Alternatively, to cancel a specific task (or tasks), both its SLURM_ARRAY_JOB_ID
and SLURM_ARRAY_TASK_ID
must be specified, possibly with an index range, as shown in the following examples:
$ sbatch jobarray.sbatch
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8593321_[1-5] general JobArray SomeNetID PD 0:00 1 (Priority)
$
$ scancel 8593321_4 # Cancel task with index 4 in the array
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8593321_[1-3,5] general JobArray SomeNetID PD 0:00 1 (Priority)
$
$ scancel 8593321_[1-3] # Cancel tasks in index range 1-3 in the array
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
8593321_5 general JobArray SomeNetID PD 0:00 1 (Priority)
$
$ scancel 8593321 # Cancel all tasks in the array
$ squeue -u SomeNetID # Replace SomeNetId with your NetID
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
$
Note
For more information on job arrays, refer to Slurm Job Array Support.

Troubleshooting Common Issues
Please see the Frequently asked questions on Scheduler problems and Job resources
4.6.10 - Job chains
Deploying dependent jobs (job chains)
In certain scenarios, it might be desirable to condition the execution of a certain job on the status of another job. In such cases, the sbatch directive --dependency=<condition>:<jobID>
can be used, where <condition>
specifies the type of dependency (See table 2), and <jobID>
is the slurm jobID upon which dependency is based. To specify more than one dependency, the , separator indicates that all dependencies must be satisfied, while the ? separator denotes that satisfying any one dependency is sufficient.
For example, assume the slurm job scripts, job_1.sbatch
, … job_3.sbatch
need to run sequentially one after the other. To start this chain, submit the first job and obtain its jobID:
$ sbatch job_1.sbatch
Submitted batch job 8580135
Next, submit the second job to run only if the first job is successful:
$ sbatch --dependency=afterok:8580135 job_2.sbatch
Submitted batch job 8580136
Note
Note that if the first job (with jobID8580135
in the example) fails, the second job (with jobID 8580136
) will not run, but it will remain in the queue. You have to use scancel 8580136
to cancel this job.

Now, to run the third job only after the first two jobs have both run successfully:
$ sbatch --dependency=afterok:8580135,8580136 job_3.sbatch
Submitted batch job 8580140
Alternatively, if the third job is dependent on either job running successfully:
$ sbatch --dependency=afterok:8580135?8580136 job_3.sbatch
Submitted batch job 8580141
Warning
- If the jobs within a chain involve copying data files to a local disk (
/tmp
) on a node, you need to make sure all jobs use the same node (--nodelist=<node>
, for example--nodelist=insy15
)
Argument | Description |
---|---|
after | This job can begin execution after the specified jobs have begun execution |
afterany | This job can begin execution after the specified jobs have terminated. |
aftercorr | A task of this job array can begin execution after the corresponding task ID in the specified job has completed successfully |
afternotok | This job can begin execution after the specified jobs have terminated in some failed state |
afterok | This job can begin execution after the specified jobs have successfully executed |
singleton | This job can begin execution after any previously launched jobs sharing the same job name and user have terminated |
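When building chains in a script, it can be convenient to capture each job ID automatically; sbatch's --parsable option prints only the job ID, which can then be passed to the next --dependency (a sketch for the sequential scenario above):
#!/bin/bash
# Submit three jobs so that each starts only after the previous one finished successfully.
# --parsable makes sbatch print just the job ID, so it can be captured in a variable.

jid1=$(sbatch --parsable job_1.sbatch)
jid2=$(sbatch --parsable --dependency=afterok:${jid1} job_2.sbatch)
jid3=$(sbatch --parsable --dependency=afterok:${jid2} job_3.sbatch)

echo "Submitted chain: ${jid1} -> ${jid2} -> ${jid3}"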
4.6.11 - Reservations
Resources reservations
Slurm gives the possibility to reserve one or more compute nodes exclusively for a specific user or group of users. A reservation ensures that the designated node (or nodes) are dedicated solely to the reservation holder’s tasks and are not shared with other users during the reserved period. This feature allows users to plan the execution of future workloads, and accommodates cluster users with special needs beyond the batch system (eg latency measurement scenarios).
Note
Using reservations is in line with the General cluster usage clauses of DAIC users’ agreement. However, please be mindful that reservations are intended to facilitate special needs that cannot be satisfied by the batch system, and should not be requested to guarantee fast throughput for production runs.Requesting a Reservation
To request a reservation for nodes, please use the Request Reservation form. You can request a reservation for an entire compute node (or a group of nodes) if you have contributed this node (or these nodes) to the cluster and you have special needs that need to be accommodated.
General guidelines for reservations’ requests:
- You can be granted a reservation only on nodes from a partition that is contributed by your group (See Partitions to check the name of the partition contributed by your group, and System specifications for a listing of available nodes and their features).
- Please ask for the least amount of resources you need as to minimize impact on other users.
- Plan ahead and request your reservation as soon as possible: Reservations usually ignore running jobs, so any running job on the machine(s) you request will continue to run when the reservation starts. While jobs from other users will not start on the reserved node(s), the resources in use by an already running job at the start time of the reservation will not be available in the reservation until this running job ends. The earlier ahead you request resources, the easier it is to allocate the requested resources.
Using reservations
Once your reservation request is approved and a reservation is placed on the system, you can run your jobs in the reservation by specifying --qos=reservation
along with the following directives to your slurm commands: --reservation=<name>
and --partition=<partition>
. For example, to submit the job job.sbatch
to a reservation named icra_iv
on the cor1
node on the cor
partition use:
$ sbatch --qos=reservation --reservation=icra_iv --partition=cor job.sbatch
Alternatively, it is possible to add the following lines to the job.sbatch
file, and submitting this file as usual:
#SBATCH --qos=reservation
#SBATCH --reservation=icra_iv
#SBATCH --partition=cor
Note
It is possible to submit jobs to a reservation as soon as it has been created. Jobs will start immediately when the reservation becomes available, but jobs already running on the reserved resources will not be cancelled for the reservation to start.

Note
When a reservation is used to run your jobs, remember to also pass the reservation parameters to your srun steps:
$ srun --qos=reservation --reservation=<reservation_name> --partition=<partition_name> <some_script.sh>
To make use of an existing reservation you have to specify --qos=reservation
and --reservation=<reservation-name>
in your sbatch
script.
Viewing reservations
To view all active and future reservations run the scontrol
command as follows:
$ scontrol show reservations
ReservationName=icra_iv StartTime=2023-09-09T00:00:00 EndTime=2023-09-16T00:00:00 Duration=7-00:00:00
Nodes=cor1 NodeCnt=1 CoreCnt=32 Features=(null) PartitionName=cor Flags=
TRES=cpu=64
Users=(null) Groups=(null) Accounts=3me-cor Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)
ReservationName=maintenance weekend 2023-10-14 StartTime=2023-10-13T20:00:00 EndTime=2023-10-16T09:00:00 Duration=2-13:00:00
Nodes=3dgi[1-2],100plus,awi[01-26],cor1,gpu[01-11],grs[1-4],influ[1-6],insy[11-12,14-16],tbm5,wis1 NodeCnt=58 CoreCnt=2000 Features=(null) PartitionName=(null) Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES
TRES=cpu=4000
Users=root Groups=(null) Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)
Note
- Jobs can run on a reservation only if explicitly requested, as shown in the Using reservations section.
- Only jobs from the Users or Accounts associated with the reservation (as shown in the scontrol show reservations output) will run on the reservation.
- The STATE of a reservation will show as ACTIVE (instead of INACTIVE) during the reservation window.
4.6.12 - Kerberos
Kerberos Authentication
Kerberos is an authentication protocol which uses tickets to authenticate users (and computers). You automatically get a ticket when you log in with your password on a TU Delft installed computer. You can use this ticket to authenticate yourself without password when connecting to other computers or accessing your files. To protect you from misuse, the ticket expires after 10 hours or less (even when you’re still logged in).
File access
Your Linux and Windows Home directories and the Group and Project shares are located on network fileservers, which allows you to access your files from all TU Delft installed computers. Kerberos authentication is used to enable access to, or protect, your files. Without a valid Kerberos ticket (e.g. when the ticket has expired) you will not be able to access your files but instead you will receive a Permission denied
error.
Lifetime of Kerberos Tickets
Kerberos tickets have a limited valid lifetime (of up to 10 hours) to reduce the risk of abuse, even when you stay logged in. If your tickets expire, you will receive a Permission Denied
error when you try to access your files and a password prompt when you try to connect to another computer. When you want your program to be able to access your files for longer than the valid ticket lifetime, you’ll have to renew your ticket (repeatedly) until your program is done. Kerberos tickets can be renewed up to a maximum renewable life period of 7 days (again to reduce the risk of abuse).
The command klist -5
lists your cached Kerberos tickets together with their expiration time and maximum renewal time:
$ klist -5
Ticket cache: FILE:/tmp/krb5cc_uid_random
Default principal: YourNetID@TUDELFT.NET
Valid starting Expires Service principal
01/01/01 00:00:00 01/01/01 10:00:00 krbtgt/TUDELFT.NET@TUDELFT.NET
renew until 01/08/01 00:00:00
Where:
- Ticket cache: The Kerberos tickets that have been issued to you are stored in a ticket cache file. You can have multiple ticket cache files on the same computer (from different connections, for example) with different tickets and ticket expiration times. Some ticket cache files are automatically removed when you log out. Tip: make sure that you renew the tickets in the right ticket cache file (see the screen example below).
- Default principal: Your identity.
- Service principal: The identity of services that you have gotten tickets for. You always need a Kerberos ticket-granting ticket (krbtgt) in order to obtain other tickets for specific services like accessing files (nfs) or connecting to computers (host).
- Valid starting, Expires: Your ticket is only valid between these times (this period is called the valid lifetime). After this time you will not be able to use the service nor automatically renew the ticket (without a password).
- Renew until: Your ticket can only be renewed without a password up to this time. After this time you will have to obtain a new ticket using your password.
Renewing Kerberos tickets
If you have a valid Kerberos krbtgt
ticket, you can renew it at any time (until it expires) by running the command kinit -R
:
$ kinit -R
$ klist -5
Ticket cache: FILE:/tmp/krb5cc_uid_random
Default principal: YourNetID@TUDELFT.NET
Valid starting Expires Service principal
01/01/01 01:00:00 01/01/01 11:00:00 krbtgt/TUDELFT.NET@TUDELFT.NET
renew until 01/08/01 00:00:00
Note
Renewing the ticket will not change the duration of the valid lifetime, i.e. a krbtgt ticket with a valid lifetime of 1 hour will, after renewal, be valid for another hour.

When the krbtgt ticket has expired or reached its renew until time, you will have to obtain a new ticket by running kinit -r 7d (note the difference in case for the r) and authenticating with your password:
$ kinit -r 7d
Password for YourNetID@TUDELFT.NET:
$ klist -5
Ticket cache: FILE:/tmp/krb5cc_uid_random
Default principal: YourNetID@TUDELFT.NET
Valid starting Expires Service principal
01/01/01 11:00:00 01/01/01 21:00:00 krbtgt/TUDELFT.NET@TUDELFT.NET
renew until 01/08/01 11:00:00
The new ticket will have a valid lifetime of 10 hours and a renewable life of 7 days.
On the TU Delft Linux desktops your Kerberos ticket is refreshed (i.e. replaced by a new ticket) automatically every time you enter your password for unlocking the screen saver.
Tip
Do not disable the screen saver password lock.

On remote computers you have to manually renew your tickets before they expire.
Slurm & Kerberos
- Slurm caches your Kerberos ticket, and uses it to execute your job
- Regularly renew the ticket in Slurm’s cache while your jobs are queued or running:
$ auks -a
Auks API request succeed
- To automatically renew your ticket in Slurm’s cache until you change your NetID password, run the following on the
login1
node:
$ install_keytab
Password for somebody@TUDELFT.NET:
Installed keytab.
You need to rerun this command whenever you change your NetID password (at least every 6 months). Otherwise, the automatic renewal will not work and you will receive a warning e-mail.
Renewal using screen
On the compute nodes, the screen
program has been modified to allow jobs to run unattended for up to 7 days. It creates a private ticket cache (to prevent the cache from being destroyed at logout) and automatically renews your ticket up to the maximum renewable life. For example, start MATLAB in Screen with screen matlab
(the order is important!).
$ screen matlab
Warning: No display specified. You will not be able to display graphics on the screen.
< M A T L A B (R) >
Copyright 1984-2010 The MathWorks, Inc.
Version 7.11.0.584 (R2010b) 64-bit (glnxa64)
August 16, 2010
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
>>
For longer jobs you have to manually obtain a new ticket at least every 7 days by running kinit -r 7d
from within screen
(so you use the specific ticket cache file that screen
is using):
- connect to screen (
screen -r
), - create a new window (
Ctrl-a c
), - run
kinit -r 7d
, - exit the window (
exit
) and - detach from screen (
Ctrl-a d
).
$ kinit -r 7d
Password for YourNetID@TUDELFT.NET:
$ klist -5
Ticket cache: FILE:/tmp/krb5cc_uid_private
Default principal: YourNetID@TUDELFT.NET
Valid starting Expires Service principal
01/08/01 09:00:00 01/08/01 19:00:00 krbtgt/TUDELFT.NET@TUDELFT.NET
renew until 01/15/01 09:00:00
$ exit
Tip
Use a repeating reminder (twice a week) in your agenda so you don't forget.

Important

When the end of the renewable life is reached, your tickets expire and your program(s) will return Permission denied errors when trying to access your files. Your program(s) will not be terminated automatically; you still have to terminate the program(s) yourself.

Extra functionality can be provided by the k5start
and krenew
programs. On most computers these are not available by default but can be installed.
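For example, a long-running command can be wrapped in krenew so that the ticket is renewed in the background for as long as the command runs (a sketch, assuming krenew is installed; the script name is a hypothetical placeholder, and man krenew lists the options available on your system):
$ krenew -K 60 ./my_long_computation.sh   # renew the Kerberos ticket every 60 minutes while the command runs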