Help yourself
Help yourself resources.
Cluster monitoring
My jobs are not starting. Is the cluster busy? The following resources monitor the current state of DAIC:
- DAIC status check (access from the TUD network). A brief overview of:
  - Login nodes status
  - Compute nodes status
  - Summary graphs
- slurmtop (login required). slurmtop is available both as a cluster command and as a webpage. Both the command and the webpage display the following tables:
  - Summary of resource allocations in the general partition, in Allocated/Idle/Other/Total format (command-line version) or Total/allocation format (webpage version)
  - Per-node details on status and resource allocations in the general partition
  - Normalized and Effective per-account resource usage information
  - Resource usage and fairshare information for the top 10 cluster users (in terms of Normalized usage)
  - Details of jobs in the cluster, sorted by priority and jobID
- SlurmEff (login required). A summary of efficiency statistics for your own jobs. Statistics are calculated on the basis of requested vs consumed resources.
- Cluster Monitoring Graphs
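Two of the quantities mentioned in the list above can be illustrated with a short sketch: reading an Allocated/Idle/Other/Total summary, and comparing requested with consumed resources. This is a minimal Python example under stated assumptions, not the code behind slurmtop or SlurmEff. The names `CpuSummary`, `parse_aiot`, and `cpu_efficiency` are hypothetical, the numbers are made up, and the efficiency formula (consumed CPU time divided by elapsed time times requested CPUs) is a common definition assumed here, since this page does not state how SlurmEff computes its statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSummary:
    """CPU counts by state, as in an Allocated/Idle/Other/Total summary."""
    allocated: int
    idle: int
    other: int
    total: int

    @property
    def allocated_fraction(self) -> float:
        # Share of all CPUs that are currently allocated to jobs.
        return self.allocated / self.total if self.total else 0.0


def parse_aiot(field: str) -> CpuSummary:
    """Parse a slash-separated 'Allocated/Idle/Other/Total' string.

    Hypothetical helper: assumes four integer counts, e.g. '120/40/8/168'.
    """
    parts = field.strip().split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 slash-separated counts, got {field!r}")
    allocated, idle, other, total = (int(p) for p in parts)
    return CpuSummary(allocated, idle, other, total)


def cpu_efficiency(cpu_seconds_used: float,
                   elapsed_seconds: float,
                   cpus_requested: int) -> float:
    """Consumed CPU time as a fraction of the CPU time that was reserved.

    Assumed definition (not taken from SlurmEff):
    used / (elapsed * requested CPUs).
    """
    reserved = elapsed_seconds * cpus_requested
    if reserved <= 0:
        raise ValueError("elapsed time and requested CPUs must be positive")
    return cpu_seconds_used / reserved


if __name__ == "__main__":
    # Made-up example values, not real DAIC data.
    summary = parse_aiot("120/40/8/168")
    print(f"{summary.allocated_fraction:.0%} of CPUs allocated, {summary.idle} idle")
    print(f"CPU efficiency: {cpu_efficiency(3600, 1800, 4):.0%}")
```

A high allocated fraction with few idle CPUs suggests that pending jobs are simply waiting for resources. A low efficiency value suggests that a job requested more CPUs than it actually used, so requesting fewer may help it start sooner.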
Group-specific resources
In line with the steps in What to do in case of problems, the following links are group-specific resources that you may find relevant:
Linux support
- Linux Q&A Portal: This page aims to be a hub for sharing knowledge, seeking support and prioritizing community issues through upvoting.
- Linux Mattermost channel: for daily news, light-hearted conversations, urgent requests, and connecting with peers.
External resources:
- Introduction to High-Performance Computing
- Introduction to Using the Shell in a High-Performance Computing Context
- The Unix Shell
- HPC carpentry lessons
- Other software carpentry lessons
- Data carpentry lessons
- Unix & Linux Stackexchange