This page covers things specific to the pat cluster. For general SLURM use, see SLURM usage.

Partitions

pat has the following partitions:

Name     Nodes                         Time limit  Notes
GPU      Nodes with GPUs               30 days     Default partition
CPU      Nodes with only CPUs          30 days
WALES    A single node with only CPUs  30 days     Restricted access for DW
DEBUG    All the nodes                 None        Restricted access
CLUSTER  All the nodes                 30 days     Pre-emptee: any jobs in here will be evicted if a job in another partition wants their nodes
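As a rough illustration of choosing a partition, a minimal batch script might look like the sketch below. The -p option is the same one used with salloc in the DEBUG section of this page, and the CPU partition name comes from the table above. The task count, the one-hour time limit and the command being run are placeholders chosen for this example.

```shell
#!/bin/sh
# Sketch of a batch script for the CPU partition (all values are illustrative).
# Partition from the table above; leave this line out to use the default (GPU).
#SBATCH -p CPU
# A single task.
#SBATCH -n 1
# Requested walltime of one hour, well inside the 30 day limit.
#SBATCH -t 01:00:00

# Placeholder workload: report which node the job landed on.
node=$(uname -n)
echo "Running on $node"
```

Saved under a name such as myjob.sh (hypothetical), it would be submitted with 'sbatch myjob.sh'.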


Types of compute node

Unlike many local cluster systems, the cluster's nodes are not identical. There is a range of GPUs available, and sometimes more than one OS version. Different OS versions have different software available, because not every compiler or CUDA version is supported on every OS. There are also CPU-only nodes. You select the features you want with SLURM constraints.

Currently available features:

Name        Description
teslak20    Nvidia Tesla K20m GPUs
titanblack  Nvidia GeForce 700 Titan Black GPUs
3gpu        Node has 3 GPUs
4gpu        Node has 4 GPUs
cpu         Node has CPUs only

To see which combination of features each node has, run 'scontrol show nodes' on pat.

To select particular features, use the '-C' or '--constraint' option to srun or sbatch. You can combine multiple features with & for a boolean AND, or | for a boolean OR. Quote the expression so that the shell does not interpret & or | itself.
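A short sketch of how constraints might be passed on the command line. The feature names come from the table above, though whether any node offers a given combination is something to check with scontrol. The script name myjob.sh is hypothetical, and the DRY_RUN variable is only a device for this example: by default the commands are printed rather than submitted.

```shell
#!/bin/sh
# Hypothetical job script name; replace with your own.
JOB_SCRIPT=myjob.sh

# Single quotes stop the shell treating & (background) and | (pipe) as operators.
AND_CONSTRAINT='titanblack&4gpu'     # Titan Black GPUs AND four GPUs in the node
OR_CONSTRAINT='teslak20|titanblack'  # either GPU type will do

# Defaults to "echo" so the commands are only printed.
# Run with DRY_RUN= (empty) to submit for real.
DRY_RUN=${DRY_RUN-echo}

$DRY_RUN sbatch -C "$AND_CONSTRAINT" "$JOB_SCRIPT"
$DRY_RUN sbatch --constraint="$OR_CONSTRAINT" "$JOB_SCRIPT"
```

The same -C and --constraint options can be given to srun.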

Using the DEBUG partition

The intention of this partition is to let people allocate a GPU for debugging for an indefinite time. It is not for running production work. Only people nominated by group computer reps can have access to this partition. To allocate a GPU, do something like this:

 salloc -n1 --gres=gpu:1 -p DEBUG --no-shell

using whatever parameters you need to get the GPU you want. salloc understands the same parameters as sbatch and srun. The salloc command will return a job id. You'll be able to see this job in the queue, running with unlimited walltime.

Then, to access the allocated GPU, do something like:

 srun --jobid=id mycommand

where 'id' is the job id that the salloc command gave you. To get rid of the allocation and allow others to use the GPU, cancel it with

 scancel id
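The three steps above could be strung together roughly as follows. This is a sketch under assumptions, not a supported script: it assumes salloc reports the allocation with a "Granted job allocation <id>" message, parse_jobid is a helper invented for this example, and nvidia-smi merely stands in for the command you want to run.

```shell
#!/bin/sh
# Pull the job id out of salloc's messages.
# Assumption: salloc reports "Granted job allocation <id>".
parse_jobid() {
    printf '%s\n' "$1" | sed -n 's/.*Granted job allocation \([0-9][0-9]*\).*/\1/p'
}

# Only contact SLURM where salloc exists (that is, on pat).
if command -v salloc >/dev/null 2>&1; then
    # salloc writes its messages to stderr, so merge that into the capture.
    msg=$(salloc -n1 --gres=gpu:1 -p DEBUG --no-shell 2>&1)
    id=$(parse_jobid "$msg")
    if [ -n "$id" ]; then
        # nvidia-smi stands in for whatever you want to debug.
        srun --jobid="$id" nvidia-smi
        # Release the GPU for others once you are finished.
        scancel "$id"
    else
        echo "Could not find a job id in: $msg" >&2
    fi
fi
```

In practice you would usually keep the allocation open across many srun calls and run scancel only at the end of the debugging session.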
