How to use the queue manager
The ARGO cluster uses Slurm (https://slurm.schedmd.com/) for workload management, including queue management.
Note: until May 2018, Argo was Torque based. To assist users with their old job scripts, we still support a limited set of Torque commands via Torque/PBS wrappers that ease the transition from Torque/PBS to Slurm.
If you wish to quickly find the Slurm counterparts of Torque commands you already know, see any of the translation tables such as the Rosetta Stone of Workload Managers, Translate PBS/Torque to SLURM, or Slurm vs Moab/Torque.
The rest of this documentation concentrates on the Slurm environment.
- To submit a job, just type:
$ sbatch jobscript.sh
A simple jobscript could be:
$ cat jobscript.sh
#!/bin/bash
#SBATCH -p testing   # partition (queue)
#SBATCH -N 1         # number of nodes
~/fortran_code/test.x
Each line beginning with #SBATCH is interpreted by the queue manager as an option passed to the sbatch command.
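Because the #SBATCH lines are ordinary sbatch options, you can also pass them (or override them) directly on the command line; the job ID in the reply below is of course only an example:
$ sbatch -p testing -N 1 jobscript.sh
Submitted batch job 123456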
- A more complex example (containing some more useful options) follows:
$ cat jobscript.sh
#!/bin/bash
#
#SBATCH --job-name=my-test   # job name
#SBATCH -p testing           # partition (queue)
#SBATCH -N 1                 # number of nodes
#SBATCH -n 2                 # number of cores
#SBATCH -t 0-2:00            # time (D-HH:MM)
#SBATCH -o slurm.%j.out      # STDOUT
#SBATCH -e slurm.%j.err      # STDERR
#SBATCH --mail-type=ALL      # I want e-mail alerting
#############
### This job's working directory
echo "Working directory is $SLURM_SUBMIT_DIR"
cd $SLURM_SUBMIT_DIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
# Run your executable
/home/username/fortran_codes/test_f90.x
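If your executable is an MPI program, you would normally start it with srun inside the job script so that it runs on all requested tasks. A minimal sketch, assuming an MPI-built executable called mpi_test.x (load whatever compiler/MPI modules your code needs before running it):
#!/bin/bash
#SBATCH -p testing   # partition (queue)
#SBATCH -N 1         # number of nodes
#SBATCH -n 8         # number of MPI tasks
#SBATCH -t 0-1:00    # time (D-HH:MM)
srun ./mpi_test.x    # srun starts one copy of the program per requested task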
To check the queue for all jobs:
$ squeue
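The default output looks roughly like this (job ID, name, user and node are purely illustrative):
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
123456   testing  my-test username  R       1:23      1 testing02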
To check the queue for all jobs belonging to one user:
$ squeue -u <user-id>
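For pending jobs you can also ask Slurm for its current estimate of the start time (the estimate may change as other jobs finish or are submitted):
$ squeue -u <user-id> --start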
To check the queue for a given job id:
$ squeue -j <job-id>
To get accounting information for a job (this also works after the job has finished):
$ sacct -j <job-id>
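sacct accepts a --format option to select the fields to display; the field list below is just one possible choice:
$ sacct -j <job-id> --format=JobID,JobName,Partition,State,Elapsed,MaxRSS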
To cancel a job (scancel takes the job id as a plain argument):
$ scancel <job-id>
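scancel can also act on several jobs at once, for example all of your jobs, or only your pending ones:
$ scancel -u <user-id>
$ scancel -u <user-id> --state=PENDING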
To start an interactive session on a compute node (here one node in the long partition):
$ srun -p long -N 1 --pty bash
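You can request more resources for the interactive session with the same options used in job scripts; for example, 4 cores for at most one hour (the values are only an example; type exit to end the session and release the node):
$ srun -p long -N 1 -n 4 -t 0-1:00 --pty bash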
- The login nodes of Argo provide a command called showfree that helps you identify which resources are available (idle) for immediate use before you submit a job. It is based on the Slurm sinfo command, so you can use either:
$ showfree
or
$ sinfo -o '%10P %.5a %15l %6D %8t %15C %N' | egrep -v 'drain|down|alloc'
PARTITION   AVAIL TIMELIMIT       NODES  STATE    CPUS(A/I/O/T)   NODELIST
esp            up 1-00:00:00      11     idle     0/132/0/132     node[74-77,90-96]
esp1           up 1-00:00:00      1      idle     0/20/0/20       node102
gpu            up 1-00:00:00      2      idle     0/36/0/36       gpu[01-02]
serial         up 7-00:00:00      2      idle     0/32/0/32       serial[01-02]
testing        up 6:00:00         1      idle     0/16/0/16       testing02
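If you are only interested in one partition, sinfo can be restricted to it (the testing partition is used here only as an example):
$ sinfo -p testing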
- If you want your tasks to run on physical cores only, without using the extra hyper-threaded (logical) CPUs, add the following option to your job script:
#SBATCH --hint=nomultithread
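A minimal sketch of a job script using this option (partition, task count and executable name are only examples):
#!/bin/bash
#SBATCH -p testing            # partition (queue)
#SBATCH -N 1                  # number of nodes
#SBATCH -n 8                  # number of tasks
#SBATCH --hint=nomultithread  # do not use hyper-threaded (logical) CPUs
srun ./my_program.x           # my_program.x is a placeholder executable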