How to use the queue manager
The ARGO cluster uses Slurm (https://slurm.schedmd.com/) for workload management, including queue management.
Note: until May 2018, Argo was Torque based. To assist users with their old job scripts, we still support a limited set of Torque commands via Torque/PBS wrappers that ease the transition from Torque/PBS to Slurm.
If you wish to quickly find the Slurm counterparts of Torque commands you already know, see any of the translation tables such as the Rosetta Stone of Workload Managers, Translate PBS/Torque to SLURM, or Slurm vs Moab/Torque.
The rest of this documentation concentrates on the Slurm environment.
- To submit a job, just type:
$ sbatch jobscript.sh
A simple jobscript could be:
$ cat jobscript.sh
#!/bin/bash
#SBATCH -p testing   # partition (queue)
#SBATCH -N 1         # number of nodes
~/fortran_code/test.x
Each line beginning with #SBATCH is interpreted by the queue manager as an option passed to the sbatch command.
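Because the #SBATCH lines are ordinary sbatch options, you can also pass them (or override them) directly on the command line; the job ID in the reply below is of course only an example:
$ sbatch -p testing -N 1 jobscript.sh
Submitted batch job 123456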
- A more complex example (containing some more useful options) follows:
$ cat jobscript.sh
#!/bin/bash
#
#SBATCH --job-name=my-test   # job name
#SBATCH -p testing           # partition (queue)
#SBATCH -N 1                 # number of nodes
#SBATCH -n 2                 # number of cores
#SBATCH -t 0-2:00            # time (D-HH:MM)
#SBATCH -o slurm.%j.out      # STDOUT
#SBATCH -e slurm.%j.err      # STDERR
#SBATCH --mail-type=ALL      # I want e-mail alerting
#############
### This job's working directory
echo "Working directory is $SLURM_SUBMIT_DIR"
cd $SLURM_SUBMIT_DIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
# Run your executable
/home/username/fortran_codes/test_f90.x
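If your executable is an MPI program, you would normally start it with srun inside the job script so that it runs on all requested tasks. A minimal sketch, assuming an MPI-built executable called mpi_test.x (load whatever compiler/MPI modules your code needs before running it):
#!/bin/bash
#SBATCH -p testing   # partition (queue)
#SBATCH -N 1         # number of nodes
#SBATCH -n 8         # number of MPI tasks
#SBATCH -t 0-1:00    # time (D-HH:MM)
srun ./mpi_test.x    # srun starts one copy of the program per requested task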
To check the queue for all jobs:
$ squeue
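The default output looks roughly like this (job ID, name, user and node are purely illustrative):
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
123456   testing  my-test username  R       1:23      1 testing02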
To check the queue for all jobs belonging to one user:
$ squeue -u <user-id>
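For pending jobs you can also ask Slurm for its current estimate of the start time (the estimate may change as other jobs finish or are submitted):
$ squeue -u <user-id> --start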
To check the queue for a given job id:
$ squeue -j <job-id>
To get accounting information for a job (this also works after the job has finished):
$ sacct -j <job-id>
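sacct accepts a --format option to select the fields to display; the field list below is just one possible choice:
$ sacct -j <job-id> --format=JobID,JobName,Partition,State,Elapsed,MaxRSS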
To cancel a job (scancel takes the job id as a plain argument):
$ scancel <job-id>
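scancel can also act on several jobs at once, for example all of your jobs, or only your pending ones:
$ scancel -u <user-id>
$ scancel -u <user-id> --state=PENDING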
To start an interactive session on a compute node (here one node in the long partition):
$ srun -p long -N 1 --pty bash
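You can request more resources for the interactive session with the same options used in job scripts; for example, 4 cores for at most one hour (the values are only an example; type exit to end the session and release the node):
$ srun -p long -N 1 -n 4 -t 0-1:00 --pty bash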
- The login nodes of Argo provide a command called showfree that helps you identify which resources are available (idle) for immediate use before you submit a job. It is based on the Slurm sinfo command, so you can use either:
$ showfree
or
$ sinfo -o '%10P %.5a %15l %6D %8t %15C %N' | egrep -v 'drain|down|alloc'
PARTITION   AVAIL TIMELIMIT       NODES  STATE    CPUS(A/I/O/T)   NODELIST
esp            up 1-00:00:00      11     idle     0/132/0/132     node[74-77,90-96]
esp1           up 1-00:00:00      1      idle     0/20/0/20       node102
gpu            up 1-00:00:00      2      idle     0/36/0/36       gpu[01-02]
serial         up 7-00:00:00      2      idle     0/32/0/32       serial[01-02]
testing        up 6:00:00         1      idle     0/16/0/16       testing02
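If you are only interested in one partition, sinfo can be restricted to it (the testing partition is used here only as an example):
$ sinfo -p testing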
- If you want your tasks to run on physical cores only, without using the extra hyper-threaded (logical) CPUs, add the following option to your job script:
#SBATCH --hint=nomultithread
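A minimal sketch of a job script using this option (partition, task count and executable name are only examples):
#!/bin/bash
#SBATCH -p testing            # partition (queue)
#SBATCH -N 1                  # number of nodes
#SBATCH -n 8                  # number of tasks
#SBATCH --hint=nomultithread  # do not use hyper-threaded (logical) CPUs
srun ./my_program.x           # my_program.x is a placeholder executable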