
Argo Overview, Table of available queues/partitions

An overview of the Argo hardware and a table of the available queues/partitions are provided.

Argo overview

Argo is the ICTP HPC cluster, comprising 153 hosts/nodes, with a total of 2588 CPUs, nearly 10 TB of memory, 40 Gbps+ Infiniband interconnects, a 1 Gbps Ethernet network, and several hundred TB of dedicated NFS storage.

The available worker/compute nodes are organised into queues (partitions).

There are three additional special cluster nodes: a master node that controls job execution, and two login nodes, argo-login1 and argo-login2, where users log in, submit jobs, and compile code.

 

Jobs can be submitted from argo (i.e. argo-login2) or from argo-login1:

ssh argo.ictp.it

or

ssh argo-login1.ictp.it

 

List of available queues/partitions

Queue information can be listed with the sinfo command:

$ sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
cmsp up 1-00:00:00 35/5/0/40 node[01-16,161-184]
esp up 1-00:00:00 24/11/1/36 node[61-96]
esp1 up 1-00:00:00 16/11/1/28 node[101-128]
long* up 1-00:00:00 19/18/1/38 node[21-32,131-156]
gpu up 1-00:00:00 0/2/0/2 gpu[01-02]
serial up 7-00:00:00 0/2/0/2 serial[01-02]
testing up 6:00:00 0/2/0/2 testing[01-02]
westmere up 6:00:00 0/0/1/1 westmere01
nehalem up 6:00:00 0/2/0/2 nehalem[01-02]
esp_guest up 1-00:00:00 0/2/0/2 nehalem[01-02]

 

The principal queue for all users is the long queue, with 38 nodes and a 1-day time limit. It is the default queue if none is specified in the job.
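As an illustration, a minimal batch script for a parallel job on the long queue might look like the sketch below. The job name, node/task counts and executable are placeholders, not Argo-specific values, and whether mpirun or srun is the right launcher depends on your MPI environment:

#!/bin/bash
#SBATCH --job-name=test_mpi          # placeholder job name
#SBATCH --partition=long             # optional: long is the default queue
#SBATCH --nodes=2                    # up to 10 nodes are allowed on long
#SBATCH --ntasks-per-node=12         # match the core count of the nodes you target
#SBATCH --time=24:00:00              # wall time, within the 1-day limit

mpirun ./my_program                  # placeholder executable; set up your environment/modules first

The script is then submitted with sbatch, e.g. sbatch job.sh.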

The dedicated queues cmsp, esp, esp1, esp_guest and gpu are available to specific Argo users, upon authorization.

The testing queue is small, comprising two nodes, with a short time limit of 6 hours.

The serial queue is dedicated to serial jobs and has a very long time limit of 7 days. Two nodes are in the serial queue.

In general, for all queues the nodes are NOT shared among jobs. The exceptions are the serial and gpu queues.
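For the serial queue, a single-task submission could look like the following sketch (the job name and executable are placeholders):

#!/bin/bash
#SBATCH --job-name=serial_run        # placeholder job name
#SBATCH --partition=serial           # serial queue: single-core jobs only
#SBATCH --ntasks=1                   # exactly one task
#SBATCH --cpus-per-task=1            # on a single core
#SBATCH --time=7-00:00:00            # within the 7-day limit

./my_serial_program                  # placeholder serial executable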

 

 

Node features

Overall, Argo is a heterogeneous cluster, with nodes belonging to various generations of Intel CPU microarchitectures. Most nodes are of the sandybridge and ivybridge architectures, followed by broadwell. For historical reasons we have kept several nodes of older microarchitectures, like nehalem and westmere, so you can run code on them for testing and comparison. Memory size also varies.

 

For each node we list its microarchitecture, memory size, and other features in the sinfo output:

$ sinfo -N -o "%.20N %.15C  %.15P   %.40b"
NODELIST   CPUS(A/I/O/T)        PARTITION                            ACTIVE_FEATURES
node01       20/0/0/20            cmsp      omnipart,128gb,broadwell-ep,e5-2640v4
node02       20/0/0/20            cmsp      omnipart,128gb,broadwell-ep,e5-2640v4
...
node21       0/12/0/12             long     infiniband,32gb,sandybridge-ep,e5-2620
node22       0/12/0/12             long     infiniband,32gb,sandybridge-ep,e5-2620
...
node61       0/12/0/12              esp     infiniband,32gb,sandybridge-ep,e5-2620
node62       0/12/0/12              esp     infiniband,32gb,sandybridge-ep,e5-2620
...
node101       20/0/0/20             esp1     infiniband,64gb,ivybridge-ep,e5-2680v2
node102       20/0/0/20             esp1     infiniband,64gb,ivybridge-ep,e5-2680v2
...
node131       20/0/0/20            long*     infiniband,64gb,ivybridge-ep,e5-2680v2
node132       20/0/0/20            long*     infiniband,64gb,ivybridge-ep,e5-2680v2
...
node139       0/16/0/16            long*     infiniband,32gb,sandybridge-ep,e5-2650
node140       16/0/0/16            long*     infiniband,32gb,sandybridge-ep,e5-2650

Within each queue, nodes are homogeneous in terms of all of their features. The only exception is the "long" queue.
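Because the long queue mixes node types, the feature names shown in the sinfo output above can be used to pin a job to a particular kind of node via Slurm's --constraint option; for example (job.sh is a placeholder script):

$ sbatch --partition=long --constraint=64gb job.sh            # only the 64 GB Ivybridge nodes
$ sbatch --partition=long --constraint=sandybridge-ep job.sh  # only the Sandybridge nodes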

All nodes are networked together with 1 Gbps ethernet links spanning multiple switches.

Access to storage is also done through the Gigabit ethernet network.

Nodes within each queue are also networked together in a low-latency fabric for MPI communication, thanks to Infiniband (IB) or Omni-Path technology.

The two IB switches perform at 40 Gbps (QDR), while the Omni-Path switch supports 100 Gbps.

 

 

Table of available queues and nodes

The table below summarises the queues, their access policies and notes:

Queue/Partition   Access Policy                  Notes
long              All users                      - Allows allocations of a maximum of 10 nodes for running parallel jobs.
testing           All users
serial            All users                      - ONLY for single-core (single-task) jobs; parallel or MPI jobs will NOT work.
                                                 - Up to a maximum of 7 running independent serial jobs are allowed.
                                                 - Resources are over-subscribed (the nodes are shared among jobs and users).
nehalem           All users
westmere          All users

OTHER Dedicated Queues
cmsp              Special authorization needed   - Omni-Path connectivity is provided.
esp               Special authorization needed
esp1              Special authorization needed
gpu               Special authorization needed   - Several GPU accelerators are available, of the types Nvidia Tesla K40 and Nvidia Tesla P100.
                                                 - Resources are over-subscribed (the nodes are shared among jobs).
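As a sketch, a job on the gpu queue would typically request an accelerator through Slurm's generic resources. The exact GRES names configured on Argo are an assumption here, so check them (e.g. with scontrol show node gpu01) before relying on this:

#!/bin/bash
#SBATCH --partition=gpu              # authorization required
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1                 # request one GPU (GRES name and count are assumed)
#SBATCH --time=24:00:00              # within the 1-day limit

./my_gpu_program                     # placeholder executable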

Table with technical details

 

Queue/Partition   Max walltime   Node range                  Microarchitecture       Cores per node     RAM per core (GB)   Total nodes/cores   RAM per node (GB)
long              24:00          node[139-148]               Sandybridge             16 (8x2)           2                   10 / 160            32
                                 node[131-138],[149-156]     Ivybridge               20 (10x2)          3.2                 16 / 320            64
                                 node[21-32]                 Sandybridge             12 (6x2)           2.7                 12 / 144            32
testing           6:00           testing[01-02]              Nehalem                 8 (4x2)            1.5                 2 / 16              12
nehalem           6:00           nehalem[01-02]              Nehalem                 8 (4x2)            3                   2 / 16              24
westmere          6:00           westmere01                  Westmere                12 (6x2)           2                   1 / 12              24
cmsp              24:00          node[01-16]                 Broadwell               20 (10x2)          6.4                 16 / 320            128
                                 node[161-184]               Broadwell               20 (10x2)          9.4                 24 / 480            188
esp               24:00          node[61-96]                 Sandybridge             12 (6x2)           2.7                 36 / 432            32
esp1              24:00          node[101-128]               Ivybridge               20 (10x2)          3.2                 28 / 560            64
gpu               24:00          gpu01                       Broadwell + 2x gp100    20 (10x2) + GPUs   6.4                 1 / 20              128
                                 gpu02                       Sandybridge + 2x k40c   16 (8x2) + GPUs    2                   1 / 16              32
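The RAM-per-core column is the figure to keep in mind when requesting memory explicitly. As an illustration (job.sh is a placeholder, and this assumes memory is enforced as a consumable resource in Argo's Slurm configuration), a job needing about 3 GB per core on the long queue could ask for it with:

$ sbatch --partition=long --mem-per-cpu=3G job.sh    # ~3 GB per core; on the long queue this effectively targets the 64 GB Ivybridge nodes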