Argo Overview, Table of available queues/partitions

An overview of the available Argo hardware and a table of the available queues/partitions are provided.

Argo overview

Argo is the ICTP HPC cluster, comprising 112 hosts/nodes with a total of 2100 CPUs, nearly 17 TB of memory, 100 Gbps Omni-Path or InfiniBand interconnects, a 1 Gbps Ethernet network, and several hundred TB of dedicated NFS storage.

The available worker/compute nodes are organised in queues (partitions).

There are three more special cluster nodes: a master node that controls job execution, and the login nodes argo-login1 and argo-login2, where users log in, submit jobs, and compile code.

Jobs can be submitted from argo-login1 or argo-login2 (argo is an alias for argo-login2):

ssh argo.ictp.it

or

ssh argo-login1.ictp.it

List of available queues/partitions

Queue information can be listed with the sinfo command (in the NODES column, A/I/O/T stands for allocated/idle/other/total):

$  sinfo -s
PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
cmsp         up 1-00:00:00        30/6/4/40  node[01-16,161-184]
esp          up 1-00:00:00       16/12/8/36  node[21-56]
long*        up 1-00:00:00       12/13/7/32  node[61-92]
gpu          up 1-00:00:00          0/1/1/2  gpu[01-02]
serial       up 7-00:00:00          0/2/0/2  serial[01-02]
testing      up    6:00:00      42/19/11/72  node[01-16,61-92,161-184]
esp_guest    up 1-00:00:00          2/0/0/2  node[28-29]

 

The principal queue for all users is the long queue, with 32 nodes and a 24-hour time limit. It is the default queue if none is specified in the job.
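A minimal sketch of a batch script for the long queue follows; the job name, resource values, and executable are placeholders, not site-mandated settings:

#!/bin/bash
#SBATCH --job-name=myjob        # placeholder job name
#SBATCH --partition=long        # the default, so this line may be omitted
#SBATCH --nodes=2               # long allows at most 10 nodes per job
#SBATCH --time=24:00:00         # must not exceed the queue time limit
srun ./my_program               # placeholder executable

Submit it from a login node with:

$ sbatch job.sh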

Dedicated queues cmsp, esp, esp_guest, and gpu are available to specific Argo users upon authorization.

The testing queue is special: it comprises several nodes from both the long and cmsp queues, but with a short time limit of 6 hours.

The serial queue is specific to serial jobs, but has a very long time limit of 7 days. Two nodes are in the serial queue.
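A hedged sketch of a single-core script for the serial queue (the executable name is a placeholder):

#!/bin/bash
#SBATCH --partition=serial
#SBATCH --ntasks=1              # serial queue: one task, one core
#SBATCH --time=7-00:00:00       # up to the 7-day limit
./my_serial_program             # placeholder serial executable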

Generally, for all queues, the nodes are NOT shared among jobs. The exceptions are the serial and gpu queues.

Node features

Overall, Argo is a heterogeneous cluster, with nodes belonging to various generations of Intel CPU microarchitectures. The most numerous are nodes of the Broadwell and Skylake-Cascade architectures, followed by Cascade Lake nodes. Memory size also varies.

For each node, its microarchitecture, memory size, and other features are listed in the sinfo output:

$ sinfo -N -o "%.20N %.15C  %.15P   %.40b"
            NODELIST   CPUS(A/I/O/T)        PARTITION                            ACTIVE_FEATURES
               gpu01       0/40/0/40              gpu               128gb,broadwell-ep,e5-2640v4
               gpu02       0/0/16/16              gpu                32gb,sandybridge-ep,e5-2665

              node01       40/0/0/40             cmsp      omnipart,128gb,broadwell-ep,e5-2640v4
              node02       40/0/0/40             cmsp      omnipart,128gb,broadwell-ep,e5-2640v4
                 ...
              node15       40/0/0/40             cmsp      omnipart,128gb,broadwell-ep,e5-2640v4
              node16       40/0/0/40             cmsp      omnipart,128gb,broadwell-ep,e5-2640v4
              node21       40/0/0/40              esp      omnipart,128gb,skylake-cascade,silver
              node22       40/0/0/40              esp      omnipart,128gb,skylake-cascade,silver
                 ...
              node56       0/40/0/40              esp      omnipart,128gb,skylake-cascade,silver
              node61       64/0/0/64            long*   infiniband,187gb,Cascade-Lake,silver-421
              node62       0/0/64/64            long*   infiniband,187gb,Cascade-Lake,silver-421
                 ...
              node91       64/0/0/64            long*   infiniband,187gb,Cascade-Lake,silver-421
              node92       64/0/0/64            long*   infiniband,187gb,Cascade-Lake,silver-421
             node161       0/0/40/40             cmsp      omnipart,192gb,broadwell-ep,e5-2640v4
             node162       0/40/0/40             cmsp      omnipart,192gb,broadwell-ep,e5-2640v4
                 ...
             node183       40/0/0/40             cmsp      omnipart,192gb,broadwell-ep,e5-2640v4
             node184       0/40/0/40             cmsp      omnipart,192gb,broadwell-ep,e5-2640v4
            serial01       0/16/0/16           serial                32gb,sandybridge-ep,e5-2650
            serial02       0/16/0/16           serial                32gb,sandybridge-ep,e5-2650

Within each queue, nodes are homogeneous in terms of all of their features.
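These feature strings can be used to pin a job to a particular hardware type. A minimal sketch, assuming the features shown above are usable as Slurm constraints:

$ sbatch --constraint=skylake-cascade job.sh
$ sbatch --constraint="broadwell-ep&192gb" job.sh   # AND of two features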

All nodes are networked together with 1 Gbps Ethernet links spanning multiple switches.

Access to storage is also done through the Gigabit Ethernet network.

Nodes within each queue are also networked together in a low-latency fabric for MPI communication, using InfiniBand or Omni-Path technology running at 100 Gbps (HDR).
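A hedged sketch of a multi-node MPI job over this fabric (the program name is a placeholder, and any MPI module or environment setup, being site-specific, is omitted):

#!/bin/bash
#SBATCH --partition=long
#SBATCH --nodes=4               # whole nodes; they are not shared with other jobs
#SBATCH --ntasks-per-node=16    # one task per core on the long nodes
srun ./my_mpi_program           # placeholder MPI executable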

Table of available queues and nodes

The table below summarises the queues, nodes, and their characteristics:

Queue/Partition   Access policy                   Notes
long              All users                       - Allows allocations of a maximum of 10 nodes
                                                    for running parallel jobs.
testing           All users
serial            All users                       - ONLY for single-core, single-CPU (or single-task)
                                                    jobs; parallel or MPI jobs will NOT work.
                                                  - Up to a maximum of 7 running independent serial
                                                    jobs are allowed.
                                                  - Resources are over-subscribed (the nodes are
                                                    shared among jobs and users).

Other dedicated queues:
cmsp              Special authorization needed
esp               Special authorization needed
gpu               Special authorization needed    - Several GPU accelerators are available, of the
                                                    types Nvidia Tesla K40 and Nvidia Tesla P100.
                                                  - Resources are over-subscribed (the nodes are
                                                    shared among jobs).
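For the gpu queue, an illustrative sketch only: the generic-resource (GRES) name and count depend on the cluster's Slurm configuration, so --gres=gpu:1 is an assumption here:

$ sbatch --partition=gpu --gres=gpu:1 gpu_job.sh   # gpu_job.sh is a placeholder script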

Table with technical details

 

Queue/     Max walltime  Node range        Micro-                  Cores  Ram per core  Total  Total  Ram per node
Partition  (h)                             architecture                   (GB/c)        nodes  cores  (GB)
long       24:00         node[61...92]     Cascade-Lake            16     11.69         32     512    187
testing    6:00          node[01...16]     see cmsp queue          -      -             -      -      -
                         node[161...184]   see cmsp queue          -      -             -      -      -
                         node[61...92]     see long queue          -      -             -      -      -
cmsp       24:00         node[01...16]     Broadwell               20     6.4           16     320    128
                         node[161...184]   Broadwell               20     9.6           24     480    192
serial     168:00        serial[01...02]   Sandybridge             16     2             2      32     32
esp        24:00         node[21...56]     skylake-cascade         20     6.4           36     720    128
gpu        24:00         gpu01             Broadwell + 2*gp100     20     6.4           1      16     128
                         gpu02             Sandybridge + 2*k40c    16     2             1      32     32