Thunder test system (in development)


This temporary page is not intended for the general user community; a documentation update is pending.

The Thunder cluster is a test system that features Marvell's ThunderX2 Arm processors. These processors utilize the aarch64 instruction set, rather than the x86-64 instruction set used by Intel and AMD processors.

To request access to the system, email hpcfl@ucar.edu.

The cluster, procured from Aeon Computing, features one login node and four batch/compute nodes.

Hardware

  • 1 login node and 4 compute nodes
  • 128 GB memory (login node), 256 GB memory (compute nodes)
  • 2 Marvell ThunderX2 CN9980 CPUs with 32 cores per CPU
  • 2 threads per physical core, for 128 total tasks per node
  • 100 Gb/s Mellanox EDR InfiniBand interconnect
  • 10 TB NFS space for home directories
  • 40 TB Lustre space for scratch

Software

  • CentOS 7.8 operating system
  • Slurm batch scheduler


Using your account on Thunder

Log in the same way you do on Cheyenne, using your username and your two-factor authentication method. Thunder must be accessed from a machine on the NCAR network; if you are working remotely, you can log in to Thunder from a Cheyenne session. Use the following command to connect:

ssh -l username thunder.ucar.edu

Once you sign in, you will have access to a home directory and a scratch space. For convenience, these spaces have been given the same paths as those on Cheyenne, but they do not use the same GLADE environment.
  • /glade/u/home/$USER – Home directory for scripts, code, and built executables
  • /glade/scratch/$USER – Scratch space for model input and output

Quotas are not enforced on these storage spaces and files are not purged. Please delete data that are no longer needed and be mindful of your storage footprint.
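
To check how much space your files are occupying, a standard utility such as du works as on any Linux system; for example:

du -sh /glade/u/home/$USER /glade/scratch/$USER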


Supported software

Software is accessed via environment modules. When you first log in, you will have access to a default set of modules, but you can modify your environment using the module command. The following sub-commands are particularly useful:

module load <name> - Add any binaries, libraries, and compile headers from a particular software installation to your computing environment.

module unload <name> - Remove a software installation from your computing environment.

module purge - Remove all software installations from your computing environment.

module available - See all software installations that are installed on the system and available to load.

module reset - Return your computing environment to the default collection of modules.
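
For example, the following sequence loads an additional library, lists the currently loaded modules, and then returns to the default environment (the module version shown is one referenced later on this page):

module load netcdf/4.7.1
module list
module reset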

By default, you will have access to the standard NCAR environment along with the gnu/9.1.0 compiler and openmpi/4.0.3. Many programs and libraries from Cheyenne and Casper are also available to load on Thunder. These include (but are not limited to) the following:

  • netCDF (serial and parallel)
  • PnetCDF
  • PIO
  • ESMF
  • Python
  • R
  • GDAL

A small subset of software is not currently available for Arm processors and therefore has not been installed on Thunder; examples include MATLAB and IDL.


Compiling programs and libraries

Use the Thunder login node to compile your programs. ThunderX2 processors have more physical cores per CPU (32) than the Intel processors on Cheyenne (18), but each core is slower, so we suggest using more compile threads than you typically would on Cheyenne and Casper.

In general, compiling software for Arm processors works the same way as for Intel chips. The Intel compilers are not installed on Thunder, however, so you may need to adjust your build options to use GCC or Clang/Flang flags instead. Additionally, many build configurations specify the flag "-march=x86-64"; override this setting with "-march=native", which should work for both Intel (x86-64) and Arm (aarch64) processors.
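
As a sketch of what those adjustments might look like for a typical autoconf-style build (the install path, optimization level, and thread count are placeholders, not recommendations):

### Use architecture-neutral tuning flags instead of -march=x86-64
export CFLAGS="-O2 -march=native"
export FCFLAGS="-O2 -march=native"
./configure --prefix=/glade/u/home/$USER/software
### Use more compile threads than you would on Cheyenne
make -j 32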

Loading the ncarcompilers module simplifies the process of including headers and linking libraries (netcdf, for example) at compile time.
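
For instance, with the compiler, MPI, netCDF, and ncarcompilers modules loaded, you can build a netCDF program without hand-writing the include and library search paths (the source file name is a placeholder):

module load gnu/9.1.0 openmpi/4.0.3 netcdf/4.7.1 ncarcompilers
gfortran -o read_nc read_nc.f90 -lnetcdff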


Submitting interactive and batch jobs

The Thunder cluster uses Slurm for submitting jobs to the compute nodes. Both interactive and batch jobs are supported. For interactive sessions, Slurm has a two-stage process in which you first request resources, and then run programs on the resources you have been allocated. Here is an example in which we allocate a single task and then run Python on that compute node:

salloc --time=1:00:00 --ntasks=1 --mem=10G
srun python

Within an salloc session, any commands run without srun still execute on the login node. To start a shell directly on a compute node, begin a session as follows:

srun --time=6:00:00 --ntasks=1 --pty /bin/bash

Batch scripts are submitted with the sbatch command. Resources for the batch job are requested via header directives. In this example, we request two nodes for 6 hours of execution time on the regular partition:

#!/bin/bash
#SBATCH --job-name=wrf_simulation
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --time=6:00:00
#SBATCH --partition=regular
#SBATCH --output=wrf.log

### Load modules used to compile WRF
module purge
module load gnu/9.1.0 netcdf/4.7.1 openmpi/4.0.3

srun ./wrf.exe

If no wallclock time is specified, a default of 12 hours is assigned. If no memory request is provided, the job will be allocated 2 GB per task. Please be mindful of other users and request only the amount of time you expect your job will need. Accounting is not active on Thunder, so jobs generally dispatch in a first-in-first-out manner.
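
Standard Slurm commands are available for monitoring and managing your jobs; for example:

squeue -u $USER      ### list your queued and running jobs
scancel <jobid>      ### cancel a job by its job ID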

If you need exclusive access to a Thunder node, specify the "--exclusive" flag to sbatch either at the command line or in an #SBATCH directive. To allow other users quick access to the compute resources, use this option only when necessary.
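
For example, either of the following requests a dedicated node (the script name is a placeholder):

sbatch --exclusive wrf_job.sh

### or, within the batch script:
#SBATCH --exclusive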

Simultaneous multithreading

Better application performance is generally expected with two computational threads per physical core (SMT-2), so this mode is enabled on Thunder. SMT-2 is analogous to Intel's hyper-threading, which can be used on Cheyenne. If you prefer to use only one thread per physical core, tell Slurm to disable multithreading so that the MPI library places your tasks correctly. Note the difference in batch directives for each approach:

### Two threads per core (default)
#SBATCH --ntasks-per-node=128

### One thread per core
#SBATCH --ntasks-per-node=64
#SBATCH --hint=nomultithread


Getting help

Since Thunder is a test system, send any support inquiries, software and hardware concerns, or requests for access to hpcfl@ucar.edu instead of the CISL Help Desk.

We also welcome any feedback and performance reports that you can share as you test your workflows on Thunder.