Machine learning and deep learning

Created by Unknown User (bjsmith), last modified on 2023-01-20

CISL provides several libraries for machine learning and deep learning (ML/DL) work on Casper nodes.

These libraries were compiled from source against native CUDA (GPU) and MPI libraries, which gives them capabilities beyond those of the prebuilt distributions available for download. They are installed in the NCAR Python Library (NPL) versions built for Python 3.7.9.

To use them, activate the NPL.

The libraries available are:

  • TensorFlow machine learning library v2.3.1
  • PyTorch machine learning library v1.7.1
  • scikit-learn machine learning library v0.5.3
  • Horovod deep learning framework v0.21.0
  • Keras deep learning library v2.4.3
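
After activating the NPL, one way to confirm which of these libraries the active environment provides is to import each one and read its version string. The following is a minimal sketch, not an official CISL tool: the import names (for example, sklearn for scikit-learn) and the helper name report_versions are assumptions made for illustration.

```python
import importlib

# Import names assumed for the libraries listed above.
MODULES = ["tensorflow", "torch", "sklearn", "horovod", "keras"]


def report_versions(modules=MODULES):
    """Return {module name: version string}, with None for modules that fail to import."""
    versions = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except Exception:
            # Covers a missing package as well as a failure to load native libraries.
            versions[name] = None
            continue
        # Fall back to "unknown" if the module does not expose __version__.
        versions[name] = getattr(mod, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        shown = version if version is not None else "not importable"
        print(f"{name:12s} {shown}")
```

If a library is reported as not importable, the NPL is probably not active in the current shell.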

Starting a job

Most ML/DL workloads will target NVIDIA Tesla V100 GPUs. To start an interactive job on a Casper node with a V100 GPU, run the execcasper command with the ngpus=# resource (where # is the number of GPUs requested) and the gpu_type=v100 resource set, as shown in this documentation.

Once the job starts, load the modules you need, including Python version 3.7.9, and activate the NPL.
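
Once the environment is active inside the job, it can be useful to check that the frameworks actually see the requested GPU before starting a long run. The sketch below is illustrative only and assumes PyTorch and TensorFlow are importable under the names torch and tensorflow; the helper name visible_gpus is hypothetical.

```python
def visible_gpus():
    """Return the GPU count seen by each framework, or None if it cannot be imported."""
    counts = {}

    try:
        import torch
        # device_count() is only meaningful when CUDA is usable.
        counts["torch"] = torch.cuda.device_count() if torch.cuda.is_available() else 0
    except Exception:
        # Package missing, or its native libraries failed to load.
        counts["torch"] = None

    try:
        import tensorflow as tf
        counts["tensorflow"] = len(tf.config.list_physical_devices("GPU"))
    except Exception:
        counts["tensorflow"] = None

    return counts


if __name__ == "__main__":
    for framework, count in visible_gpus().items():
        if count is None:
            print(f"{framework}: not importable")
        else:
            print(f"{framework}: {count} GPU(s) visible")
```

A count of 0 for a framework that imports successfully usually means the job was started without a GPU resource, or the script is running outside the job.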