HPC Systems Scheduled Maintenance - August 12-15

July 9, 2024

All HPC resources will undergo maintenance during the week of August 12. Derecho, Casper, JupyterHub, GLADE file systems, data access nodes, and Globus transfers will each be unavailable for different periods during the week.

A primary maintenance activity will be the upgrade and expansion of the Bifrost high-performance network that interconnects all HPC resources. This maintenance is required both to upgrade the software on the underlying network switches and, importantly, to expand capacity in preparation for additional Casper resources to be installed later this year. During the Bifrost upgrade, all resources will be temporarily unavailable.

Another major focus will be upgrading the management software that controls hardware operations and operating system deployment on Derecho. This upgrade is needed to resolve several issues we have encountered and is expected to improve overall system stability. The Derecho maintenance will run concurrently with the Bifrost upgrade but may take additional time. For this reason, we expect to return Casper, GLADE, Globus, and JupyterHub to service first, followed by Derecho.

As usual, scheduler reservations will be put in place across the systems to ensure that all running user jobs complete before the downtime begins on August 12. Any jobs still queued on Derecho and Casper when the downtime begins will be retained and will run when the systems return to service.
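
For users planning submissions ahead of the reservation, the following is a minimal sketch of how to estimate the longest walltime that would still finish before the downtime. It assumes the scheduler starts a job only if its requested walltime ends before the reservation begins. The 06:00 start time, the 10-minute safety margin, and the function names are hypothetical placeholders: this announcement gives only the date, not a start time.

```python
from datetime import datetime, timedelta

# Hypothetical downtime start. The announcement gives only the date (August 12),
# so the 06:00 time is an assumed placeholder, not an announced value.
DOWNTIME_START = datetime(2024, 8, 12, 6, 0)


def max_walltime(now, downtime_start=DOWNTIME_START, margin=timedelta(minutes=10)):
    """Return the longest walltime that ends before the downtime, less a safety margin.

    Returns a zero timedelta if no time remains.
    """
    remaining = downtime_start - now - margin
    return max(remaining, timedelta(0))


def format_walltime(delta):
    """Format a timedelta as HH:MM:SS, a form commonly used in walltime requests."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Print the largest walltime request that would fit if submitted right now.
    print(format_walltime(max_walltime(datetime.now())))
```

A job requesting more than this value would, under the assumption above, stay queued and run after the systems return to service.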

Progress will be communicated through the Notifier system throughout the course of the outage.