HPC systems maintenance downtime January 17-20

December 5, 2022

NCAR’s HPC resources will be unavailable to users on January 17-20 while CISL staff reconfigure the high-performance network to accommodate the Derecho supercomputer and perform urgent maintenance on Cheyenne’s cooling infrastructure. Progress will be communicated through Notifier emails during the week.

The network maintenance will make all of the following resources temporarily unavailable at the start of the maintenance window: Cheyenne, Casper, Campaign Storage, GLADE, Quasar, Stratus, Gust, and JupyterHub. With the exception of the Cheyenne compute nodes, those resources will be restored to service after the first day of the scheduled downtime.

The remainder of the outage will focus on replacing the working fluid in the cooling infrastructure inside the Cheyenne compute node racks. No Cheyenne software updates are being made, so we expect no user-visible changes. CISL staff will verify resource functionality before releasing the systems to the user community.

Scheduler reservations will be put in place on both Cheyenne and Casper to ensure that all running user jobs have completed before the downtime begins on January 17. Any jobs that are still queued when the downtime begins will be retained and will run when the systems return to service. Services will be returned to the community incrementally throughout the maintenance window, concluding with Cheyenne general availability.