Casper changes coming to improve throughput, simplify job scheduling

February 22, 2021

Updated: March 2, 2021

Two major changes to improve job throughput and simplify job scheduling will be made to the Casper cluster during the system maintenance downtime scheduled for March 9-10. First, as announced late last year, 64 nodes will be added to the system and configured similarly to Casper’s existing non-GPU nodes. This expansion will enable high-throughput computing (HTC) on Casper for batch jobs and tasks that typically use one or two nodes. The HTC environment is expected to relieve the backlog of queued jobs on Casper and, eventually, on Cheyenne, as more small jobs will be able to run on the new Casper nodes.

Second, Casper’s scheduler will be transitioned from Slurm to PBS Pro, the workload manager used for Cheyenne. The new HTC nodes will be accessible only through PBS Pro. Casper’s existing 36 nodes will remain available through Slurm for now but will be transitioned to PBS Pro over the next several weeks. CISL will provide documentation and training to help users transition their scripts and workflows to PBS Pro. Watch for more announcements coming soon in the Daily Bulletin.