Daily Bulletin

Derecho weekly update

May 15, 2023

The availability testing portion of Derecho’s Acceptance Test Phase (ATP) continued last week, and Derecho is meeting availability requirements, with an average of 30,000 jobs running on the system daily and fewer than 75 daily failures. The Derecho project team continues to work with HPE subject matter experts on two primary application-related issues. The first is that intermittent launch failures are preventing a fraction of submitted PBS jobs from properly starting their applications. HPE has found a bug and is working on a fix but does not have an estimate for when the fix will be available to NCAR. The second is that 0.25% of submitted jobs are being aborted because performance variability leads to excessive runtime. Currently, this issue can be easily demonstrated by running the HPCG benchmark.