The availability testing portion of Derecho’s Acceptance Test Phase (ATP) began at 0900 on May 3, 2023, and system availability requirements are being met so far. For the last four weeks, the Derecho project team and HPE subject matter experts have been working around the clock to resolve various challenges with faulty hardware and the software stack, particularly with the Slingshot network, the PBS scheduler, and the Lustre file system. In preparation for the longer-duration availability and benchmarking tests, the project team has set up system monitoring and metrics collection, and the consulting team has completed application work and deployed the latest available versions of compilers and libraries from HPE and third-party vendors. Additionally, codes in the NWSC-3 benchmark suite have been recompiled against the latest software stack. Application performance tuning is ongoing and, while some tuning work is still needed to get the best performance from the Slingshot interconnect at scale, performance is generally meeting expectations. As a result, Derecho is looking significantly more stable than it did four weeks ago.
Barring any serious issues with Derecho during the remaining availability testing, CISL is on track to complete system acceptance before June. If that happens, CISL will provide Accelerated Scientific Discovery (ASD) users access to Derecho in the first week of June. If ASD users are unable to keep Derecho busy, CISL will provide access to additional early users to increase system utilization.
After the ASD usage phase, which is expected to last for two months, all remaining users will be given access to Derecho and may begin transitioning their work from Cheyenne. Cheyenne will be kept online until the end of December 2023 to allow users to remain productive during their transition to the new system.