Daily Bulletin

Derecho weekly update

April 17, 2023

The Derecho project team and HPE engineers continued working through the Acceptance Testing Phase (ATP) and encountered several functional and resiliency issues. Working with HPE subject matter experts, the team has resolved most of those issues, including some major performance problems with the Lustre scratch file system. The team also completed a second power-down and power-up test for Derecho and its storage system. In preparation for the longer-duration availability and benchmarking tests, which are expected to start in a few days, the team set up system monitoring and metrics collection and finished porting the application test suite that will be used for availability testing. Finally, the supporting facility cooling infrastructure was successfully stress-tested with a heavy workload applied to all compute nodes.

The project is on schedule to complete acceptance testing by the end of May, and we expect to make Derecho available to Accelerated Science Discovery (ASD) users in the first week of June. Depending on ASD usage levels, CISL may provide access to additional early users. ASD access is expected to last for two months, after which all users with allocations will have access to Derecho and can begin transitioning their work from Cheyenne. Cheyenne will be kept online until the end of December.

The project team has arranged a Derecho Roadshow overview and training event for ASD and selected early users on April 26. This hybrid event will be held in the Mesa Lab Main Seminar Room and will also be available through Zoom. The April roadshow will be a half-day introduction to all things Derecho, including hardware, software, best practices, and guidance on transitioning from Cheyenne. The project team also plans to tour NCAR labs and conduct virtual roadshows for user communities, including university users.