Daily Bulletin

New Default “Rerun” Behavior for PBS Jobs on NCAR’s HPC Resources

June 12, 2024

In response to a PBS issue observed in the past month, CISL staff have changed the default “rerun” behavior of batch jobs submitted on both Derecho and Casper. Previously, batch jobs were rerunnable by default: if a job hit startup issues, the scheduler would automatically attempt to run it again. After the PBS upgrade this past May, this feature began interfering with job deletion, occasionally making jobs “unkillable” by users.

CISL staff have raised the issue with the vendor to seek a long-term solution to the root cause. In the interim, jobs on NCAR systems are no longer rerunnable by default. Since this feature rarely matters on a healthy system, we do not expect a notable impact for the vast majority of users.

Please note that the array launching command launch_cf has been updated to account for this change, but any user who launches job arrays natively will need to change their workflow as detailed below.

Users who wish to control rerunnability have three options:
  • Include the #PBS -r y directive in your batch script
  • Submit your job via qsub -r y […usual arguments…]
  • After submitting your job, run qalter -r y <JOBID>
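As an illustration of the first option, here is a minimal sketch of a natively launched job array script. It assumes that the change needed for such arrays is simply to mark them rerunnable explicitly. The job name, the <PROJECT_CODE> and <QUEUE> placeholders, the resource requests, the array range, and the script name array_example.pbs are hypothetical; substitute your own values.

```shell
#!/bin/bash
#PBS -N array_example
#PBS -A <PROJECT_CODE>
#PBS -q <QUEUE>
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:05:00
#PBS -J 0-3
#PBS -r y

# The "-r y" directive above marks the job as rerunnable; every other
# directive is a placeholder for illustration only.

# PBS sets PBS_ARRAY_INDEX for each array element. Default to 0 so the
# script can also be run outside the scheduler as a quick local check.
index="${PBS_ARRAY_INDEX:-0}"
echo "Processing array element ${index}"

# Alternatives to the directive, as listed in this bulletin:
#   qsub -r y array_example.pbs    (set at submission time)
#   qalter -r y <JOBID>            (set after submission)
```

Only the -r y line relates to the change described here; the rest of an existing batch script can stay as it is.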

Thank you for considering this matter. Feel free to contact the help desk with any questions.