Some Derecho jobs are suitable for running in the system's "preempt" queue on an as-available basis on resources that would otherwise be idle. Running them in that queue allows them to be preempted with minimal impact when a higher-priority job requires the use of those resources. Suitable workflows include those with short or fairly unpredictable runtimes; for example, data processing, file movement, or running analysis tools that have an efficient checkpoint/restart capability.
The "preempt" queue is similar to the "main" Derecho queue in that it serves to route jobs to the system's CPU or GPU nodes. It is different, however, in that:
The start time of a job in the preempt queue will be unpredictable because of the idle resource requirement. Once it starts, it is guaranteed at least 10 minutes of runtime but potentially much more. The duration depends on jobs other users submit to PBS after your job begins.
To submit a job to the preempt queue, specify preempt as the queue name in your PBS script header with the directive #PBS -q preempt, alongside your usual walltime request.
The walltime specification is the upper limit on the job's duration. Use the #PBS -r directive to indicate whether the job should be rerun if it is preempted; valid values are y (yes) and n (no). All other aspects of the PBS script are unchanged by the use of preemption.
Abrupt termination may be entirely acceptable for some workflows. This could be the case for batch removal of a large number of files, for example, or if the application writes frequent checkpoint files and can restart successfully after being interrupted. In other cases, it may be beneficial for the application to take a specific action within the 10-minute grace window. Such an approach is possible with minor changes to the application as described below.
Idle resources are a prerequisite for jobs in the preempt queue to start. The smaller the resource request, the more likely there will be an idle portion of the machine on which to schedule the job. Conversely, large jobs in the queue are likely to wait for long periods of time, if they execute at all. The ideal use case is small- to medium-sized jobs that are robust to interruption and that can make meaningful progress in short periods of time.
Jobs run in the preempt queue are charged at a queue factor of only 0.2, less than jobs in the "economy" queue. Jobs that do not run to completion because of preemption are not charged against your allocation.
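The effect of the queue factor can be illustrated with a short calculation. The sketch below is not an official accounting tool: the charging formula (wall-clock hours × nodes × cores per node × queue factor) and the 128-core node size are assumptions made for illustration, and estimate_charge is a hypothetical helper. Only the 0.2 factor and the no-charge rule for preempted jobs come from the text above.

```python
# Illustrative only: the formula and node size below are assumptions,
# not a statement of how charges are actually computed.
CORES_PER_NODE = 128          # assumed core count of a CPU node
PREEMPT_QUEUE_FACTOR = 0.2    # queue factor quoted in the text above


def estimate_charge(wall_hours, nodes, queue_factor, preempted=False):
    """Return an estimated core-hour charge for a single job."""
    if preempted:
        # Per the text, jobs cut short by preemption are not charged.
        return 0.0
    return wall_hours * nodes * CORES_PER_NODE * queue_factor


# A 2-hour, 4-node job that runs to completion in the preempt queue.
print(estimate_charge(2.0, 4, PREEMPT_QUEUE_FACTOR))
# The same job, had it been preempted before finishing.
print(estimate_charge(2.0, 4, PREEMPT_QUEUE_FACTOR, preempted=True))
```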
When a job running in the preempt queue is targeted for preemption, PBS notifies the running process through a UNIX signal. PBS then waits 10 minutes before killing the process. A properly configured application can receive the notification signal, act upon it (typically through existing checkpoint functionality), and then terminate gracefully rather than be terminated abruptly at the end of the grace period. The steps required to configure an application in this manner are:
1. Create a signal-handling function.
2. Register the signal-handling function with the operating system for the signals of interest.
3. Check periodically whether a signal has been received and, if so, act on it.
Steps 1 and 2 are fairly common across applications and even programming languages. Step 3 is application-specific, and usually involves writing the application state to disk so that it can be restarted later. For some applications, however, an even simpler approach may be possible. For example, if the target application is a data-processing pipeline, it may suffice to receive the termination notification, complete the current processing step, and simply exit without beginning additional steps in the pipeline.
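Because step 3 is application-specific, the following is only a minimal sketch of what "write state to disk and restart later" might look like, written in Python for brevity. The file name state.json, the helper names load_state, save_state, and run, and the trivial running-total "work" are all hypothetical stand-ins. The stop_requested argument is any callable that reports whether a preemption notice has arrived, so the sketch is independent of how the signal itself is caught.

```python
import json
import os

CHECKPOINT_FILE = "state.json"   # hypothetical checkpoint location


def load_state(path=CHECKPOINT_FILE):
    """Resume from a checkpoint if one exists, otherwise start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path=CHECKPOINT_FILE):
    """Write the checkpoint to a temporary file, then rename it into place
    so an interrupted write cannot leave a truncated checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n_steps, stop_requested, path=CHECKPOINT_FILE):
    """Advance the work loop; return (finished, state)."""
    state = load_state(path)
    while state["step"] < n_steps:
        state["total"] += state["step"]   # stand-in for one unit of real work
        state["step"] += 1
        if stop_requested():
            # Preemption notice received: save state and stop cleanly.
            save_state(state, path)
            return False, state
    # Work is complete, so any old checkpoint is no longer needed.
    if os.path.exists(path):
        os.remove(path)
    return True, state
```

Resubmitting the same job after an interruption then picks up at the saved step rather than starting over, which fits naturally with requesting a rerun through #PBS -r y.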
For traditional compiled languages such as C/C++ and Fortran, signal handling is most readily accomplished through some minimal C functions, even inside a predominantly Fortran application. This is because the operating system application interface is C based. The following shows the minimal required steps.
First, we declare a C function my_sig_handler, which takes the signal identifier as input. In this example we construct a switch statement that allows for processing different types of signals in the same code block. It is evident from the listing that if the function is called with a SIGINT, SIGTERM, or SIGUSR1 signal then we set the flag checkpoint_requested and print an informative statement. For completeness, if called with any other signal, we print a diagnostic message as well but take no other action.
Second, we call the system routine signal() to register our function for the specific signals we want the application to process. In this case, we are asking the operating system to call our function my_sig_handler() any time a SIGINT, SIGTERM, or SIGUSR1 is encountered.
The third step is application-specific and not listed, but the general idea is that elsewhere in the application (for example, in the main time step loop) we would check the value of the checkpoint_requested flag and take appropriate action to save state and exit gracefully.
To integrate such an approach into a Fortran application, it is simplest to create a C function that takes no arguments and encapsulates the signal registration process, and then call it from within your Fortran main application. Please contact CISL help for further assistance.
While the most common use case is compiled languages as shown above, it is also possible to catch and act upon signals when your main application is a shell script or is written in Python, as shown below.
In shell scripting, the process is generally similar, with some very slight changes in terminology. Notably, in shell scripts 'traps' can be used to intercept signals and call a user-specified function, in much the same way a signal handler can be installed in a compiled language. A complete example follows.
Running the previous code enters a "Main Function" loop that executes a number of steps. Pressing Control+C while the program is running sends it a SIGINT, which invokes the desired signal-handling function through the bash trap mechanism.
Finally, Python provides the signal module, which can be used in a user application to catch signals from the operating system as shown here:
All the sample scripts above are available through the NCAR hpc-demos GitHub repository in the PBS/preempt subdirectory.