Is there a way to tell if a workflow is stopping from a job?

I have a long running job that acts as a server for other jobs in the workflow. When the workflow terminates early, such as when a job fails, the server remains running.

This is what I want, as it allows restarting the tasks to continue the workflow. However, when the workflow is stopped in this state, it just hangs waiting for the server task to finish.

While it is possible to kill the workflow job, after which it will stop properly, I was wondering if it was possible to detect at runtime if the workflow was stopping, so I can have the job stop itself.

I’m using cylc 8.2.

Depending on exactly what you want to do, scheduler events might be what you need:

[scheduler]
  [[events]]
    shutdown handlers = """
      kill "$(sed -n 's/.*MARKERSTRING\([0-9][0-9]*\)MARKERSTRING.*/\1/p' ~/cylc-run/stall-nice/runN/log/job/1/server/NN/job.out | tail -n 1)"
    """

...

[runtime]
  [[long_running_server]]
    script = """
      while true; do
        echo "MARKERSTRING${$}MARKERSTRING"
        # whatever else your server is up to
      done
    """

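The marker-string lookup can be tried outside Cylc before wiring it into a handler. This is only a minimal sketch: `extract_marked_pid` is a hypothetical helper name, and it assumes the server prints its PID as `MARKERSTRING<pid>MARKERSTRING` on a line of its job.out, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cylc): print the most recent PID that the
# server wrote between the two marker strings in the given log file.
extract_marked_pid() {
    # $1 = path to the server job's job.out
    # sed prints only the digits between the markers; tail keeps the last hit.
    sed -n 's/.*MARKERSTRING\([0-9][0-9]*\)MARKERSTRING.*/\1/p' "$1" | tail -n 1
}

# Demo against a throwaway file rather than a real job log.
demo_log="$(mktemp)"
printf 'server starting\nMARKERSTRING12345MARKERSTRING\nserving...\n' > "$demo_log"
extract_marked_pid "$demo_log"   # prints 12345
rm -f "$demo_log"
```

If the log contains no marker line, the function prints nothing, so a handler that uses it should check for an empty result before calling kill.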
If I’ve misunderstood you, and you are using task events from the non-server tasks to bring the workflow down but want the server left running, you can use cylc stop --now --now to stop the scheduler without killing the server job.


cylc stop --now is sufficient to stop the workflow (I wouldn’t recommend --now --now unless you are certain you won’t want to restart your workflow).

However, if the aim is to kill the job, just use cylc stop --kill.

Shutdown handlers look like they can do what I want, thanks.

Essentially, all I want is the behaviour of cylc stop --kill when I press the button in the GUI, without having to remember to set the stop mode (unless there is a way to set that as the default?).

cylc stop --now doesn’t work for my use case, as the server job will then just remain running, blocking ports and resources on the job host. Because the server task needs to run for the entire length of the workflow, it has very generous wall clock limits, so it will take hours to time out naturally.

There’s no way to hardcode --kill as the default stop mode in the GUI, and I don’t think we’ll be adding one.

I’m not quite sure I understand the behaviour you want here.

If you want the workflow to shut down automatically when a certain task fails, killing any active jobs in the process, then you could do something like this:

some_task? => do_something
some_task:fail? => shutdown

[runtime]
  [[shutdown]]
    script = cylc stop --kill "${CYLC_WORKFLOW_ID}"

If you want to extend this to a family of tasks you could do:

SOME_FAMILY:fail-any? => shutdown