Is there a way to tell if a workflow is stopping from a job?

jfrost-mo · September 28, 2023, 1:34pm

I have a long running job that acts as a server for other jobs in the workflow. When the workflow terminates early, such as when a job fails, the server remains running.

This is what I want, as it allows restarting the tasks to continue the workflow. However, when the workflow is stopped in this state, it just hangs waiting for the server task to finish.

While it is possible to kill the server job manually, after which the workflow stops properly, I was wondering whether a job can detect at runtime that the workflow is stopping, so that the job can stop itself.

I’m using cylc 8.2.

wxtim · September 28, 2023, 7:54pm

Depending on exactly what you want to do, scheduler events might do the job:

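[scheduler]
  [[events]]
    # Pulls the PID out of the marker line in the server job's output and kills it.
    # Assumes the job's process is reachable from the host running the handler.
    shutdown handlers = """
      kill "$(sed -n 's/.*MARKERSTRING\([0-9]*\)MARKERSTRING.*/\1/p' ~/cylc-run/stall-nice/runN/log/job/1/long_running_server/NN/job.out | tail -n 1)"
    """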

...

[runtime]
  [[long_running_server]]
    script = """
      # Record this job's PID once so the shutdown handler can find it.
      echo "MARKERSTRING${$}MARKERSTRING"
      while true; do
        # whatever else your server is up to
        sleep 1
      done
    """
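The PID lookup in the handler can be pulled out into a small shell function, which makes it easier to test on its own. This is only a sketch: `extract_marked_pid` is a hypothetical name, not part of Cylc, and it assumes the job writes the PID between two copies of a marker string as in the example above.

```shell
# Hypothetical helper (not part of Cylc): read log text on stdin and print
# the PID from the last line of the form <marker><digits><marker>.
# The marker defaults to MARKERSTRING, matching the example above.
extract_marked_pid() {
    # sed prints only the digits captured between the two markers;
    # tail keeps the most recent match in case the line was logged repeatedly.
    sed -n "s/.*${1:-MARKERSTRING}\([0-9][0-9]*\)${1:-MARKERSTRING}.*/\1/p" | tail -n 1
}

# Example use in a handler (path is illustrative):
#   pid="$(extract_marked_pid < "$HOME/cylc-run/stall-nice/runN/log/job/1/long_running_server/NN/job.out")"
#   [ -n "$pid" ] && kill "$pid"
```

Checking that the result is non-empty before calling kill avoids an error if the server never got as far as writing its marker line.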

If I’ve misunderstood you, and you are using task events from the non-server tasks to pull the workflow down but want the server left running, you can use cylc stop --now --now to stop the scheduler without killing the server job.

dpmatthews · September 29, 2023, 5:55am

cylc stop --now is sufficient to stop the workflow (I wouldn’t recommend --now --now unless you are certain you won’t want to restart your workflow).

However, if the aim is to kill the job, just use cylc stop --kill.

jfrost-mo · September 29, 2023, 7:03am

Shutdown handlers look like they can do what I want, thanks.

Essentially, all I want is the behaviour of cylc stop --kill when I press the button in the GUI, without having to remember to set the stop mode (unless there is a way to set that as the default?).

cylc stop --now doesn’t work for my use case, as the server job would then just remain running, blocking ports and resources on the job host. Because the server task needs to run for the entire length of the workflow, it has a very generous wall clock limit, so it would take hours to time out naturally.

oliver.sanders · October 2, 2023, 8:25am

There’s no way to hardcode --kill as the default stop mode in the GUI, and I don’t think we’ll be adding one.

I’m not quite sure I understand the behaviour you want here.

If you want the workflow to shut down automatically when a certain task fails, killing any active jobs in the process, then you could do something like this:

some_task? => do_something
some_task:fail? => shutdown

[runtime]
  [[shutdown]]
    script = cylc stop --kill "${CYLC_WORKFLOW_ID}"

If you want to extend this to a family of tasks you could do:

SOME_FAMILY:fail-any? => shutdown
