Robust coding of expired tasks

fredw · April 15, 2020, 4:27am

I use Cylc 7.8.1. In each cycle point, all my tasks lead to a housekeep task, which uses rose_prune to tidy up logs/work/share etc. If the housekeep task is never triggered, the suite will eventually fail to advance.

In this example I have tasks that copy data which doesn’t exist when the suite gets too far behind wall-clock time. I think I got this right by writing:

[[[T12]]]
    graph = """
        copy | copy:expired => housekeep
    """

This is the simplest case. Could I write this in a shorter way as copy:finish => housekeep? Does :finish include :succeeded and :expired?

In a more complex case I create a plot from this copied data and other local model output which is polled for by a polling task (succeeds if local file exists, typically fails a couple of times before eventually succeeding, but may expire). The plotting shouldn’t happen if the polled data has expired, neither should it plot when the copied data has expired. But if poll and copy expire (and plot never happened), the housekeeping should still run. The thing with the polling task is that I can reasonably trust for this local data to eventually turn up (but I still want to handle expiry as there is no use in plotting data from weeks ago). The copy task on the other hand is less robust and data may never turn up. So the copy task should be allowed to fail without holding up the rest of the suite.

This is what I came up with to achieve this behaviour, and I am wondering whether it is complete:

[[[T12]]]
    graph = """
        poll & copy => plot
        poll:expired => !plot
        copy:expired => !plot
        copy:fail  => !plot
        poll:expired | copy:expired | copy:fail | plot => housekeep
    """

Thanks for any comments and hints!
Fred

hilary.j.oliver · April 15, 2020, 10:15pm

Hi Fred,

No, copy:finish is short for copy:succeed | copy:fail, i.e. “finished executing”. Whereas copy:expired means don’t bother executing task copy because it is too far behind the clock.
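
A minimal sketch of that distinction in plain Python (this is not Cylc code; the state names and the helpers `finish_satisfied` and `housekeep_triggered` are illustrative assumptions). It treats :finish as shorthand for succeeded-or-failed, so an expired task does not satisfy it:

```python
# Illustrative model only: these names are assumptions, not Cylc's API.
FINISH_STATES = {"succeeded", "failed"}  # what copy:finish stands for


def finish_satisfied(state: str) -> bool:
    """Return True if a task in `state` would satisfy a :finish trigger."""
    return state in FINISH_STATES


def housekeep_triggered(copy_state: str) -> bool:
    """Model of the graph line: copy | copy:expired => housekeep."""
    return copy_state == "succeeded" or copy_state == "expired"


for state in ("succeeded", "failed", "expired"):
    print(state, finish_satisfied(state), housekeep_triggered(state))
```

Under this model, copy:finish => housekeep would fire for a failed copy but not for an expired one, which is the reverse of what copy | copy:expired => housekeep does in those two cases.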

If I understand the description of your workflow properly, your graph should look something like this:

[[[T12]]]
    graph = """
        poll & copy => plot
        poll:expired | copy:expired | copy:fail => !plot & no_plot
        plot | no_plot => housekeep
    """

To explain:

Separate triggers for the same task are equivalent to AND. So this:

a => plot
b => plot

is equivalent to this:

a & b => plot

And the same goes for suicide triggers:

poll:expired => !plot
copy:expired => !plot
copy:fail => !plot

is equivalent to:

poll:expired & copy:expired & copy:fail => !plot

i.e. BOTH poll AND copy have to expire, AND copy has to fail, for plot to be removed from the workflow - which ain’t gonna happen. From your description, you really want this:

poll:expired | copy:expired | copy:fail => !plot
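
To make the AND-versus-OR point concrete, here is a rough sketch in plain Python (not Cylc code; the outcome names and function names are assumptions for illustration, and each task is assumed to end in exactly one outcome). It enumerates every combination of poll and copy outcomes and lists the ones for which plot would be removed under each form:

```python
from itertools import product

# Illustrative model only; outcome names are assumptions, not Cylc's API.
POLL_OUTCOMES = ("succeeded", "expired")
COPY_OUTCOMES = ("succeeded", "failed", "expired")


def removed_with_and(poll: str, copy: str) -> bool:
    """Three separate suicide-trigger lines: all conditions must hold."""
    return poll == "expired" and copy == "expired" and copy == "failed"


def removed_with_or(poll: str, copy: str) -> bool:
    """One line joined with |: any single condition is enough."""
    return poll == "expired" or copy == "expired" or copy == "failed"


outcomes = list(product(POLL_OUTCOMES, COPY_OUTCOMES))
and_hits = [o for o in outcomes if removed_with_and(*o)]
or_hits = [o for o in outcomes if removed_with_or(*o)]

print("AND form removes plot for:", and_hits)
print("OR form removes plot for:", or_hits)
```

In this model the AND form never removes plot, because copy cannot be both expired and failed, while the OR form removes it in every case except the one where both poll and copy succeed.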

The other change is optional: I’ve used a dummy task no_plot to signify that plot was removed and no plotting was done. Then you can explicitly trigger housekeeping off of plot or no_plot:

        poll:expired | copy:expired | copy:fail => !plot & no_plot
        plot | no_plot => housekeep

I happen to think that’s easier to understand, but you could stick to your original housekeep trigger line:

        poll:expired | copy:expired | copy:fail => !plot
        poll:expired | copy:expired | copy:fail | plot => housekeep

Note (just in case I’ve got it wrong!) you should be able to test this with a dummy suite (i.e. just your graph with sleep 10 tasks or whatever) with initial cycle point and clock and expire triggers contrived so that you don’t have to wait long to see what happens.
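
Before running a dummy suite, the trigger logic alone can be checked on paper. The following is a sketch in plain Python, not Cylc code: the outcome names and the `evaluate` helper are assumptions, plot is assumed to succeed whenever it runs, and scheduler behaviour is not modelled at all. It evaluates both housekeep variants for every combination of outcomes:

```python
from itertools import product

# Illustrative model only; names are assumptions, not Cylc's API.
# Each task is assumed to end in exactly one of these outcomes.
POLL_OUTCOMES = ("succeeded", "expired")
COPY_OUTCOMES = ("succeeded", "failed", "expired")


def evaluate(poll: str, copy: str) -> dict:
    """Evaluate both graph variants for one combination of outcomes."""
    # poll:expired | copy:expired | copy:fail
    skip = poll == "expired" or copy == "expired" or copy == "failed"
    # poll & copy => plot (plot is assumed to succeed when it runs)
    plot_ran = poll == "succeeded" and copy == "succeeded"
    return {
        "plot": plot_ran,
        "plot_removed": skip,
        # plot | no_plot => housekeep, where no_plot runs when skip holds
        "housekeep_via_no_plot": plot_ran or skip,
        # poll:expired | copy:expired | copy:fail | plot => housekeep
        "housekeep_direct": skip or plot_ran,
    }


results = {
    (poll, copy): evaluate(poll, copy)
    for poll, copy in product(POLL_OUTCOMES, COPY_OUTCOMES)
}

for outcome, result in results.items():
    print(outcome, result)
```

Under these assumptions both variants trigger housekeep for all six combinations, and plot is either run or removed but never both. This only checks the boolean logic, so it complements a dummy suite rather than replacing it.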

Hope that helps.

Hilary

fredw · April 16, 2020, 5:16am

Thank you so much! Your explanations are very clear.

I programmed the suite according to your last suggestion:

poll:expired | copy:expired | copy:fail => !plot
poll:expired | copy:expired | copy:fail | plot => housekeep

I find it easier to get my head round it, as I was a bit scared of introducing a dummy task.

Again, thanks very much. Your help is a life-line!

fredw · October 4, 2023, 5:15pm

Hello @hilary.j.oliver

I have come back to look at suicide triggers in my workflow as I am porting them to cylc8.
How should this be done in cylc8?

My suites have tons of these constructs in them (inspired by the original replies above):

            graph = """
                download => run_processing => merge_files
                download:expired => !run_processing
                download:expired | run_processing:expired => !merge_files
                download:expired | run_processing:expired | merge_files => housekeep
...

hilary.j.oliver · October 4, 2023, 8:27pm

“I have come back to look at suicide triggers in my workflow”

Good news, you can remove them all.

In Cylc 7 and earlier, suicide triggers had to be used to clean up graph branches not taken at run time, because the scheduler would pre-spawn upcoming instances of all tasks to be available if needed. The Cylc 8 scheduling algorithm is event-driven: tasks spawn on demand as upstream outputs are generated. So suicide triggers are no longer needed, in general.

https://cylc.github.io/cylc-doc/stable/html/user-guide/writing-workflows/suicide-triggers.html#suicide-triggers

fredw · October 5, 2023, 10:39am

Thanks, that is good news indeed, because long chains of branching tasks coded with suicide triggers were becoming unmaintainable after a certain chain length.

But my question remains: what do I do with the construct I mentioned above?

I want the post-processing of the downloaded file, and the merging of that processed file into the downloaded file, to be omitted only if the download expired (the server only keeps files online for 7 days, so it’s pointless attempting the download after that, hence the expiration).

In the discussion “Workflow with suicide triggers” I saw the construct:

@clock => a? => b? => c?
a:finish => housekeeping

So can I translate my above construct into cylc8 like this?

    download => run_processing? => merge_files? => housekeep?

or like this?

    download? => run_processing? => merge_files? => housekeep

Where do the question marks go?

Thanks!

MetRonnie · October 5, 2023, 11:36am

The question marks go where you do not want the scheduler to stall if that task did not succeed. So if it’s the download task that you are handling not succeeding, you would just do:

download? => run_processing => merge_files => housekeep

But if you want housekeep to run if download expired too, then I think you would need to do something like:

download? => run_processing => merge_files
(download:expired | merge_files ) => housekeep
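
As a rough illustration of the difference between the two graphs, here is a sketch in plain Python (not Cylc code; the outcome names and function names are assumptions, and run_processing and merge_files are assumed to succeed whenever they run). It shows, for each download outcome, whether housekeep would be triggered:

```python
# Illustrative model only; names are assumptions, not Cylc's API.
# Downstream tasks are assumed to succeed whenever they run.
DOWNLOAD_OUTCOMES = ("succeeded", "failed", "expired")


def housekeep_chain_only(download: str) -> bool:
    """Model of: download? => run_processing => merge_files => housekeep"""
    merge_files_ok = download == "succeeded"
    return merge_files_ok


def housekeep_with_expired_branch(download: str) -> bool:
    """Model of the two-line graph with the extra housekeep trigger:
    (download:expired | merge_files) => housekeep
    """
    merge_files_ok = download == "succeeded"
    return download == "expired" or merge_files_ok


for outcome in DOWNLOAD_OUTCOMES:
    print(
        outcome,
        housekeep_chain_only(outcome),
        housekeep_with_expired_branch(outcome),
    )
```

In this model the first graph only reaches housekeep when download succeeds, while the second also reaches it when download expires. A failed (rather than expired) download leaves housekeep untriggered in both, so that case would need its own trigger if housekeeping should run then as well.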

fredw · October 5, 2023, 12:06pm

Oh wonderful. Thank you so much. That makes this type of chain a bit simpler.

I have several variants of a more complicated version of this scheme:

            graph = """
                get_gfs & (get_sst[+PT18H] | get_sst[+PT18H]:expired) => convert_gfs => housekeep
                get_gfs:succeed => ! expire_convert_gfs
                get_gfs:expired => expire_convert_gfs
                get_gfs:expired | convert_gfs | convert_gfs:expired => housekeep
            """

It’s supposed to download the GFS data and the SST data (for +PT18H), then convert the GFS+SST data. But the SST data is optional for the convert_gfs task: if the SST files have already expired, the convert_gfs task can run without SST (the GFS data is online for longer than the SST data, so get_sst might expire first).

But get_gfs might also expire, and in that case we don’t need to run convert_gfs at all. In cylc7 I had a task called expire_convert_gfs that uses cylc set-outputs ... --output=expired to expire the conversion task. It sounds like this is no longer needed.

So I re-wrote the whole thing like this:

            graph = """
                (get_gfs & (get_sst[+PT18H] | get_sst[+PT18H]:expired))? => convert_gfs
                (get_gfs:expired | convert_gfs) => housekeep
            """

Does that look correct?

Thanks!

MetRonnie · October 5, 2023, 3:11pm

Ah, unfortunately I had forgotten that the :expired trigger is currently broken, pending a fix targeted for Cylc 8.3.0; see “:expire trigger broken”, Issue #5364 in cylc/cylc-flow on GitHub.

Additionally the :expired trigger is going to be incorporated in the optional outputs scheme so will need a question mark. So ultimately I think what you’ll be after is this:

get_gfs? => convert_gfs
(get_sst[+PT18H]? | get_sst[+PT18H]:expired?) => convert_gfs
(get_gfs:expired? | convert_gfs) => housekeep

but I’m afraid it won’t work until 8.3.0 is released.

Note that it isn’t valid to put a question mark after parentheses the way you did.
Also, instead of using an AND operator I’ve split your first line into two for readability; both forms do the same thing.
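
The logic of that three-line graph can be sketched in plain Python (not Cylc code; the outcome names and function names are assumptions for illustration, and convert_gfs is assumed to succeed whenever it runs). The two lines that target convert_gfs are combined with AND, matching the single-line form with the & operator:

```python
from itertools import product

# Illustrative model only; names are assumptions, not Cylc's API.
# convert_gfs is assumed to succeed whenever it runs.
GFS_OUTCOMES = ("succeeded", "expired")
SST_OUTCOMES = ("succeeded", "expired")


def convert_gfs_runs(get_gfs: str, get_sst: str) -> bool:
    """Two graph lines targeting convert_gfs combine with AND."""
    line1 = get_gfs == "succeeded"  # get_gfs? => convert_gfs
    line2 = get_sst in ("succeeded", "expired")  # get_sst or its expiry
    return line1 and line2


def housekeep_runs(get_gfs: str, get_sst: str) -> bool:
    """Model of: (get_gfs:expired? | convert_gfs) => housekeep"""
    return get_gfs == "expired" or convert_gfs_runs(get_gfs, get_sst)


table = {
    (gfs, sst): (convert_gfs_runs(gfs, sst), housekeep_runs(gfs, sst))
    for gfs, sst in product(GFS_OUTCOMES, SST_OUTCOMES)
}

for outcome, (convert, housekeep) in table.items():
    print(outcome, "convert_gfs:", convert, "housekeep:", housekeep)
```

Under these assumptions convert_gfs runs whenever get_gfs succeeds, with or without SST data, is skipped when get_gfs expires, and housekeep is reached in all four combinations. Failure outcomes are deliberately left out of this sketch.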

fredw · October 5, 2023, 3:43pm

That notation makes a lot of sense. Making suites deal with expiring tasks is not the most important thing as it only serves to recover from rare errors. It makes the suites more resilient, but it can wait for the next version. So I changed my graphs according to your suggestions, and the validation is OK.

Thanks for taking the time to reply
Fred
