Scheduling a "final" task?

funkapus · September 27, 2021, 12:26pm

I need a task to run during the final cycle point and after all other tasks associated with that or earlier cycle points have completed – in other words, enforced to be the final task of the suite run.

In my suite, the flow of tasks in any given cycle point is such that I can easily write a scheduling dependency for this finalization task that will make it the last task of the final cycle point. But that doesn’t ensure it’s the last task of the suite, because there may still be tasks associated with earlier cycle points that have not yet completed (and in fact, there typically will be). I need this task to trigger when there are no other tasks remaining in the suite run.

Is there a straightforward way to do this? I haven’t spotted it in the docs.

Thanks!

EDIT: I should say that (of course) shortly after I posted this, I did think of one workaround to make this happen: implement my full big suite as a subsuite of some other suite, with only one cycle point and two tasks. The first task runs my suite, and the second (which depends on the first’s success) is the final task I want to implement. But this approach is problematic, as the task will need access to a large number of environment variables that will be present in the main suite. So instead I’d need to set those in the new parent suite, and then pass them down via Jinja2 to the original main suite, which seems a bit ugly just to run one task so I hope I don’t have to do that.

edmundh · September 27, 2021, 11:45pm

Interested to hear what the experts have to say here, but I wonder whether the answer might involve the [[[ R1/P0Y ]]] (aka [[[ R1/$ ]]]) syntax mentioned in the scheduling advanced examples?

Should be clear I’ve never played with this, mention only as have spotted it in passing!

Not sure if you can put your final task in the graph there “bare” - would be nice if could (bit like a finally clause in Python).

But given usual eager-to-run behaviour in normal cycle points, suspect you might have to explicitly specify in graph that all the various last-task-in-earlier-cycle-points are prerequisites for this final task, so you could get your guaranteed “this really will run last”?

If it’s this, it looks like the $ symbol may be helpful when specifying prerequisites, to avoid missing any other tasks running at the final cycle point.

Alternatively, if you can make each cycle point complete with an identical task, and you can avoid running those tasks at the final cycle point (e.g. via date exclusion???), I wonder if it would be possible to use a trigger family (my new favourite toy!), and have something like (untested)

# <snip>
[[dependencies]]
[[[R1]]]
# <snip>
[[[PT1H]]]
# Or whatever your regular cycle is
# Likely needs adjusting to avoid running *at* final cycle point?
# Would this be PT1H!$ ???
graph = """a =>
    b =>
    cycle_complete
"""
[[[ R1/P0Y ]]]
graph = COMPLETE_TRG:succeed-all => my_really_final_task

[runtime]
# <snip>
[[COMPLETE_TRG]]
   [[[meta]]]
       title = Trigger for tasks at end of given cycle points
# <snip>
[[cycle_complete]]
inherit = FOO, BAR, COMPLETE_TRG
# <snip>

This is naively hoping that not specifying previous cycles in COMPLETE_TRG:succeed-all automagically picks up cycle_complete tasks from all previous cycles as prerequisites…

Edit: hmm, actually, from the inter-cycle trigger docs I don’t think this will pick up cycle_complete tasks from previous cycles :(.

wxtim · September 28, 2021, 7:19am

Absolutely Edmund - R1/P0Y is the thing! It’s hardly the most intuitive label.

If I’ve read the question correctly, the real problem is that you can easily identify the last task in each cycle point, but need them all to have completed before running the final task. One solution that occurs to me is to use Jinja2 to add many dependencies:

#!jinja2
[scheduling]
  [[dependencies]]
    [[[ P1Y ]]]
      graph = everything => cycle_point_end_task
    [[[ R1/P0Y ]]]
      {% for X in range(1, 12) %}
      graph = cycle_point_end_task[-P{{X}}Y] => workflow_end_task
      {% endfor %}

WARNING: Depending on how many cycle points you have this pattern may lead to the Cylc scheduler holding vast numbers of tasks in memory. Use with caution.
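To make the loop above concrete, here is a rough Python sketch of what it renders. The helper name `render_final_graph` is made up for illustration; it simply mirrors the `range(1, 12)` call, which (as in Python) stops at 11, so the final task ends up waiting on the end task of the 11 previous yearly cycle points. The upper bound would need to match the real number of cycle points in your workflow.

```python
def render_final_graph(stop):
    """Return the graph lines emitted for offsets 1 .. stop - 1.

    Hypothetical helper: it imitates the Jinja2 loop, it is not part of Cylc.
    """
    return [
        f"graph = cycle_point_end_task[-P{x}Y] => workflow_end_task"
        for x in range(1, stop)
    ]


if __name__ == "__main__":
    # Same bounds as the Jinja2 example: range(1, 12)
    for line in render_final_graph(12):
        print(line)
```

If the workflow has more than 12 yearly cycle points, the earliest ones are not covered by this loop, so the final task could still start before they finish.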

hilary.j.oliver · September 28, 2021, 11:18pm

Interesting question.

What you really want here is a distinct pre-shutdown graph that runs after the main graph has finished (kind of like a “finally clause” as mentioned by @edmundh).

We actually have a plan to implement start-up and pre-shutdown graphs, but it’s not done yet.

For the moment you’ll have to use:

1. a recurrence graph that targets the final cycle point as suggested by @edmundh and @wxtim, along with dependence on whatever tasks your final one needs to wait for (the cycle point itself does not determine when a task can run - unless it also has a clock-trigger - that’s entirely down to dependencies)
2. or a back-door way of delaying the final task, by having it wait on another task or an xtrigger that watches the scheduler to determine when all previous cycle points have finished.

For 1., @wxtim 's suggestion makes the final task depend on all previous cycle-point-end tasks. That’s a good solution so long as the Jinja2 loop is easy (which depends on the complexity of your cycling).

Alternatively, without constraining earlier tasks, you could force the cycle-point-end tasks to run in order:

[scheduling]
    cycling mode = integer
    initial cycle point = 1
    final cycle point = 5
    [[dependencies]]
        [[[P1]]]
            graph = """
                a => b => cycle_end
                cycle_end[-P1] => cycle_end
            """
        [[[R1/$]]]
            graph = "cycle_end => workflow_end"
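As a sanity check on why chaining the cycle-end tasks is enough, here is a small Python sketch (not Cylc code) that rebuilds the dependencies of the graph above for cycle points 1 to 5 and collects everything that must finish before workflow_end. The function names are invented for illustration, and the sketch assumes the dependency on the cycle point before the initial one is simply ignored.

```python
def build_graph(initial, final):
    """Map each task instance (name, point) to the set of instances it waits on."""
    deps = {}
    for point in range(initial, final + 1):
        deps[("a", point)] = set()
        deps[("b", point)] = {("a", point)}
        deps[("cycle_end", point)] = {("b", point)}
        if point > initial:
            # cycle_end[-P1] => cycle_end
            # (assumption: the pre-initial dependency at the first point is dropped)
            deps[("cycle_end", point)].add(("cycle_end", point - 1))
    # R1/$ : cycle_end => workflow_end at the final point only
    deps[("workflow_end", final)] = {("cycle_end", final)}
    return deps


def ancestors(deps, task):
    """Collect every task instance that must finish before `task` can run."""
    seen = set()
    stack = list(deps[task])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


if __name__ == "__main__":
    graph = build_graph(1, 5)
    waiting_on = ancestors(graph, ("workflow_end", 5))
    print(sorted(waiting_on, key=lambda t: (t[1], t[0])))
```

Because each cycle_end waits on the previous one, workflow_end ends up downstream of every a, b and cycle_end instance. This holds only for tasks that feed into cycle_end; anything outside that chain would need its own dependency.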

For 2., you could do this:

        [[[P1]]]
            graph = """
                a => b => cycle_end
                cycle_end[-P1] => cycle_end
            """
        [[[R1/$]]]
            graph = "cycle_end => check_prev_cycles => workflow_end"

where the new task check_prev_cycles repeatedly polls the scheduler to check that no tasks remain in previous cycles, e.g. by parsing the output of cylc dump -t $CYLC_SUITE_NAME. It could do the polling internally itself, or use automatic retries to keep trying until the condition is met. OR you could write an xtrigger function that does something similar and make the final task depend on that.
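A minimal sketch of the checking logic such a check_prev_cycles task might use, in Python. Everything here is an assumption to adapt: it presumes each line of the dump looks like `name, point, state, ...`, and the set of states treated as finished is a guess, so check both against the real `cylc dump -t` output for your Cylc version. The function names are hypothetical.

```python
import subprocess

# Assumed states meaning "nothing left to do" for a task instance.
FINISHED_STATES = {"succeeded", "expired"}


def unfinished_before(dump_text, final_point):
    """Return (name, point, state) for unfinished tasks outside the final point.

    Assumes comma-separated lines whose first three fields are
    task name, cycle point and state.
    """
    remaining = []
    for line in dump_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # skip blank or unexpected lines
        name, point, state = fields[:3]
        if point != final_point and state not in FINISHED_STATES:
            remaining.append((name, point, state))
    return remaining


def previous_cycles_done(suite_name, final_point):
    """Run `cylc dump -t` and report whether earlier cycle points are clear."""
    dump_text = subprocess.check_output(
        ["cylc", "dump", "-t", suite_name], universal_newlines=True
    )
    return not unfinished_before(dump_text, final_point)
```

The task script could call `previous_cycles_done` and exit with a non-zero status when it returns False, so that automatic retries keep trying until earlier cycle points are clear.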

Hilary

hilary.j.oliver · September 28, 2021, 11:37pm

True, R1/P0Y is hardly the most intuitive label, but (for other readers) at least it is standards-based, and the ISO 8601 date-time notation provides the power and flexibility we need!

MetRonnie · September 29, 2021, 11:53am

R1/P0Y is short for R1/P0Y/<final-point> (i.e. it is short for ISO 8601 recurrence format number 4).

However I feel that R1/$ is the more intuitive way of writing the same thing. In this case it’s short for recurrence format 3 (the duration is omitted as it’s irrelevant/meaningless considering R1 means there’s only 1 occurrence)

edmundh · September 30, 2021, 8:03pm

Ooh - I’d not seen that table (/repo) before @MetRonnie - very nice - adding that to my goto references when I need to think about recurrences!

Also clink - the sound of the penny dropping as I finally understand why the random-glyphs-to-me-up-to-now ^ and $ syntax is used for start and end points: presumably by analogy with regex’s start and end of line anchors!
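For anyone else new to the regex side of that analogy, a tiny Python sketch. The `points` string is made up for illustration, and Cylc does not parse cycle points this way; it only shows that ^ anchors a match to the start and $ to the end.

```python
import re

# A made-up sequence of integer cycle points, written as text.
points = "1 2 3 4 5"

# ^ anchors the match to the start of the string (cf. the initial cycle point).
first = re.search(r"^\d+", points).group()

# $ anchors the match to the end of the string (cf. the final cycle point).
last = re.search(r"\d+$", points).group()

print(first, last)
```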

Would you all be open to a PR on docs very briefly making this analogy explicit? Say somewhere around:

here (or equivalent place in cylc8 ones) in cylc docs (maybe here too?)
here in the tutorials

I ask because, speaking personally as a non-SSE who has only ~recently added regexes to my toolbox, that insight wasn’t as blindingly obvious as I’m sure it would be to many/most people writing suites. And a very brief mention in passing in docs would have been super-helpful - I think that having understood this link now will make remembering/parsing/using this syntax so much easier for me (e.g. I won’t have to think “I know syntax exists, but will need to go to docs to look it up”)

hilary.j.oliver · September 30, 2021, 10:31pm

Yep, dead right

That would be great, thanks! I’m sure you’re right that it’s not blindingly obvious to all. You should go straight to the Cylc 8 docs, the git repository is here: GitHub - cylc/cylc-doc: Documentation (User Guide, Cheat Sheets, etc.) for the Cylc Workflow Engine.

Let us know if you need any pointers on creating and submitting a pull request.

MetRonnie · October 1, 2021, 9:34am

Caveat: the description of recurrence format 1 does not match the behaviour in Cylc 7 due to metomi/isodatetime#45. We’re planning to make the behaviour match the description in Cylc 8. Luckily we think this recurrence format is rarely used.

bencash · May 16, 2022, 7:36pm

Hello everyone - I wanted to check and see if the recommendations here for scheduling a final task after all other tasks have completed are still valid for Cylc 8. Thanks!

hilary.j.oliver · May 17, 2022, 2:57am

Yep, that hasn’t changed for Cylc 8.
