Migrating clock-triggers from Cylc 7 to Cylc 8

Hello,

Can you help me convert clock-triggers from the old-style Cylc 7 syntax to the new Cylc 8 style? I see a lot of very strange behaviour in my suites when running under Cylc 8, and I suspect it isn’t handling clock-triggers the same way in compatibility mode. In Cylc 8, tasks get started way beyond the runahead limit, and the scheduler doesn’t wait for the clock-trigger time properly, as it would have done in Cylc 7.

I saw the notes in Workflow Configuration — Cylc 8.2.2 documentation and External Triggers — Cylc 8.2.2 documentation, but I can’t see how my existing suite.rc should be converted to the new syntax. My suites don’t look like these simple examples, and the documentation is missing a before-and-after example showing how to port the old style to the new.

In my modelling suites each cycle point starts with a blank dummy task that, if successfully completed, starts the downstream tasks that run the model cycle. The only purpose of these “start” tasks is to be clock-triggered; their trigger time is an offset from the cycle point time (in UTC).

My suites have Jinja2 switches in them which turn parts of the suite on or off. Because Cylc 7 only has a single place where clock-triggers can be set, my solution was to build a list of tasks which then gets added to the single clock-trigger statement. Here’s a simple example where the clock-trigger statement is only used when the suite is not in HINDCAST mode (i.e. it’s in FORECAST mode, where certain start tasks need to wait for their trigger time).

In the example below, CLOCK_TRIGGER_TASKS only includes the fcstXX_suite_start_12 tasks if the variable FORECAST_DAYS is greater than 0.

    [[special tasks]]
        {% if not HINDCAST %}

        {% set CLOCK_TRIGGER_TASKS = [ ] %}
        {{ CLOCK_TRIGGER_TASKS.append( "upd_suite_start_00(PT3H37M)" ) or "" }}
        {{ CLOCK_TRIGGER_TASKS.append( "upd_suite_start_06(PT3H37M)" ) or "" }}
        {{ CLOCK_TRIGGER_TASKS.append( "upd_suite_start_12(PT3H37M)" ) or "" }}
        {{ CLOCK_TRIGGER_TASKS.append( "upd_suite_start_18(PT5H)" ) or "" }}

        {% for DAY in range(0,FORECAST_DAYS|int) %}
        {% set TSEC = 13800 + DAY * 440 %}  # from 3H50M in steps of 7.333 minutes (440secs)
        {{ CLOCK_TRIGGER_TASKS.append( "fcst" ~ ('%02d' % DAY) ~ "_suite_start_12(+PT" ~ ((TSEC/60/60%24) | int) ~ "H" ~ ((TSEC/60%60) | int) ~ "M)" ) or "" }}
        {% endfor %}

        {% if CLOCK_TRIGGER_TASKS|length %}
        clock-trigger = {{ CLOCK_TRIGGER_TASKS|join(", ") }}
        {% endif %}

        {% endif %}  {# HINDCAST #}

Is this syntax still supported, or do I need to port it to Cylc 8? And if so, how would I do this?

My graph is defined further down, in the [[dependencies]] section where it was in Cylc 7. The notes for Cylc 8 (the links mentioned above) show a very different structure, but I couldn’t find any documentation with examples of how to write this section in the new way.

So, I guess what I’m asking is how to port this minimal example (based on something quoted in the documentation) from old-style to new-style clock triggers, keeping in mind that I have these Jinja2 rules about what goes into the clock-trigger list, and when:

[scheduling]
    [[special tasks]]
        clock-trigger = foo(PT2H)
    [[dependencies]]
        [[[T00]]]
            graph = """
                foo
            """

The documentation gives an example of the new-style clock-trigger as

[scheduling]
    initial cycle point = now
    [[xtriggers]]
        # Trigger 5 min after wallclock time is equal to cycle point.
        clock = wall_clock(offset=PT5M)
    [[graph]]
        T00 = @clock => get-data => process-data

but there’s no mention of how to get from the Cylc 7 style to the Cylc 8 style. The examples also use a shorthand I’ve never used, T00 = ... - is that the same as [[[T00]]] graph = """ ... """?

Can you help?

Thanks, Fred

Clock-trigger syntax has actually not changed in Cylc 8. You’re talking about the difference between old-style clock-triggered tasks (where a clock trigger is an internal property of a task) and the newer clock “xtriggers” (external triggers, which already existed in Cylc 7), where the trigger is a function called repeatedly by the scheduler, not part of the task definition.

The advantage of the newer xtriggers is that they are explicit in the graph, not hidden inside a task definition (and they’re also more efficient, if multiple tasks share the same xtrigger).
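
For example (a minimal sketch, not from your suite): here both tasks wait on the same wall_clock call, so the scheduler only needs one clock check per cycle point for the pair.

[scheduling]
   initial cycle point = 2022
   [[xtriggers]]
      # one clock function shared by both tasks
      clock = wall_clock(PT0S)
   [[graph]]
      P1Y = """
         @clock => foo
         @clock => bar
      """
[runtime]
   [[foo, bar]]
      script = "sleep 10"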

BTW it looks to me like you’ve jumped in at the deep end with some large and complex workflows written by others? If so, I would recommend trying to understand the Cylc concepts using tiny example workflows first.

Here’s the old clock-trigger:

[scheduling]
   initial cycle point = 2022
   [[special tasks]]  # OLD-STYLE CLOCK TRIGGERS
      clock-trigger = foo(PT0S)
   [[graph]]
      P1Y = "foo"
[runtime]
   [[foo]]
      script = "sleep 10"

Note that you can’t tell, simply by looking at the graph, that the task foo is clock-triggered.

Here’s the new xtrigger way, where the clock trigger is explicit in the graph (but its properties are defined under the xtriggers section).

[scheduling]
   initial cycle point = 2022
   [[xtriggers]]
      clock = wall_clock(PT0S)
   [[graph]]
      P1Y = "@clock => foo"
[runtime]
   [[foo]]
      script = "sleep 10"

The new way is recommended, but the old way is still supported, as evidenced by the fact that both of these examples validate OK. If the old triggers were obsolete or deprecated you’d get errors or warnings from cylc validate.

You can run both of these examples, and they both behave the same. With an initial cycle point of 2022, the first two instances of foo run immediately and concurrently, while the rest (spawned out to the default 5-cycle runahead limit) wait on their clock-triggers (for 1, 2, 3, … years respectively, in this case).

From your description, you have old-style clock-triggered dummy tasks (whose only purpose is to be clock-triggered) that appear in the graph (because they are tasks) like the new-style xtriggers do. That changes my first example to this:

[scheduling]
   initial cycle point = 2022
   [[special tasks]]  # OLD STYLE CLOCK TRIGGERS
      clock-trigger = dummy(PT0S)
   [[graph]]
      P1Y = "dummy => foo"
[runtime]
   [[dummy]]
   [[foo]]
      script = "sleep 10"

But the same principles apply here; it’s just a different task that has the trigger attached.
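
To sketch how your Jinja2-built list might port (untested, and the xtrigger names like clock_fcst00 are my invention, nothing standard): generate one named wall_clock xtrigger per distinct offset under [[xtriggers]], then reference them in the graph where the dummy start tasks used to be clock-triggered.

[scheduling]
   [[xtriggers]]
{% if not HINDCAST %}
      # One named xtrigger per distinct offset; the three PT3H37M
      # upd start tasks can all share the same one.
      clock_upd    = wall_clock(offset=PT3H37M)
      clock_upd_18 = wall_clock(offset=PT5H)
{% for DAY in range(0, FORECAST_DAYS|int) %}
{% set TSEC = 13800 + DAY * 440 %}
      clock_fcst{{ '%02d' % DAY }} = wall_clock(offset=PT{{ (TSEC // 3600) % 24 }}H{{ (TSEC // 60) % 60 }}M)
{% endfor %}
{% endif %}
   [[graph]]
      T12 = """
         # ... the rest of your T12 graph as before ...
{% if not HINDCAST %}
{% for DAY in range(0, FORECAST_DAYS|int) %}
         @clock_fcst{{ '%02d' % DAY }} => fcst{{ '%02d' % DAY }}_suite_start_12
{% endfor %}
{% endif %}
      """

The @clock_upd and @clock_upd_18 references would go in the matching recurrences for the upd start tasks. And once the clock triggers are xtriggers, the start tasks no longer need to be special at all; you could even point the xtriggers straight at the first real downstream tasks and drop the dummy tasks entirely.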

Regarding the [[graph]] section replacing [[dependencies]]: this is just a minor change to remove an unnecessary level of nesting in the config file. See the Cylc 8 migration guide: Configuration Changes at Cylc 8 — Cylc 8.2.2 documentation
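
For example, these two are equivalent:

# Cylc 7 style (suite.rc):
[scheduling]
   [[dependencies]]
      [[[T00]]]
         graph = "foo => bar"

# Cylc 8 style (flow.cylc):
[scheduling]
   [[graph]]
      T00 = "foo => bar"

So yes, T00 = ... is just your [[[T00]]] graph item with one level of nesting removed.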

Also: validation tells you how to upgrade the graph syntax, once you switch to the new flow.cylc filename.

Regarding tasks starting way beyond the runahead limit: this is concerning if true! As you can see from my simple examples (which I also just tested in compatibility mode), it should not be happening.

If you are able to reproduce with a simple example (sleep 10 tasks, e.g.) we might be able to help.
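
Something along these lines, say (a sketch; pick an initial cycle point a little in the past), run in compatibility mode, would show whether clock-triggered tasks really jump the runahead limit:

# suite.rc, run under Cylc 8 in compatibility mode
[scheduling]
   initial cycle point = 20231101T00Z
   runahead limit = P2
   [[special tasks]]
      clock-trigger = foo(PT0S)
   [[dependencies]]
      [[[T00]]]
         graph = "foo"
[runtime]
   [[foo]]
      script = "sleep 10"

If more foo instances submit at once than the runahead window allows, that would make a reproducible bug report.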

I investigated why my suite was stalling. I had intended for it to catch up to now-time, but it stalled around 20231104, a cycle point over 10 days ago.

I tend to look at the housekeep task of a given cycle point with cylc show, and then trace back through the tasks that are reported as not completed.

This led me to this task:

$ cylc show downloader_suite_cl4 //20231104T0100Z/get_gpm_precip
...
state: waiting
prerequisites: (None)
outputs: ('-': not completed)
  - 20231104T0100Z/get_gpm_precip expired
  - 20231104T0100Z/get_gpm_precip submitted
  - 20231104T0100Z/get_gpm_precip submit-failed
  - 20231104T0100Z/get_gpm_precip started
  - 20231104T0100Z/get_gpm_precip succeeded
  - 20231104T0100Z/get_gpm_precip failed
other: ('-': not satisfied)
  - xtrigger "_cylc_wall_clock_get_gpm_precip = wall_clock(trigger_time=2023-11-04T16:40:00Z)"

As far as I can see this is incorrect. The clock-trigger for this task is defined as get_gpm_precip(+PT15H40M), so for the 20231104T0100Z cycle the trigger time of 2023-11-04T16:40Z is long past and the trigger should have fired, but the output of cylc show still shows a minus sign in front of the xtrigger.

I am running the suite in debug mode, so I grepped the scheduler log files (there are several, as I have restarted the suite a few times):

$ grep '20231104T0100Z/get_gpm_precip' *
01-start-01.log:2023-11-16T21:15:40Z DEBUG - [20231104T0100Z/get_gpm_precip waiting(runahead) job:00 flows:1] spawned
01-start-01.log:2023-11-16T21:15:40Z DEBUG - [20231104T0100Z/get_gpm_precip waiting(runahead) job:00 flows:1] added to main task pool
06-start-01.log:      * 20231104T0100Z/housekeep is waiting on ['20231104T0100Z/crop_gpm_precip:succeeded', '20231104T0100Z/get_gpm_precip:expired']
07-restart-02.log:2023-11-17T10:51:23Z INFO - + 20231104T0100Z/get_gpm_precip waiting
07-restart-02.log:      * 20231104T0100Z/housekeep is waiting on ['20231104T0100Z/get_gpm_precip:expired', '20231104T0100Z/crop_gpm_precip:succeeded']

What I can see is that the task was correctly spawned in the first run of the suite, but I can’t see a report of it being run.
That’ll be because I use queues to only ever run one instance of that task at a time (the download portal will shut me out if it receives too many requests, hence the queue). So the task was waiting in a queue.

The queue is defined like this:

[scheduling]
...
    [[queues]]
...
            [[[get_nasa_gpm_q]]]
                limit = 1  # the server is prone to go offline due to "This server is temporarily unable to service your request due to either high I/O processing or your IP address has reached the limit of concurrent connections"
                members = get_gpm_precip

Now I suspect that the queues are not correctly re-initialised when restarting a suite. I have a feeling that (due to the long catch-up period between the suite’s start date and the time I first ran it) the queue still had quite a few instances of get_gpm_precip waiting in it, and this particular instance, 20231104T0100Z/get_gpm_precip, was never re-inserted into the task pool. So the housekeep task of that cycle point waited forever for its get_gpm_precip task to complete, and the suite stalled (it stalled on multiple cycle points; this is just one example).

The graph for this task is defined as:

        [[[T-00]]] # every hour at zero minutes past (every hour on the hour). Note that the - character takes the place of the hour digits as we may not omit components after the T character.
            # Task is clock-triggered at <CYCLE> + 15 hours 40 minutes (e.g. the T00 cycle triggers at 19:40 GST)
            graph = """
                 get_gpm_precip? => crop_gpm_precip
                 (get_gpm_precip:expired | crop_gpm_precip) => housekeep
            """

The expiry handling seems to work now, as discussed in another thread, but I now suspect the queue handling to be broken.

What are your thoughts on this? Thanks so much!