Starting a new flow to re-run an earlier clock-triggered cycle

Hi,

I have a clock-triggered workflow that runs twice a day, at midnight and midday. At midday the graph is a subset of the graph run at midnight, something like this:

[scheduling]
    initial cycle point = previous(T00)
    [[graph]]
        PT12H = "get_data" 
        P1D = "get_data => process_data"

I want to re-run all the tasks in the midnight cycle. I am doing this after midday, so I want to skip the latest (midday) cycle and go back to the one before it.

From the documentation, I thought that I just needed to start a new flow from the start of the cycle, i.e.:

cylc trigger --flow=new workflow_name//20240229T0000Z/get_data

However, the above command launches get_data for both the previous midnight and midday cycles. It looks like this is because the clock-trigger launches the midday cycle as well. Is it possible to just re-run an earlier cycle in this situation? Or do I need to stick to the existing flow and trigger tasks one by one? Or have I misunderstood this completely!

Thanks,
Annette

Hi,

Flows make it easy to get Cylc to “flow on” from a set of triggered tasks; however, there isn’t presently a mechanism to tell it where to stop. You can stop a flow (cylc stop --flow=<number>), but this stops the entire flow, which isn’t what you want.

At present, your options are either to:

  • Trigger the whole cycle’s worth of tasks manually.
  • Trigger a new flow and remove the surplus tasks manually:
    $ cylc pause  # pause the workflow to prevent surplus tasks from being re-run
    $ cylc trigger //20240229T0000Z/get_data --flow=new  # start a new flow
    $ cylc remove // '20240229T1200Z/*'  # remove all tasks in the T12 cycle
    $ cylc play  # resume the workflow
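For reference, here is a minimal sketch of the second option wrapped in a POSIX shell function. It assumes the workflow ID (e.g. workflow_name, as in the original question) is passed explicitly, and that the re-run starts at get_data. The function names and the DRY_RUN switch are hypothetical conveniences, not Cylc features: with DRY_RUN=1 the commands are printed rather than run.

```shell
#!/bin/sh
# Sketch of the "new flow + remove surplus tasks" option.
# DRY_RUN (hypothetical helper, not a Cylc feature): DRY_RUN=1 prints the
# cylc commands instead of running them.

run_cylc() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "cylc $*"
    else
        cylc "$@"
    fi
}

# Usage: rerun_cycle_as_new_flow WORKFLOW RERUN_CYCLE SURPLUS_CYCLE
rerun_cycle_as_new_flow() {
    local workflow="$1" rerun_cycle="$2" surplus_cycle="$3"
    # Pause first so the scheduler cannot submit the surplus tasks.
    run_cylc pause "$workflow" || return
    # Start a new flow at the start of the cycle to be re-run.
    run_cylc trigger --flow=new "${workflow}//${rerun_cycle}/get_data" || return
    # Remove all tasks in the later cycle; the ID is quoted so the shell
    # passes the glob to cylc instead of expanding it.
    run_cylc remove "${workflow}//${surplus_cycle}/*" || return
    # Resume the workflow.
    run_cylc play "$workflow"
}
```

For example, DRY_RUN=1 rerun_cycle_as_new_flow workflow_name 20240229T0000Z 20240229T1200Z prints the four commands in order; without DRY_RUN they are executed, and the sequence stops at the first command that fails.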
    

However, there are some changes on the horizon that will make this easier:

  1. flow-specific hold/release (scheduled for release in 8.3.0)
  2. flow-specific stop after cycle point
  3. flow-specific remove

Until then, terminating a flow requires a bit of manual intervention.

Cheers,
Oliver

Hi Oliver,

There are quite a few tasks in the full graph, so I went with your second suggestion, and that worked fine.

I am really trying to get to grips with Cylc 8 and make sure I understand what’s going on, so I appreciate your quick response.

Annette

Hi Annette,

Actually, the command only launches get_data at the target cycle, but just like the original flow, the new one will continue on if it includes any tasks that “flow on” to subsequent cycles.

In your case, you triggered get_data, which has no upstream parents to trigger it, but in a cycling workflow all parentless tasks out to infinity (or the final cycle point) are technically ready to go right now. The scheduler avoids causing the computer to undergo gravitational collapse by only spawning them out to the runahead limit.

Putting an external trigger (which includes clock-triggers) on get_data doesn’t change that. The scheduler spawns them all into its “active pool” and begins checking their external triggers.

Knowing that, there is another option for re-running a previous cycle of the graph.

P1D = "@wall_clock => get_data => task-a => task-b => ..."

  1. Do you need to re-run get_data? (It already ran earlier and presumably “got” its time-based data.) If not, just trigger task-a with --flow=new to start the new flow; that won’t lead on to future cycles.

  2. If you do need to re-run get_data, you can either:

  • follow @oliver.sanders’s advice above
  • or trigger get_data with --flow=none, then trigger task-a as per 1.
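The two steps of the second route can be sketched as a pair of shell functions. This is a minimal sketch assuming the graph shown above, with task-a as the first task downstream of get_data and no inter-cycle dependencies further down the graph; the function names and the DRY_RUN switch are hypothetical, not part of Cylc. Step 2 should only be run once the re-triggered get_data has succeeded.

```shell
#!/bin/sh
# Sketch of the --flow=none then --flow=new route.
# DRY_RUN (hypothetical helper, not a Cylc feature): DRY_RUN=1 prints the
# cylc commands instead of running them.

run_cylc() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "cylc $*"
    else
        cylc "$@"
    fi
}

# Step 1 (only if get_data itself must be re-run): trigger it in no flow.
# A --flow=none task does not spawn downstream tasks, so this re-runs just
# this one instance and starts nothing in other cycles.
rerun_get_data_only() {
    local workflow="$1" cycle="$2"
    run_cylc trigger --flow=none "${workflow}//${cycle}/get_data"
}

# Step 2 (after get_data has succeeded): start a new flow from the first
# task downstream of get_data (task-a by default). That task is not
# parentless, so nothing is spawned out to later cycles on its account.
start_flow_from_child() {
    local workflow="$1" cycle="$2" first_task="${3:-task-a}"
    run_cylc trigger --flow=new "${workflow}//${cycle}/${first_task}"
}
```

If get_data does not need re-running (option 1), step 2 on its own is enough.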

Hilary

Hi Hilary,

Thanks for your explanation. I realise I missed the @wall_clock from my example graph. So triggering the earlier task puts the next one back in the active pool, because these tasks are parentless?

In this case, get_data is not pulling in time-based data; it is just rsyncing a load of log files. I had edited the task to pull in some extra files and wanted to see what the analysis looked like. So in my case, I could have just re-run the latest get_data, then gone back and triggered a new flow from the earlier time point to run the processing tasks.

Annette

[Update: I’ve amended my previous response to make it clearer. You can trigger get_data with --flow=none to avoid launching other cycles, then trigger task-a with --flow=new to get the target cycle going.]

That’s right. Here’s an example:

[scheduler]
    allow implicit tasks = True
[scheduling]
    cycling mode = integer
    [[graph]]
        P1 = """
            x => a => b & c
            a[-P1] => a
        """
[runtime]
    [[root]]
        script = "sleep 10"

Here task x is parentless: it has no upstream prerequisites, so every instance of it is technically ready to run right now.

So in Cylc 8, the way the original flow “flows through the graph” from start-up is:

  • parentless tasks (x) are automatically spawned out to the runahead limit
  • then other tasks are triggered on the fly by the upstream outputs that they depend on

The same goes for manually triggering tasks with --flow=new at run time:

  • the new flow continues by triggering downstream tasks on demand, as outputs get completed
  • if the triggered task is parentless, it will also get automatically spawned out to the runahead limit
    • otherwise the new flow could not continue on to subsequent cycle points
    • if you don’t want it to continue on to subsequent cycles, see above
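To make the bullets above concrete, here is a toy model in shell of what a new flow starts with when a task is triggered. It is not the scheduler’s real logic: it assumes integer cycle points (as in the example), a runahead window counted in cycle points from the triggered cycle, and a final cycle point that caps spawning, all of which are simplifications. The function names are hypothetical.

```shell
#!/bin/sh
# Toy model only, NOT how the Cylc scheduler is implemented.

# Print the instances of a parentless task spawned ahead by a flow,
# from CYCLE up to (but not including) CYCLE + WINDOW, capped at FINAL.
spawn_parentless() {
    local task="$1" cycle="$2" window="$3" final="$4"
    local stop=$((cycle + window))
    while [ "$cycle" -lt "$stop" ] && [ "$cycle" -le "$final" ]; do
        printf '%s\n' "${cycle}/${task}"
        cycle=$((cycle + 1))
    done
}

# Print what a new flow starts with when TASK is triggered at CYCLE.
new_flow_start() {
    local task="$1" cycle="$2" parentless="$3" window="$4" final="$5"
    if [ "$parentless" = "yes" ]; then
        # Parentless (like x): spawned out to the runahead window.
        spawn_parentless "$task" "$cycle" "$window" "$final"
    else
        # Has parents (like a): only the triggered instance for now; its
        # children are spawned later, on demand, as its outputs complete.
        printf '%s\n' "${cycle}/${task}"
    fi
}
```

For example, new_flow_start x 1 yes 3 10 prints 1/x, 2/x and 3/x, whereas new_flow_start a 2 no 3 10 prints only 2/a.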

That makes sense. I appreciate you taking the time to explain this.

Annette