Roll back a flow and forget prior progress

Hello,
I’m running two concurrent workflows that occasionally need to be rolled back to an earlier cycle point and re-run from there. I’m familiar with launching a new flow to start again from an earlier state, but I run into an issue: each workflow has xtriggers that wait for a completed task in the other workflow before continuing, and these are satisfied instantly because the prior flow’s task has already succeeded.

I saw in a few other issues that resetting triggers is coming up in 8.5, but I’m hoping for my case there might be an easier solution that I’m missing. I’m okay with (and even prefer) just wiping away all files/progress/logs beyond the problematic cycle point and starting fresh from there, but I haven’t been able to find the right set of remove, trigger, and set commands to get everything in the necessary state for running as expected.

Is there a built-in way to handle this in 8.4.1, or do I need to rewrite the flow graph to look for outputs instead of flow states and just remove the outputs?

Thanks!

Hi @Vve

From what you’ve said, the problem is that your xtriggers are still looking for flow 1 outputs in the other workflow, so after the roll-back with a new flow they immediately trigger off the original flow.

Fortunately, the built-in workflow_state xtrigger already has a flow_num argument (which defaults to 1, hence your problem).

Before triggering the new flows, you just need to modify both xtrigger declarations (to add flow_num=2), then reinstall and reload (or cylc vr) both workflows.

A couple of notes:

  • There’s no alternative to hardwiring flow numbers here, because it’s not possible for Cylc to reliably infer which flow (in the other workflow) you want to trigger off.
  • Resetting xtriggers, from 8.5.0, will not do what you want - in the new flow, xtrigger prerequisites are already unsatisfied. Your problem is that they’ll immediately become satisfied as soon as the xtrigger gets called, because they’re still looking for flow 1 outputs.
  • In principle, instead of starting a new flow you could (a) cylc remove all the tasks that must rerun in the roll back - this erases the flow history allowing them to rerun in the same flow; then (b) trigger the rollback flow without incrementing the flow number and without changing the xtrigger declarations. In practice that’s not so easy (yet) because task globs only match active tasks, so you might have to remove a lot of tasks by their individual IDs.
    • (This will get easier: from 8.5 cylc trigger will automatically do this kind of removal; and glob matching beyond the active tasks should come in the 8.6 release).

This is a tricky enough scenario that I knocked up an example to check that it really does work: two integer cycling workflows with similar graphs a[-P1] => a => b that are mutually dependent:

  • in vve/one, task b triggers off of b in vve/two
  • in vve/two, task b triggers off of a in vve/one
# ~/cylc-src/vve/one
[scheduling]
  cycling mode = integer
  [[xtriggers]]
    b_two = workflow_state("vve/two//%(point)s/b", flow_num=1)
  [[graph]]
    P1 = """
      a[-P1] => a
      a & @b_two => b
    """
[runtime]
  [[a, b, c]]
    script = "sleep 10"

(Note I’ve made the default flow_num=1 arg explicit).

# ~/cylc-src/vve/two
[scheduling]
  cycling mode = integer
  [[xtriggers]]
    a_one = workflow_state("vve/one//%(point)s/a", flow_num=1)
  [[graph]]
    P1 = """
      a[-P1] => a
      a & @a_one => b
    """
[runtime]
  [[a, b, c]]
    script = "sleep 10"

How I tested the roll-back:

  1. start both workflows running (flow 1)
  2. set 8/a:failed in both (cylc set --out=failed vve/one//8/a, same for vve/two)
  3. once they have stalled at the failed tasks, edit the xtrigger declarations to flow_num=2, then reinstall and reload both workflows
  4. roll back vve/one to cycle 4: cylc trigger --flow=2 vve/one//4/a
  5. check that flow 2 stalls at the runahead limit with 4/b, 5/b, etc. stuck with their xtriggers waiting on flow 2 outputs in vve/two (i.e., the xtriggers are not being satisfied by flow 1).
  6. roll back vve/two to cycle 4: cylc trigger --flow=2 vve/two//4/a
  7. now both workflows can continue on with mutual triggering working as expected in flow 2

(It worked :tada: )
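For reference, the step 3 edit amounts to changing one argument in each workflow's xtrigger declaration. A minimal sketch, assuming the two example workflows above and showing only the edited sections:

```cylc
# ~/cylc-src/vve/one
[scheduling]
  [[xtriggers]]
    # was flow_num=1; now only flow 2 outputs in vve/two satisfy this
    b_two = workflow_state("vve/two//%(point)s/b", flow_num=2)

# ~/cylc-src/vve/two
[scheduling]
  [[xtriggers]]
    # was flow_num=1; now only flow 2 outputs in vve/one satisfy this
    a_one = workflow_state("vve/one//%(point)s/a", flow_num=2)
```

The change only takes effect in the running schedulers after the reinstall and reload (or cylc vr) of each workflow.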

Thank you for such a thorough reply! I didn’t even think to check if xtriggers could take a flow argument, but I implemented your suggestion and everything has been running smoothly. I had to manually trigger the first few tasks to get everything started for some reason, but I suspect my graph isn’t as clean and linear as it could be for a straightforward flow.

Does there happen to be a shortcut for triggering the first expected node(s) in a cycle point as if the scheduler was entering it for the first time?

Not yet. However, we will be implementing this soon (it should be available in the next few months, and we’ll document it here once released).

Until then, you have to make a note of the start task(s) in the cycle.

A common pattern (used for other reasons) is to have a start_cycle task at the head of each cycle, which makes this easier.
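As a rough illustration of that pattern, here is a minimal sketch of an integer cycling graph with a start_cycle task at the head of each cycle. The task names a and b, the initial cycle point, and the choice to hang the inter-cycle dependency on start_cycle are assumptions for illustration, not taken from any real workflow:

```cylc
[scheduling]
  cycling mode = integer
  initial cycle point = 1
  [[graph]]
    P1 = """
      # the only inter-cycle dependency lands on the start task
      b[-P1] => start_cycle
      # everything else in the cycle follows from start_cycle
      start_cycle => a => b
    """
[runtime]
  [[start_cycle]]
    script = "true"  # dummy task that just marks the head of the cycle
  [[a, b]]
    script = "sleep 10"
```

With a layout like this, there is a single known task to trigger when re-entering a cycle (for example 4/start_cycle), and the rest of the cycle should follow from its outputs.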