Continuing a run that completed

I had a workflow that completed, then I extended the final cycle point and restarted it. (This was so much easier in cylc 7! it just did it!)
It immediately shut down (facepalm, then restart with --pause).
Cycles after the last completed cycle don’t exist, so trigger some initial tasks, and unpause. Okay so far.
Trigger some more tasks, but these depend on tasks in the previous completed cycle, which apparently are no longer known about (why?). Running set-outputs on a preceding cycle’s task seems to improve the dependency issue; cylc show indicates that the downstream task is no longer waiting on the previous cycle.

And then I think, why don’t I just set-outputs on the entire previous cycle? It all ran to completion and succeeded, so this is valid in theory… (Definitely a bad idea).
set-outputs workflow/run1//* results in every task in that cycle being immediately submitted with flow=none. Is this the intended behaviour?
Do I need to specify a flow (and if so, why did it work fine for setting a single task’s outputs?)
If flow=none in the prerequisite’s run, does that make the task’s output valid for a downstream task with flow=1?
If I did set-outputs --flow=1 …/* would this still result in mass triggering?

Is there a best-practice on kicking off a workflow after extending the end date? (Yes, I read the documentation and appreciate this is not generally encouraged.)
Is a cylc warm-start (in cylc7) basically replaced by kicking off a new flow? I basically want to warm-start from the cycle after the original end-point.

Hi,

Extending a workflow run is currently possible, but a little bit clunky. We are currently working on improving the interface to make this easier and more intuitive.

The process for extending a workflow run should look like this:

# restart the workflow in paused mode
$ cylc play --pause <workflow-id>

# trigger the first task(s) in the new cycle
$ cylc trigger --flow=1 <workflow-id>//<first-new-cycle>/<first-task(s)-in-new-cycle>

# unpause the workflow
$ cylc play <workflow-id>

Note: You shouldn’t need to use set-outputs, Cylc should load the status of previously run tasks from the database.
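The three steps above could be wrapped in a small helper. This is only a sketch: the function names `run` and `extend_run` and the `DRY_RUN` variable are made up for illustration, and the workflow and task IDs you pass in are your own. It does not check that the scheduler is ready before triggering, so treat it as a starting point rather than a tested tool.

```shell
#!/usr/bin/env bash
# Sketch: restart a completed workflow paused, trigger the first task(s) of
# the new cycle in flow 1, then unpause. Helper names are hypothetical.

# Print the command when DRY_RUN is 1 (the default); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: extend_run <workflow-id> <cycle>/<task> [<cycle>/<task> ...]
extend_run() {
    local workflow_id="$1"
    shift
    # restart the workflow in paused mode
    run cylc play --pause "$workflow_id"
    # trigger each first task of the new cycle in flow 1
    local task
    for task in "$@"; do
        run cylc trigger --flow=1 "${workflow_id}//${task}"
    done
    # unpause the workflow
    run cylc play "$workflow_id"
}
```

By default the helper only prints the commands it would run, so you can check them first; setting `DRY_RUN=0` in the environment makes it execute them.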

So three complications here:

  1. You have to start the workflow in paused mode, otherwise the scheduler can’t see anything to run so shuts itself down immediately.
  2. You have to specify a “flow” to trigger the new tasks in. This is because the default is to trigger tasks in all active “flows”, but there are no active flows at the time of the trigger, so Cylc unhelpfully gives you --flow=none, which is highly unlikely to be what you wanted.
  3. You have to know which tasks to trigger in the new cycle.

We are working on three enhancements to tackle these three complications:

  1. Rather than shutting down immediately, workflows will log a message explaining the situation and advising you to trigger new tasks. The workflow will keep running for a configured period and only shut down if no tasks are triggered before that time. https://github.com/cylc/cylc-flow/pull/5231
  2. Rather than defaulting to --flow=none, we will get Cylc to look through the database to find the flow of the last task Cylc submitted before the workflow completed. (“Tweak triggering when no flows present”, cylc/cylc-flow pull request #5084)
  3. We are looking into how we could automatically determine the right tasks to start with. (“easier way to select cycle start tasks”, cylc/cylc-flow issue #5416)

We should get (1) and (2) sorted in time for Cylc 8.2.0; (3) might require a little more thought and work.

I’ll just take the opportunity to evangelize the new ways a little!

The way we originally (Cylc 7 and earlier) solved the problem of cycling with no barrier between cycles worked well enough for the original flow through the graph, but for any kind of manual intervention Cylc 8 is vastly better. You no longer have to resort to mysterious operations such as “inserting” tasks into the graph just to set up a small sub-graph rerun. Things like properly rewinding to any point in the graph were basically impossible, and Cylc 8 solves a bunch of other real scheduling problems too.

However, it is a big change, and while everything works already, as Oliver notes above, we haven’t yet finished refining the manual intervention command set, so it may be difficult to figure out how to do some things at this stage (sorry!). That’s what this forum is for!

A few operations might appear to be easier in Cylc 7, when in fact it is doing the right thing for the wrong reasons and would therefore break in some cases. Extending a workflow by changing the final cycle point is an example of this. The final cycle point defines the end of the graph, and changing it (post shutdown) actually does not make sense if any part of the graph references the final cycle point.

So, despite what you might think from using Cylc 7 on particular kinds of workflows, in general you should expect an already-finished workflow to require manual triggering to get a new flow going after extending the graph, although in due course we may be able to make it work automatically for cases in which the graph does not reference the final point.

Unfortunately cylc set-outputs is the prime offender in the short list of triggering and flow-related commands that aren’t fully user-friendly yet.

For the moment, if you need to use that command, what it does is tell the scheduler to carry on as if the outputs of the target task had been completed. So downstream tasks that depend on those outputs will be spawned, and will be ready to run immediately - unless they have other prerequisites that need to be “set” as well. But you also need to be aware of the flow that the spawned tasks belong to. With cylc trigger the default is to belong to the current flow (and as Oliver pointed out, on restarting a finished workflow there is no current flow). With cylc set-outputs the default is still no-flow, I think, so you’ll need to use (e.g.) the --flow=1 command line argument if you want the newly-triggered tasks to flow on in the graph, even when there is a current active flow.
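As a rough illustration of the paragraph above, the sketch below marks one upstream task's outputs as complete in flow 1 and then inspects a downstream task with cylc show. The helper names `run` and `satisfy_upstream` and the `DRY_RUN` variable are hypothetical, and the IDs are placeholders for your own workflow and tasks.

```shell
#!/usr/bin/env bash
# Sketch: set the outputs of one upstream task in flow 1, then show the
# prerequisites of a downstream task. Helper names are hypothetical.

# Print the command when DRY_RUN is 1 (the default); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: satisfy_upstream <workflow-id> <upstream cycle/task> <downstream cycle/task>
satisfy_upstream() {
    local workflow_id="$1" upstream="$2" downstream="$3"
    # tell the scheduler to carry on as if the upstream outputs were completed
    run cylc set-outputs --flow=1 "${workflow_id}//${upstream}"
    # check what the downstream task is still waiting on
    run cylc show "${workflow_id}//${downstream}"
}
```

Targeting a single task rather than a whole cycle keeps the intervention small and makes it easier to see which downstream tasks get spawned as a result.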

And at the moment, finding out which flow numbers are in use involves looking at the scheduler log.
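One way to skim the scheduler log for flow information is to filter it for lines that mention flows. This is a sketch under an assumption: that the relevant log lines contain the word "flow" somewhere, which may not hold for every Cylc version. The function name `flow_lines` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: keep only the lines on stdin that mention flows.
# Assumes (unverified) that relevant scheduler log lines contain "flow".
flow_lines() {
    # case-insensitive match; do not fail when nothing matches
    grep -i 'flow' || true
}

# Usage (requires a Cylc installation; cylc cat-log prints the scheduler
# log by default):
#   cylc cat-log <workflow-id> | flow_lines
```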

If you don’t deliberately trigger new flows (as opposed to triggering tasks as part of the original flow, which is the default) you shouldn’t have to worry about this. But we have seen some users get into trouble because they weren’t sure if they needed to trigger new flows or not.

Flows and triggering are pretty well documented here, although there are bound to be some details and edge cases that we still need to clarify.