Should set-outputs work in a stalled workflow?

My workflow has stalled because of a failed task. I know the task won’t run successfully because the input data is corrupt. All I want to do is reset the status to get things moving again; the failure is not a problem downstream.

However, when I issue the set-outputs command nothing happens, even though it says Done. I am able to retrigger the task, so my clumsy way forward seems to be to tamper with the task command so that it completes, then reset for the following cycles.

Hi,

There’s a subtle difference between cylc reset (Cylc 7) and cylc set-outputs (Cylc 8).

  • cylc reset actually changed the task’s status.
  • cylc set-outputs satisfies the prerequisites of downstream tasks but does not change the task’s status.

So, use set-outputs to allow any downstream tasks to run if desired; then, if you don’t want to keep the failed task kicking around, do cylc remove <workflow>//<cycle>/<task> and it will no longer be able to stall the workflow.
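For example, a minimal sketch of that two-step fix (the IDs are placeholders, to be filled in with your own workflow, cycle point, and failed task name):

cylc set-outputs <workflow>//<cycle>/<task>   # satisfy the prerequisites of downstream tasks
cylc remove <workflow>//<cycle>/<task>        # forget the failed task so it can no longer stall the workflow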

Note, this disparity has been a pain point in Cylc 8 migration, so we have found a way to re-implement “reset”-like functionality within the new scheduling algorithm; this will arrive in Cylc 8.3.0.

Thanks for the clarification. I certainly find it very useful to be able to do this in one command, so I’ll look forward to 8.3.0!

For the moment, think of cylc set-outputs as telling the scheduler to carry on as if those outputs had been completed. i.e., it acts on downstream tasks that depend on those outputs, rather than the target task itself.

I have a question related to this problem and your proposed solution, but I don’t know if it is better to start a new thread or continue here. Sorry if this is poor etiquette on my part.

I encountered this while testing a large workflow and recreated the issue I had in a very simple graph. The scenario is that I am trying to test a task in the middle of my graph without expending the resources of the triggering tasks: I simply want them marked as succeeded so I can test “task_c” and continue on in my workflow. I did extend my graph window extent to 3 in order to make other tasks visible in the GUI. I held the first and fourth tasks to avoid wasting resources, and I triggered “task_c”; it succeeded. My initial thought, like @andy.smith’s, was to set the outputs of task_a and task_b to succeeded. This only triggered the downstream tasks. I found this thread and tried removing the tasks as suggested, using the GUI remove option from the menu and then cylc remove my_workflow//20240220T0000Z/task_c, but this also seems to have some nuance, as it will not work on a task that is not “active” in the GUI.

When using the same command for “task_a” it cleared, but task_b remained; I assume because my graph extent had been raised. Just the same, I marked it to be removed, but got no response in the tree view. When the graph extent was lowered, task_b disappeared (or so I thought), but it wasn’t flagged as an optional(?) task and eventually stalled my suite.

Here is a quick breakdown of my simplified flow.cylc:

[scheduling]
    initial cycle point = 20240220T0000Z
    [[xtriggers]]
        start = wall_clock()
    [[graph]]
        T00, T06, T12, T18 = """
            @start => task_a => task_b => task_c => task_d => task_e => task_f & task_g
        """
[runtime]
    [[task_a, task_b, task_c, task_d, task_e, task_f, task_g]]
        script = """
            cylc message "I'm done."
        """

Screenshot of the workflow in tree view after set-outputs failed to remove task_b and task_c was re-triggered:

Screenshot of the log after trying to use the cylc remove command to take out task_b:

As I stated, removing task_a did actually remove the task from my graph, and when the extent was reduced this removed/hid task_b from my tree view as well. However, when I repeated these steps of triggering task_c and removing task_a in subsequent cycles, my graph stopped loading new cycles.
Screenshot of the last cycle, with no new cycles loading:

So my question is: what am I doing wrong? Can I remove these tasks without breaking my workflow? This situation may not be totally practical, but if I ever did switch my filesystems to a backup and was trying to restart mid-cycle without wasting resources rerunning the apps that had already finished, would that be possible?
Just to see what happens, I released my last cycle to make sure my held tasks weren’t doing anything odd to prevent cycles from loading, but that just triggered the remaining tasks and stopped my workflow. The workflow’s log states:

[UPDATE: I’ve edited this post to reduce the length of my original response]

All good, please don’t worry about that!

T00, T06, T12, T18 = """
    @start => task_a => task_b => task_c => task_d => task_e => task_f & task_g
"""

So you want to run task_c, and then later carry on with d, e, f (and subsequent cycle points), without running task_a and task_b first - is that right? If so, here’s how to do that:

  1. hold task_a and task_d
  2. trigger and retrigger task_c till happy
  3. release task_d to continue the workflow downstream of task_c
  4. remove task_a, to avoid a stall
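As a sketch, here is that recipe as commands against the example workflow (assuming it is registered as my_workflow, with the 20240220T0000Z cycle point from the flow.cylc above):

cylc hold my_workflow//20240220T0000Z/task_a my_workflow//20240220T0000Z/task_d
cylc trigger my_workflow//20240220T0000Z/task_c    # repeat until happy
cylc release my_workflow//20240220T0000Z/task_d    # carry on downstream of task_c
cylc remove my_workflow//20240220T0000Z/task_a     # forget the held task so it can’t stall the workflow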

I’ve tried this with your example, at 8.2.4, and it all works.

Now to explain what I think went wrong for you:

cylc set-outputs tells the scheduler to carry on as if the outputs of the target task were completed - i.e. it triggers the downstream tasks that depend on those outputs.

  • you don’t need it, because you directly triggered task_c and you don’t want task_b to run

cylc remove removes tasks from the active window, if they are there, or else it just logs a warning that no matching active tasks were found

  • task_a is active, because all parentless tasks (which don’t depend on any upstream outputs) are by definition ready right now according to the task graph. So task_a can be removed.
  • task_b is not active - it is just a future task that will be spawned when the outputs that it depends on (i.e., task_a:succeeded) get completed. So task_b does not need to be, and in fact can’t be, removed - but trying to do so won’t cause a problem.
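To make that concrete with the example workflow (names assumed as above; the comments describe the expected behaviour, as explained in the bullets):

cylc remove my_workflow//20240220T0000Z/task_a   # task_a is active (parentless), so it is removed
cylc remove my_workflow//20240220T0000Z/task_b   # task_b is only a future (n=1) task, so this just logs a
                                                 # “no matching active tasks” warning and has no effect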

The GUI’s “graph window extent” is purely for visualization. It includes the “active” tasks plus future and past tasks out to a default of n=1 graph edges from them.

  • by “clearing” a task you mean (?) removing it from the active window. Once active, a task must complete, or else be forcibly removed, before it can safely slip into history.
  • but removing a task from the scheduler’s active window does not erase it from the graph! If your GUI window extent is big enough, the task will still be visible - to show what happened in the past.

The new GUI does not yet show which tasks beyond the obvious are in the active window, unless you actually set the window extent to n=0.

  • unfortunately that probably makes it harder for new Cylc 8 users to understand the difference between (e.g.) trying to remove task_a (active n=0) vs task_b (future n=1)

Finally, the early shutdown is a bug that’s been fixed for Cylc 8.3.0. You must have removed the runahead-limited future instance of task_a, which (pre-8.3.0) prevents spawning further ahead once the limit moves on.

  • but you don’t need to remove that task in order to achieve your goal, so it’s easy to avoid the bug.

Hi again. I think that our Cylc 8 “window” terminology may be causing some confusion. I’ve flagged to the team that we should clear this up somehow. As it stands we talk about two related but different windows:

  • the scheduler has an active window containing all the tasks that it is actively managing, so called because it can be imagined as a window of activity that moves along the graph.

  • the GUI displays a wider graph-based visualization window that extends n (default n=1) graph edges out from these active tasks. Which future and past tasks you see, in addition to the active ones, depends on the window extent n. In terms of this GUI window, the scheduler’s active tasks are all n=0.

Note that neither of these windows is necessarily contiguous in the graph. Think of labeling all the scheduler’s active tasks, wherever they are in the graph, with n=0, then construct the GUI window by walking n graph edges out from each of those tasks.
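For instance, with the example graph from earlier in this thread, if task_c were the only active task at some cycle point, the GUI window would be built by walking out from it:

task_c            n=0  (active)
task_b, task_d    n=1  (one graph edge away)
task_a, task_e    n=2  (two graph edges away)

With the default window extent of n=1 you would see task_b, task_c, and task_d; with the extent set to n=0 you would see only task_c.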

The potential confusion is this: removing a task from the (scheduler’s) active window in effect just removes the n=0 label and allows the scheduler to forget the task (otherwise the scheduler will insist that it completes its required outputs first). The “removed” task is no longer in the core set of tasks (n=0) around which we construct the GUI window, but that does not necessarily mean it has been “removed” from the GUI window, i.e. that it won’t be visible anymore (unless the window extent is set to n=0 only). If the window extent is large enough to show that part of the graph, the “removed” task will still be visible, and its state will correctly reflect its history.

Yes, I think I understand now. I am still learning how to interact with the interface and what the new commands are doing, and I’m probably misstating the terms too. I’m just familiar with the old suicide-removal of tasks from Cylc 7, which has a visual result where the task is marked succeeded. The new graph system does create a sort of abstraction that I think I need to wrap my head around :slight_smile:

“Clearing”, yes, I meant removing. The confusion, I think, also stemmed from comparing the active tasks shown in the GUI with those shown in the terminal. The GUI (because I had altered the extent) included more tasks than the terminal acknowledged; I assume the terminal was showing the actual active tasks, and the GUI just allowed me to view future tasks as well.

I think the way I had held my workflow to remove tasks did in fact lead me to removing the runahead task, which caused the shutdown. I will be careful to avoid this misstep until we get 8.3. Thanks for this and the additional follow-up explanation.