Task:succeeded can't be both required and optional

So I’m working on porting a large suite to Cylc 8 and there’s some logic in the graph that worked fine in Cylc 7 but fails validation in Cylc 8. I’ve distilled the error down to a small case:

GraphParseError: Output task2:succeeded can't be both required and optional

The graph:

# 1 must run first, followed by 2:
task1 => task2

# 1 is required, 2 is optional:
task1 => task3
task2:finish => task3

I’m not seeing the fallacy in the graph logic… Task1 is required for both Task2 and Task3, but Task2 is optional for Task3 (success or failure doesn’t matter but it has to try).

So I’m working on porting a large suite to Cylc 8

Brilliant :tada:

I’m not seeing the fallacy in the graph logic… Task1 is required for both Task2 and Task3, but Task2 is optional for Task3 (success or failure doesn’t matter but it has to try).

task2:finish

is a “pseudo output” that is short for:

task2:succeed? | task2:fail?

The ?s (optional outputs) are implied by the finish trigger, because task2 success and failure can’t both be required (or “expected” as the documentation puts it … we’ll settle on one term or the other before 8.0 is released!).

The problem is, your first line says that task2 success is required, which conflicts with the finish trigger.

So here’s the correct graph, with explanatory comments:

# If 1 succeeds, trigger 2
#   Also: 1 is required to succeed, and 2 (if it runs) may succeed or fail
task1 => task2?

# If 1 succeeds, trigger 3
#   Also: both 1 and 3 are required to succeed
task1 => task3

# If 2 succeeds or fails, trigger 3
#   Also: 2 may succeed or fail (a:finish means "a? | a:fail?"); 
#   and 3 is required to succeed
task2:finish => task3

To avoid ambiguity, if a particular task output is marked as optional, it must be marked optional wherever it appears in the graph.
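
For anyone who wants to reproduce this, a minimal sketch of a complete flow.cylc wrapping the corrected graph is below. The R1 recurrence and the "allow implicit tasks" setting are assumptions made only so the fragment is self-contained; they are not taken from the original suite.

```cylc
# flow.cylc (illustrative sketch, not the original suite)
[scheduler]
    # assumption: lets task1..task3 run as dummy tasks with no [runtime] entries
    allow implicit tasks = True

[scheduling]
    [[graph]]
        R1 = """
            # task2's success is marked optional everywhere it appears
            task1 => task2?
            task1 => task3
            # :finish implies task2:succeed? | task2:fail?
            task2:finish => task3
        """
```

Removing the ? from the first graph line should bring back the GraphParseError quoted at the top of the thread when the workflow is checked with cylc validate.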

Documented here: Scheduling Configuration — Cylc 8.2.2 documentation

… but I think we need to extend that to show an optional output on the right side of a trigger (like your task2) which might not seem intuitive when you first encounter it.

Ok I’ll go back and read the Cylc 8 guide more closely. I may have glazed over the entire new section about expected vs optional outputs…

I’m glad I read this post. I’ve always thought that anything on the right hand side of the => was being specified to run, with nothing being implied about its success or failure. Is this different in cylc8 or have I just been misreading cylc7 all this time?

In Cylc 7, all tasks were expected to succeed. Optional outputs are new in Cylc 8.
See https://cylc.github.io/cylc-doc/latest/html/7-to-8/major-changes/suicide-triggers.html

In Cylc 7, if you had

[[[P1D]]]
    graph = foo => bar

Then bar would be expected to succeed. If it failed, then the suite/workflow would stall at the final cycle point, unless you had added a self-suicide (bar:fail => !bar) to remove it from the graph if it failed.

(In Cylc 8, that example stalls at the runahead limit instead of the final cycle point, and instead of adding a self-suicide you could just add a ? to the end of bar to mark its success as optional)
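
Side by side, the two approaches might look roughly like this. These are fragments only, and the initial cycle point is an arbitrary value chosen for illustration. First the Cylc 7 form with a self-suicide trigger:

```cylc
# Cylc 7 suite.rc fragment (sketch)
[scheduling]
    initial cycle point = 20200101T00   # assumption: arbitrary example date
    [[dependencies]]
        [[[P1D]]]
            graph = """
                foo => bar
                # remove bar from the graph if it fails, so it cannot stall the suite
                bar:fail => !bar
            """
```

and then the Cylc 8 form using an optional output:

```cylc
# Cylc 8 flow.cylc fragment (sketch)
[scheduling]
    initial cycle point = 20200101T00   # assumption: arbitrary example date
    [[graph]]
        P1D = """
            # bar may succeed or fail; neither outcome leaves it incomplete
            foo => bar?
        """
```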

I’ve always thought that anything on the right hand side of the => was being specified to run, with nothing being implied about its success or failure.

Same here but in our case we EXPECT the workflow to stall because that means something is wrong and someone needs to be called (we have a 24/7 watchfloor).

There is a LOT that’s different in Cylc 8. I should have looked at it sooner but we’re an operational environment so we’re not supposed to be mucking with R&D and release candidates. I STRONGLY advise anyone in a similar situation to start looking at Cylc 8 documentation NOW. It’s sufficiently different that you will need a lot of time to change your existing processes and optionally port your suites. I haven’t tested the Cylc 7 compatibility mode yet…

From my reading yesterday it seems that using “?” on tasks that could be considered splits in logic lets the graph act on it without having to deal with suicide triggers to prevent the workflow from stalling. So you’re trading:

taskA => taskB
taskA:failed => !taskB

for
taskA => taskB?

In Cylc 7, the first example is how you’d prevent the workflow from stalling when taskA fails. Cylc 8 takes care of it for you when you add the “?”. At least this has been my understanding, I’m still relearning the commands for workflow management so I haven’t been able to do any real experiments yet.

For us, this likely means changing workflows from:

taskA => taskB
taskB:finish => taskC

to

taskA => taskB?
taskB:finish => taskC

So not a large effort but it does mean changing the workflow later will require more thought since we now have to be explicit about which tasks are optional with the “?”.

I do want to add: THANK YOU for providing the details about differences between Cylc 7 and 8 guide (Detailed Description of Major Changes — Cylc 8.0rc3 documentation). It’s been VERY helpful.

Edit: I’ve been about 50% “oh no!” and 50% “that’s going to be super helpful” so it’s not all doom and gloom as far as transitions go…

Edit 2: And the TUI is honestly fantastic so far. It’s actually interactive now and that’s huge for when we’re remoting and X11 forwarding is unrealistic.

(Pinging @srennie here too)

You can absolutely still do that with Cylc 8. In fact it is the default behaviour.

Optional outputs are only needed when the workflow is designed to handle both the “output completed” and “output NOT completed” scenarios. This includes alternate graph branches (where in Cylc 7 both success and failure are in effect “required” so you have to use ungainly suicide triggers to remove the unused branch).

Consider the simplest (single-task) workflow:

R1 = "foo"  # (short for "foo:succeed"; success is not optional)

Above, if foo fails, the scheduler will stall and report the presence of an incomplete task foo (which indicates the workflow did not run to completion as expected). Now compare:

R1 = "foo?"  # (short for "foo:succeed?"; success is optional)

But here, if foo fails the scheduler will still log the failure and trigger any associated event handlers, but it will shut down as “workflow completed”, because the workflow writer has stated that failure of foo is OK (i.e. foo is expected to fail sometimes and the workflow is designed to handle that).

Sorry for the pain!

However, we have tried to engage with users, warn of upcoming changes, and encourage early uptake and testing throughout the ~3 years of planning and development of Cylc 8 (the first pre-release was available in 2019).

Also, the major changes in Cylc 8 were necessary to allow Cylc to efficiently scale to “the workflows of the future”, which was the primary motivation for the Cylc 8 project. On the optional outputs front, the new scheduling algorithm is better in every way, and it solves a whole bunch of long-standing Cylc 7 problems without introducing any new problems so far as we’re aware (see Scheduling Algorithm — Cylc 8.2.2 documentation).

And, the backward compatibility mode allows Cylc 8 to run Cylc 7 workflows “out of the box” (with only a few caveats) so you can upgrade at your leisure. This also replicates Cylc 7 stall behavior.

Finally, we are very keen to help users (via this forum) to migrate to Cylc 8, and to take advantage of all the improvements!

Not quite!

According to your Cylc 7 graph you want this:

  • if taskA succeeds, run taskB
  • if taskA fails,
    • remove waiting taskB as not needed (so it doesn’t stall the workflow)
    • but do not remove the failed taskA (that should stall the workflow)

In Cylc 8, there is no need for any suicide triggers OR optional outputs here, because taskB is not “spawned” at all unless taskA succeeds. So the Cylc 8 graph is just this:

taskA => taskB

This means:

  • if taskA succeeds, spawn taskB and run it (and then taskB’s success is required)
  • if taskA fails, it will be marked as incomplete (because its success is required)
  • success of both taskA and taskB is required for the workflow to be considered complete

With taskA => taskB? the 3rd bullet point changes to “only success of taskA is required for the workflow to be considered complete”. But that is not implied by your Cylc 7 graph (it would also need taskB:failed => !taskB).

That’s right.

The :finish trigger says your workflow handles either success or failure of taskB, which implies success of taskB must be optional. But cylc validate will flag the first case (taskA => taskB with no ?) as an error, to help you upgrade.

Please do tell us about the "oh no!"s in case it just means we haven’t documented something clearly enough.

Remember it’s the output that is optional, not the task itself. foo is short for foo:succeed, and foo? is short for foo:succeed? - the :succeed output is what is optional.

Tasks that do not have a question mark after them might not run if they have a prerequisite on an optional output, e.g. in

a? => b => c

both b and c will not run if a fails, and the workflow will not stall.
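
To make that concrete, here is a sketch of the same graph with a failure branch added and no suicide triggers. The task name recover and the R1 recurrence are hypothetical, chosen only for illustration.

```cylc
# flow.cylc fragment (sketch)
[scheduling]
    [[graph]]
        R1 = """
            # success path: b and c are only spawned if a succeeds
            a? => b => c
            # failure path: recover (hypothetical task) is only spawned if a fails
            a:fail? => recover
        """
```

Whichever branch is not taken is simply never spawned, so there is nothing left over to remove.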

They’re more in terms of our own internal processes and how we interface with R&D and eventually manage the runs. Long story short, we’re mostly a bunch of technological Luddites. Do not take any of my comments here as criticisms of Cylc 8 or on how the transition is being handled.

This part is really hard for me to apparently get right… If a fails why wouldn’t that stall the suite? I get it could load the next cycle point if there’s no prior cycle point dependency but isn’t that controlled by max active cycle points? Once you run out of those, it’ll still stall correct?

Because a:succeed is the cue to create task b, and the succeed? says that success isn’t the only possible outcome of a (some other outcome could lead to another set of tasks). If a does not succeed, Cylc won’t spawn b. If you mark the succeeded path as optional, Cylc thinks it’s fine if succeeded never happens and does not worry about it.

It might make more sense in a case less freighted with significance than “success” and “failure”. Consider a workflow with custom outputs:

does_data_exist:yes? | download_data => create_plots
does_data_exist:no? => download_data

In this case, without the ? Cylc would want both yes and no outputs to be satisfied to move on. The ? stops an uncompleted task output from stalling the workflow when we want to fork the workflow down mutually exclusive paths.

In Cylc 7 the next task was created by each task starting, not by the outcome of the task. By contrast, at Cylc 8 download_data doesn’t exist until does_data_exist has returned an output of :no; if that doesn’t happen then the download_data task will never exist. If does_data_exist finishes without :yes or :no happening then download_data and create_plots will never be called into existence: that cycle will finish with does_data_exist, the cycle’s graph is done, and it is not counted against your max active cycle points any more.
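
For completeness, here is a rough sketch of how the custom outputs might be wired up in [runtime]. The output messages, the DATA_FILE variable and its path, the R1 recurrence and the dummy scripts are all assumptions for illustration. The join into create_plots is written with | because separate graph lines that share a right-hand task are combined with AND.

```cylc
# flow.cylc fragment (sketch)
[scheduling]
    [[graph]]
        R1 = """
            does_data_exist:yes? | download_data => create_plots
            does_data_exist:no? => download_data
        """

[runtime]
    [[does_data_exist]]
        script = """
            # DATA_FILE is a hypothetical variable used only in this sketch
            if [[ -e "${DATA_FILE}" ]]; then
                cylc message -- "${CYLC_WORKFLOW_ID}" "${CYLC_TASK_JOB}" "data found"
            else
                cylc message -- "${CYLC_WORKFLOW_ID}" "${CYLC_TASK_JOB}" "data missing"
            fi
        """
        [[[environment]]]
            DATA_FILE = /path/to/data.nc   # placeholder path
        [[[outputs]]]
            # <trigger name used in the graph> = <message sent by the job>
            yes = "data found"
            no = "data missing"
    [[download_data, create_plots]]
        script = true   # dummy jobs for the sketch
```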

You’re welcome to criticize if you like :grin: We can either respond with explanations or counter-arguments, or take it on the chin and make changes - either way, that’s a useful discussion for the forum!

Ok I see my problem. We’re used to thinking of workflows as static and every task is “required”, so all tasks are instantiated up to the max number of cycle points. Any expiration/suicide trigger logic we used was for cleanup in non-monitored systems (i.e., non-operational). There was a specific scenario back in 6.x where this was the solution but I haven’t tested to see if that’s still the case in 7.x. I think what we were really after was just “max active cycle points” so that a failure wouldn’t prevent the next cycle point from starting/instantiating.

I think you’re talking about the bug where

      [[P1D]]
         graph = a => b => c

has an implicit dependency between each task and its successor in the next cycle. If 1/b fails, 1/c will not submit, and since 2/c is created by the submission of 1/c, 2/c never appears. So what is actually happening is:

      [[P1D]]
         graph = """
             a => b => c
             a[-P1D]:submit => a
             ....
         """

This bug existed at Cylc 7, but has now gone at Cylc 8.

I didn’t realize that was a bug, good to know!

Well, @wxtim’s description of the behavior is bang on, but strictly speaking it was a known deficiency of the original scheduling algorithm, not a bug. Fortunately the new “spawn on demand” scheduler solves that problem, and a bunch of others too.