I am having trouble understanding completion statements and their relationship to the graph.
In my Cylc 7 graph, I have a task “A” with several dependencies:
B => A
C => A
D => A
Task A has a custom output “do_something” that must be emitted and that triggers a new task. Task A can also trigger another task if it fails. If Task A succeeds, it triggers additional tasks and also suicide-triggers a task, if that task is running. This is what I have in Cylc 8 so far:
A:do_something => E
A:fail? => F
A? => !F
A? => G
A? => H
My completion statement looks like this:
completion = (succeeded or failed) and log_ready
[[[outputs]]]
do_something = "Say Something"
Validation fails:
WARNING - 1 suicide trigger(s) detected. These are rarely needed in Cylc 8 - see
https://cylc.github.io/cylc-doc/stable/html/7-to-8/major-changes/suicide-triggers.html
GraphParseError: Output fcst_run:succeeded can't be both required and optional
It successfully validates if I add a “?” to all of the A tasks in my graph that appear on the right side of “=>”. Why is that? I can’t seem to find any examples in the Cylc user docs of tasks with completion statements appearing on the right side of dependencies.
I presume you mean task “A” to represent “fcst_run” (which appears in the validation error message)? And “log_ready” should be “do_something” in the output completion condition?
A task must satisfy its output completion condition or else the scheduler will retain it (in the active window) as a “final status incomplete” task that will stall the workflow until the user intervenes - it indicates something has gone wrong that the graph is not designed to handle.
The graph defines dependencies and output optionality at the same time, but the latter has no bearing on the dependencies - it only affects task completion.
It successfully validates if I add a “?” to all of the A tasks in my graph that appear on the right side of “=>”. Why is that?
Because you stated A? elsewhere (i.e., the “succeeded” output is optional), and to avoid ambiguity that must be consistent throughout the graph.
That goes for tasks on the right side of dependency arrows too, because output completion can matter even if nothing triggers off of an output (consider tasks at the end of a dependency chain).
a => b?
This means:
if a triggers, its success is required
(if it fails, it will stall the workflow, requiring manual intervention)
if a succeeds, trigger b
(note that putting ? on either a or b makes no difference to this bit)
if b triggers, its success is optional
(if it fails, that’s OK, no need to stall the workflow)
Okay, thank you, this makes sense. But just to clarify, based on:
It sounds like in your a => b? example, if I don’t have “b” with a “?” anywhere else in the graph, then I don’t need a “?” in this particular dependency?
Yes, but the way to think of it is: is “b:succeeded” optional or not? I.e., is my workflow designed to handle failure of b? If it is, put b? everywhere.
Anywhere that b appears in the graph is a reference to the same task and output(s) so it has to be consistent throughout the graph.
I have a related issue that I need help with. It refers to the example above, which I am pasting again below:
A:do_something => E
A:fail? => F
A? => !F
A? => G
A? => H
In Cylc 7, Task A runs and if it fails, it triggers task F. Task F checks a log file produced by Task A for completion (because sometimes Task A can finish correctly even though Cylc produces a failure) and if the log file displays a full completion, cylc reset is run to set the state of Task A to succeeded. Else, cylc will essentially reset Task A to run again.
Additionally, Task A has a pre-script that is run if the task attempt number is above 1. This pre-script resets the state of Task F to waiting so that it might again be triggered if Task A fails once again.
How can we replicate this behavior in Cylc 8? I read in another Discourse thread that there is no longer any ability to reset a task to waiting, and that we can run cylc trigger to start a new “flow” beginning at Task F. But when I run this command, Task F runs immediately, before any output is produced by Task A, which isn’t what I want. Similarly, running cylc set --pre=all --flow=new produces the same result as trigger.
Use the error script to identify your failed-but-not-failed scenario and set the task output.
For example:
err-script = """
    # logfile_check.sh is assumed to print something only when the log shows a complete run.
    if [[ $(logfile_check.sh) ]]; then
        cylc set ${CYLC_WORKFLOW_ID}//${CYLC_TASK_CYCLE_POINT}/A --out succeeded
    fi
"""
3. Rolling your own error handling inside the script
Potentially the most flexible, but Bash is nasty to write and maintain.
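if ! myscript; then
    # [[ $(cmd) ]] is true when cmd prints anything, so logfile_check.sh
    # is assumed to print output only when the log shows a complete run.
    if [[ $(logfile_check.sh) ]]; then
        echo "script appeared to fail, but log says otherwise"
        exit 0
    else
        echo "unhandled failure"
        exit 1
    fi
fi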
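Both snippets above test `[[ $(logfile_check.sh) ]]`, which is true when the command prints any output, not when it exits with status 0. The thread does not show `logfile_check.sh`, so the following is only a minimal sketch of a helper that would fit that test. The function name `logfile_check`, the log path argument, and the `RUN COMPLETE` marker are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for logfile_check.sh (not shown in the thread).
# Prints the matching log line(s) if the log contains the completion
# marker, and prints nothing otherwise, so that [[ $(logfile_check F) ]]
# is true only for a complete run.
logfile_check() {
    local log_file="$1"
    local marker="${2:-RUN COMPLETE}"   # assumed marker text
    # grep -F treats the marker as a fixed string; a missing file or no
    # match produces no output, and "|| true" keeps the exit status 0.
    grep -F -- "$marker" "$log_file" 2>/dev/null || true
}
```

If the real check reports its result through its exit status instead, the test would be written as `if logfile_check.sh; then` rather than with `[[ $(...) ]]`.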
Direct answer to your question
But if you need to do this at the Cylc level…
Firstly, you don’t need the suicide trigger - task F will not be spawned if A doesn’t fail.
Does this toy workflow give you the behaviour you are looking for?
[scheduling]
    initial cycle point = 1122
    # final cycle point = 1122
    [[graph]]
        R1 = """
            A? => G & H
            A:failed? => F
        """
[runtime]
    [[root]]
        # Just make it take a bit longer to allow the watcher to see stuff:
        pre-script = sleep 5
    [[A]]
        script = """
            # Only succeed on submit number 3.
            if [[ ${CYLC_TASK_SUBMIT_NUMBER} -gt 2 ]]; then
                exit 0
            else
                exit 1
            fi
        """
    [[F]]
        script = cylc trigger ${CYLC_WORKFLOW_ID}//${CYLC_TASK_CYCLE_POINT}/A --flow=new
I hope by that you mean “sometimes Task A returns failure despite finishing correctly” (or vice versa) i.e., the incorrect final status is the fault of the task job, not Cylc?
The scenario you describe should really be handled by a single task A with execution retries to make it re-run on failure. If A correctly reports success or failure, there is no need for F.
On that basis, I think you should fix task A and use retries, or if that’s not possible take @wxtim’s approach 2 - use an err-script with task A to correct the incorrect failed status to succeeded before exiting - and use retries (which also kind of counts as “fixing task A”).
(Note that if the opposite can also occur, you also need an exit-script to correct the incorrect succeeded status to failed before exiting.)
Correct, “state reset” (like suicide triggers) is not needed in the new event-driven scheduling model. Now you just set task prerequisites or outputs, or trigger tasks directly, and activity will flow on naturally from those events in the graph.
… and that we can run cylc trigger to start a new “flow” beginning at Task F. But when I run this command, Task F runs immediately before any output is produced by Task A, which isn’t what I want
Yes, the problem here is that the new flow you want is A => F => ..., i.e. it starts at A not F.
So if you do want or need to keep the two-task approach, then I agree with @wxtim on this as well, i.e. have task F trigger a new flow at A if it determines, based on log inspection, that A => F needs to rerun.
(And also that you don’t need the suicide triggers in Cylc 8).
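As a rough illustration of the two-task approach, task F's script could inspect A's log and then either mark A as succeeded or rerun it in a new flow. The sketch below combines the two commands already quoted in this thread (`cylc set ... --out succeeded` and `cylc trigger ... --flow=new`). The function names, the `RUN COMPLETE` marker, and the overridable `CYLC` variable are hypothetical, and it has not been tested against a running scheduler.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the decision task F might make.

# Print "set-succeeded" if A's log contains the completion marker,
# otherwise print "rerun". The marker text is an assumption.
decide_action() {
    local log_file="$1"
    local marker="${2:-RUN COMPLETE}"
    if [[ -f "$log_file" ]] && grep -qF -- "$marker" "$log_file"; then
        echo "set-succeeded"
    else
        echo "rerun"
    fi
}

# Apply the decision to the given task ID using the commands from the
# thread. CYLC defaults to "cylc"; setting CYLC=echo gives a dry run.
apply_action() {
    local task_id="$1" action="$2"
    case "$action" in
        set-succeeded) ${CYLC:-cylc} set "$task_id" --out succeeded ;;
        rerun)         ${CYLC:-cylc} trigger "$task_id" --flow=new ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
}
```

Inside F's script this might be called as `apply_action "${CYLC_WORKFLOW_ID}//${CYLC_TASK_CYCLE_POINT}/A" "$(decide_action "$A_LOG")"`, where `A_LOG` is a hypothetical variable holding the path to A's log file.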
[Some arguably “advanced” explanatory info, read at your own risk!]:
Note that with failure of F not marked optional, you could have it fail in the re-flow case. It would then be retained in the active window as a final-status incomplete task (i.e. it finished without completing its required outputs), and the new flow (if triggered at A) would merge with and rerun F, so that downstream activity flows on with flows=1,2. Otherwise the flow=1 instance of F will leave the active window as complete, and the new flow will continue as flow=2 only (there is no flow=1 F stuck in the active window to merge with). However, that might also merge with flow=1 further downstream if the workflow has other branches that you’ve not shown above. I see @wxtim’s example takes the former approach.