Hi,
I’ve been experimenting using xtriggers to detect the arrival of a data file. The polling xtrigger function correctly detects the incoming data file and returns the contents which is a directory path. Is the result then only broadcast to the first dependent/downstream task in the workflow rather than them all?
The workflow (toy example below) has several downstream tasks which all need the information from the xtrigger function (a directory path to be processed). When run the poll_dataset_dir variable is only available to task1. Is there a way to make it available to task2 and tidy_up as well?
[[xtriggers]]
poll = poll_flags(flags_dir="/path/to/incoming", pattern="dataset-*.txt", cycle="%(point)s", sequential=True):PT10S
[[graph]]
P1 = """
@poll => task1 => task2 => tidy_up
"""
[[COMMON]]
[[[environment]]]
DATASET_DIR = $poll_dataset_dir
[[task1]]
inherit = COMMON
platform = plat1
script = "echo 'Running task1'; echo $DATASET_DIR; sleep 30s"
[[task2]]
inherit = COMMON
platform = plat2
script = "echo 'Running task2'; echo $DATASET_DIR; sleep 30s"
Not sure if I’m doing something wrong or misunderstanding the documentation.
Was trying to avoid doing it the old way and writing the data to the filesystem.
Thanks,
Cheers, Ros.
Hmmm,
From an inspection of the code, it looks like the broadcast targets the downstream task(s). So if you want share the result of an xtrigger with a task, add it as a direct dependency.
Let us know if there’s any misleading documentation (likely) and we’ll sort it out.
Note, ext-triggers (not to be mistaken for xtriggers) seem to broadcast their results to the whole cycle (External Triggers — Cylc 8.6.0 documentation).
The documentation seems to be correct on this:
Xtrigger functions must return a flat dictionary of results to be broadcast to dependent tasks, via environment variables…
The general idea is, tasks that need the xtrigger result should depend on it, and we don’t broadcast more widely - to avoid pumping task environments with information that’s not needed.
So you can either make more tasks depend on the xtrigger, or have the dependent tasks distribute the result more widely. The former is preferable IMO - real dependence should be reflected by dependencies in the graph.
Thanks Oliver & Hilary.
So I guess the problem I have with the above example is that task2 is dependent on task1 running first and making both directly dependent on the xtrigger doesn’t work (and probably doesn’t really make sense).
[[graph]]
P1 = """
@poll => task1 & task2
task1 => task2 => tidy_up
"""
If I’m understanding correctly, then would need to have an extra local task dependent on the xtrigger that could then do the broadcast to task1 & task2?
Hi Ros,
No, that should be fine (let us know if it isn’t!) - it just means task2 waits on task1 and the xtrigger.
Hilary
Hi Hilary,
Thanks for verifying that should work. Unfortunately, I’ve just tried it again and it’s not working.
When the xtrigger is satisfied task1 runs. When task1 has completed successfully task2 does not run. The workflow stalls with task2 waiting for the xtrigger to run again.
This is with Cylc-8.5.3
Thanks,
Ros
[scheduling]
cycling mode = integer
initial cycle point = 1
final cycle point = 2
[[xtriggers]]
poll = poll_flags(flags_dir="/path/to/incoming/data", pattern="dataset-*.txt", cycle="%(point)s", sequential=True):PT10S
[[graph]]
P1 = """
@poll => task1 & task2
task1 => task2
"""
[runtime]
[[root]
script = "echo 'Root script called'"
[[COMMON]]
[[[environment]]]
DATASET_DIR = $poll_dataset_dir
[[task1]]
inherit = COMMON
script = "echo 'Running task1'; echo $DATASET_DIR; sleep 5s"
[[task2]]
inherit = COMMON
script = "echo 'Running task2'; echo $DATASET_DIR; sleep 5s"
Hi Ros,
OK at first glance, some behaviour that I didn’t expect - but it should not be stalling your workflow.
[[xtriggers]]
poll = poll_flags(..., cycle="%(point)s", ...):PT10S
[[graph]]
P1 = @poll => task1 & task2
Here, a single successful @poll call will satisfy both tasks (at one cycle point), but adding this:
"task1 => task2"
results in the xtrigger being re-called later for task2 - because the scheduler housekeeps xtrigger results once no active tasks remain that depend on them - and nowtask2 only gets spawned when task1 succeeds, after the old result is gone - hence the re-call.
However this should not stall your workflow so long as @poll returns the same result on re-call - can you check that’s the case please?
You can avoid the re-call by making task2 enter the active window earlier, if you like:
dummy => task1 & task2 # spawn both into the active window immediately
...
[[dummy]]
run mode = skip
A plot twist - this example worked without re-calling the xtrigger prior to a (valid) bug fix in Cylc 8.5: the scheduler had been checking active tasks for all xtriggers not just unsatisfied ones, so it would retain the result for the already-satisfied task1, not for the (future, unsatified) task2 !
We’ll discuss on the team whether xtrigger housekeeping needs to be tweaked somehow…
Unfortunately a log message is being dropped that would make the re-call more obvious (it should say “commencing xtrigger call sequence…” again before task2) - I’ll get that fixed - but you will see the re-call in debug mode.
Here’s the example I used, with the built-in toy xrandom xtrigger:
[scheduling]
cycling mode = integer
initial cycle point = 1
final cycle point = 1
[[xtriggers]]
poll = xrandom(percent=100, _="%(point)s"):PT10
[[graph]]
P1 = """
@poll => task1 & task2
task1 => task2
# dummy => task1 & task2 # spawn both early
"""
[runtime]
[[dummy]]
run mode = skip
[[POLLED]]
script = "echo ${poll_COLOR}, ${poll_SIZE}"
[[task1, task2]]
inherit = POLLED
Hi Hilary,
Yes @poll will return a different result on re-call as the xtrigger is being used to detect incoming datasets so each dataset will only be picked up once. So looks like we will have to add the dummy task to avoid the second recall or make the dependent task distribute the result more widely.
Thanks for all you help with this.
Cheers,
Ros