My workflow keeps stalling after it hits a cycle with a cycle-specific dependency.
Minimal example (hopefully transcribed correctly):
[scheduler]
[scheduling]
    initial cycle point = 20230101T06
    final cycle point = 20230103T02
    initial cycle point constraints = T00, T06, T12, T18
    runahead limit = P3
    [[graph]]
        R1 = install => run
        PT1H = """
            setup_1 => task_1
            task_1[-PT1H] => task_1
        """
        T23 = setup_2 => setup_1
[runtime]
    [[root]]
        platform = localhost
    [[install, run, setup_1, task_1, setup_2]]
        script = sleep 10
I’m running with --mode=dummy in any case.
Running with 8.6.1 (latest I have available), this workflow runs hourly until T23, then the next T00/setup_1 task fails to start despite having no outstanding prerequisites (i.e. I can’t see what it’s waiting for).
Incidentally, if I remove the intercycle dependency task_1[-PT1H] => task_1 then the workflow runs, but at 8.5.0 it runs hourly up to the first T23, then skips to the next T23, then finishes, whereas at 8.6.1 it runs correctly.
I feel like I must be doing something wrong, because mixed scheduling like this should work fine?
Hi @srennie
I think you have uncovered some kind of a bug. I’ve reproduced it, but not had time to figure out what’s going on yet. Here’s a slightly simplified case that gets to the stall faster:
[scheduling]
    initial cycle point = 20230101T18
    runahead limit = P3
    [[graph]]
        PT1H = """
            b => c
            c[-PT1H] => c
        """
        T23 = a => b
[runtime]
    [[a, b, c]]
Result (at 8.6.3):
...
INFO - [20230101T2300Z/c/01:running] => succeeded
INFO - [20230102T0000Z/c:waiting(runahead)] => waiting
WARNING - Partially satisfied prerequisites:
* 20230102T0000Z/c is waiting on ['20230102T0000Z/b:succeeded']
CRITICAL - Workflow stalled
WARNING - PT1H stall timer starts NOW
The active tasks at this point:
$ cylc dump -t sre
20230102T2300Z/a:waiting (runahead)
20230102T0000Z/c:waiting
The log gives the reason for the stall. But 20230102T0000Z/b should just run automatically, because it doesn’t depend on anything in that cycle.
$ cylc log sre | grep '/b' | grep '=> submitted'
2026-03-17T16:57:00+13:00 INFO - [20230101T1800Z/b/01:preparing] => submitted
2026-03-17T16:57:00+13:00 INFO - [20230101T1900Z/b/01:preparing] => submitted
2026-03-17T16:57:00+13:00 INFO - [20230101T2000Z/b/01:preparing] => submitted
2026-03-17T16:57:00+13:00 INFO - [20230101T2100Z/b/01:preparing] => submitted
2026-03-17T16:57:07+13:00 INFO - [20230101T2200Z/b/01:preparing] => submitted
2026-03-17T16:57:15+13:00 INFO - [20230101T2300Z/b/01:preparing] => submitted
I suspect something is wrong with the code that handles tasks that are parented in some cycles and parentless in others (which is a tricky situation, but I thought we had tests covering it…)
I’ve posted a tentative bug fix, which needs more work due to the highly sensitive nature of the code it touches: cylc/cylc-flow pull request #7237, “Fix parentless spawning”, by hjoliver.
It looks to have the same root cause as an earlier bug report, cylc/cylc-flow issue #5730 (“mixed parentless/non-parentless task cause premature shutdown”), which wasn’t prioritized because “Fortunately this sort of alternating parented/parentless structure is probably unlikely in real workflows” … (but yours has it).
FYI, in this context we say b is parentless in the T12 recurrence but parented in T00:
T00 = "a => b"
T12 = "b"
This matters because Cylc normally “spawns” tasks into the active window in response to upstream “parent” output completion (in the T00 recurrence, completion of a:succeeded will spawn b), but parentless tasks do not depend on upstream outputs so Cylc has to handle them differently.
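As a rough illustration of that distinction, here is the graph section from the simplified reproducer above with comments added. The comments are an interpretation of this thread, not anything Cylc itself reports:

```cylc
[[graph]]
    # Hourly recurrence: nothing is upstream of b here, so b is
    # parentless in every cycle that only matches PT1H.
    PT1H = """
        b => c
        c[-PT1H] => c
    """
    # Daily recurrence: a is a parent of b, so at T23 b is spawned
    # when a:succeeded is completed.
    T23 = a => b
    # 20230101T2300Z/b is parented, and the very next instance,
    # 20230102T0000Z/b, is parentless again. That is the instance
    # the stalled 20230102T0000Z/c is waiting on in the log above.
```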
@srennie - pending a fix in (I hope) the next maintenance release, if you can explain what you’re trying to achieve by having a task wait on another task in some cycles but not in others, we might be able to suggest a workaround.
@hilary.j.oliver the real-life scenario is that I have a task get_obs (task b) that runs hourly, and a task dearchive_radar (task a) that runs daily to untar daily radar tarballs, which get_obs then pulls from every hour. What follows is the usual DA → forecast pattern (task c), which introduces the inter-cycle dependency (in my example suite I got different behaviour with and without the inter-cycle dependency, at different versions).
The workaround I have implemented is b[-PT1H]:submit => b, which forces b to have a parent and adds minimal delay at runtime. Open to better suggestions, but that is probably sufficient?
Nice - I think that’s a good workaround.
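For anyone who finds this thread later, the workaround applied to the simplified reproducer might look like the sketch below. It assumes the extra line belongs in the hourly recurrence, and it has not been run against any particular Cylc version here:

```cylc
[scheduling]
    initial cycle point = 20230101T18
    runahead limit = P3
    [[graph]]
        PT1H = """
            # Workaround: each b waits for the previous hour's b to be
            # submitted, so b has a parent in every hourly cycle.
            b[-PT1H]:submit => b
            b => c
            c[-PT1H] => c
        """
        T23 = a => b
[runtime]
    [[a, b, c]]
```

Because the trigger is on submission rather than success, each b should only wait until the previous hour's b has been submitted, which is why the added delay is small.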