I have a workflow running in Cylc 8.0.1 that is stopping due to reaching its runahead limit. But when I run cylc log or cylc tui I don’t see any failed or pending tasks prior to the runahead limit. Is there a way to find out what task(s) are preventing the runahead limit from advancing forward?
First of all, 8.0.1 is now fairly old and there have been a number of important bugfixes since. We strongly recommend upgrading to 8.1.4.
The icon displayed in the TUI indicates whether the task is runahead limited: it should have a dot above and to the left.
[screenshot: runahead-limited task icon in the TUI]
[screenshot: runahead-limited task icon in the GUI]
I believe Cylc is supposed to internally raise the runahead limit if this is necessary to avoid a stall. So it’s worth confirming that there really are any runahead limited tasks at all.
Yes, that ° symbol is exactly what I’m seeing in the tui. The problem is that I don’t know why my tasks are being runahead limited. I have tasks in the oldest active cycle point that are being runahead limited, which means that either 1) there are older active tasks that aren’t being shown in the tui and that I don’t know about or 2) the runahead limiting is determined by something other than the oldest active cycle. Is there a way to find out what task(s) might be responsible for causing the runahead limiting?
Here’s a screenshot of my TUI view. It shows that the first active cycle (20150107T1350Z) is runahead limited. The only older tasks being displayed are those from the immediate prior cycle that have already completed.
Ok, so I found the following in cylc log:
2023-07-20T14:45:02Z WARNING - Partially satisfied prerequisites:
* 20150107T1345Z/make_plots is waiting on ['20150107T1345Z/compute_predictions:succeeded']
* 20150107T1350Z/make_plots is waiting on ['20150107T1350Z/compute_predictions:succeeded']
* 20150107T1355Z/make_plots is waiting on ['20150107T1355Z/compute_predictions:succeeded']
* 20150107T1315Z/make_plots is waiting on ['20150107T1310Z/write_truth_run_obs:succeeded']
* 20150107T1320Z/make_plots is waiting on ['20150107T1315Z/write_truth_run_obs:succeeded']
* 20150107T1325Z/make_plots is waiting on ['20150107T1320Z/write_truth_run_obs:succeeded']
2023-07-20T14:45:02Z CRITICAL - Workflow stalled
Those write_truth_run_obs tasks had already completed successfully, not sure why the log file said otherwise. I manually triggered the older make_plots tasks, and once they completed the runahead limit advanced and the workflow resumed. So then the question is, why did cylc behave as if the write_truth_run_obs tasks had never completed, and why didn’t the stalled make_plots tasks show up in the tui?
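For anyone hitting the same stall, here is a rough Python sketch for pulling the blocked tasks out of the scheduler log and grouping them by the prerequisite they are waiting on. The line format is assumed purely from the excerpt above, and `waiting_tasks` and `SAMPLE` are made-up names for illustration, not part of Cylc.

```python
import re
from collections import defaultdict

# Pattern assumed from the log excerpt in this thread:
#   * <cycle>/<task> is waiting on ['<cycle>/<task>:<output>', ...]
WAITING_RE = re.compile(r"\*\s+(\S+) is waiting on \[(.*)\]")


def waiting_tasks(log_text):
    """Map each unsatisfied prerequisite to the tasks waiting on it."""
    blockers = defaultdict(list)
    for line in log_text.splitlines():
        match = WAITING_RE.search(line)
        if match is None:
            continue  # not a "waiting on" line
        task, prereqs = match.groups()
        # Each prerequisite is a single-quoted string inside the brackets.
        for prereq in re.findall(r"'([^']+)'", prereqs):
            blockers[prereq].append(task)
    return dict(blockers)


# Shortened copy of the log lines quoted above.
SAMPLE = """\
2023-07-20T14:45:02Z WARNING - Partially satisfied prerequisites:
* 20150107T1345Z/make_plots is waiting on ['20150107T1345Z/compute_predictions:succeeded']
* 20150107T1315Z/make_plots is waiting on ['20150107T1310Z/write_truth_run_obs:succeeded']
2023-07-20T14:45:02Z CRITICAL - Workflow stalled
"""

if __name__ == "__main__":
    for prereq, tasks in sorted(waiting_tasks(SAMPLE).items()):
        print(f"{prereq} <- {', '.join(tasks)}")
```

Sorting by prerequisite puts the oldest cycle points first, which is where the tasks holding back the runahead limit are most likely to be.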
Hi @jhaiduce
First I’ll second @MetRonnie: please upgrade Cylc. The 8.0.1 release is almost a year old. You can have new releases installed alongside older ones, so it needn’t disrupt anyone who doesn’t want to upgrade yet.
“Those write_truth_run_obs had already completed successfully, not sure why the log file said otherwise. … why did cylc behave as if [they] had never completed”
Did your workflow undergo any runtime interventions, such as manual re-triggering or use of cylc remove?
“why didn’t the stalled make_plots tasks show up in the tui?”
What you see in the Cylc UIs is everything within n graph edges (default n=1) of the “active window” of the workflow, which includes active tasks, incomplete tasks, and tasks actively waiting on external triggers. Your make_plots tasks are still waiting on unsatisfied prerequisites. They’re not in the active window themselves, so they’ll only be visible if they sit within one graph edge of something that is, i.e. if they are captured by the n=1 window.
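To make the n-window idea concrete, here is a toy Python sketch: a breadth-first search that collects every task within n edges of an active task. This is only an illustration of the concept, not Cylc's actual implementation; the function name `n_window`, the shortened cycle points, and the assumption that the window extends both upstream and downstream are mine.

```python
from collections import deque


def n_window(edges, active, n=1):
    """Return all tasks within n graph edges of any active task."""
    # Undirected adjacency map (assumption: the window extends both
    # upstream and downstream of the active tasks).
    neighbours = {}
    for upstream, downstream in edges:
        neighbours.setdefault(upstream, set()).add(downstream)
        neighbours.setdefault(downstream, set()).add(upstream)

    # Breadth-first search outwards from the active tasks.
    distance = {task: 0 for task in active}
    queue = deque(active)
    while queue:
        task = queue.popleft()
        if distance[task] == n:
            continue  # do not expand beyond n edges
        for other in neighbours.get(task, ()):
            if other not in distance:
                distance[other] = distance[task] + 1
                queue.append(other)
    return set(distance)


# Hypothetical slice of a graph shaped like the one in this thread.
EDGES = [
    ("1310/write_truth_run_obs", "1315/make_plots"),
    ("1315/compute_predictions", "1315/make_plots"),
    ("1345/write_truth_run_obs", "1350/make_plots"),
    ("1350/compute_predictions", "1350/make_plots"),
]

if __name__ == "__main__":
    visible = n_window(EDGES, active=["1350/compute_predictions"], n=1)
    print(sorted(visible))
    print("1315/make_plots" in visible)
```

In this toy graph the old 1315/make_plots has no active neighbour, so it falls outside the n=1 window even though it is still waiting on a prerequisite.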
However, there should be incomplete tasks upstream that are visible, and which are the root cause of the unsatisfied prerequisites. Which is why I’m wondering if (a) manual interventions caused this; or (b) you’ve hit a bug fixed in the last year.
[As a result of this I’ve flagged with the team that we should consider moving partially satisfied tasks into the active window of the workflow].
I have cylc installed in my home directory; the only thing stopping me from upgrading is the concern that doing so might affect the troubleshooting process in a way that prevents me from understanding what the original problem was (that and the flow with the problem was still running). I just upgraded a few minutes ago.
Yes, I’ve done quite a bit of that. I’ve used both cylc trigger and cylc remove multiple times on various tasks and cycles.
Either of those seems very possible.
Thanks @hilary.j.oliver, much appreciated!
Also thank you @MetRonnie for your help on this!