Cylc 7.9.3: suicided task *occasionally* not suiciding, causing suite hangs

Hi. I hope it’s still possible to get support for 7.9.3 – my customer won’t be migrating to Cylc 8 for a while, so I can’t.

I have a workflow where a task compares the cycle point to the wall clock and sends a message based on the degree to which the suite is lagging the wall clock. On that basis, if the lag is too severe, a great many tasks are suicided, to speed up processing. I use an intermediate dummy task in the way the graph is written, e.g.:

check_lag:too_much_lag => suicide_tasks
suicide_tasks => !taskA
suicide_tasks => !taskB
suicide_tasks => !taskC
suicide_tasks => !taskD
(etc.)

I am occasionally seeing some weird behavior: one of the tasks that should be suicided isn’t. For example, from the lines above, taskC is sometimes not suicided, and the suite log shows no attempt to suicide it. Most of the time it suicides just fine, but occasionally it’s as if the suite server never tries to suicide the task.

This causes a problem because it leaves the task in a “waiting” state, and it will never be triggered (and so never spawned), because all the tasks it would trigger off have been successfully suicided. Then, a few hours later, the suite runs into the “max active cycle points” limit and just stops. If I discover this has happened and manually remove-without-spawning the task that should have been suicided, everything works OK.

Any advice on what to look at? I’ve been banging on this for a while, and folks are rattling my cage for a fix.

Thanks!

Hi, we’re not aware of any issue with suicide triggers at Cylc 7.

It’s worth checking that taskC was still constrained by its dependencies at the time suicide_tasks ran. If it had already been queued or marked as ready, the suicide logic may be skipped.

Could you take a look at the logs to find the point where check_lag yielded the too_much_lag output? After this, the workflow should list each of the tasks it attempted to suicide-trigger.
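If checking this by eye gets tedious, the log check above could be scripted roughly as follows. This is only a sketch: the function name, the marker string and the optional keyword are hypothetical, and it assumes nothing about the exact wording of Cylc 7 log lines beyond task IDs appearing as NAME.CYCLE_POINT. It finds the first log line containing a marker you choose (for example, text from the line where the too_much_lag output is recorded) and reports which of the expected task IDs are never mentioned after it.

```python
import re


def find_unmentioned_tasks(log_lines, trigger_marker, task_ids, keyword=None):
    """Return the task IDs never mentioned after the trigger line.

    log_lines: list of log lines (strings).
    trigger_marker: substring identifying the line to start from
        (hypothetical; pick text that really appears in your suite log).
    task_ids: IDs to look for, e.g. "taskC.20200101T0000Z".
    keyword: optional substring a line must also contain to count
        (hypothetical; e.g. whatever your log prints for a suicide).

    Returns None if the trigger marker is not found at all.
    """
    start = None
    for index, line in enumerate(log_lines):
        if trigger_marker in line:
            start = index
            break
    if start is None:
        return None

    remaining = set(task_ids)
    for line in log_lines[start + 1:]:
        if keyword is not None and keyword not in line:
            continue
        for task_id in list(remaining):
            # Match the whole ID only, so "taskC.1" does not match "taskC.10".
            pattern = r"(?<![\w-])" + re.escape(task_id) + r"(?![\w-])"
            if re.search(pattern, line):
                remaining.discard(task_id)
    return sorted(remaining)
```

To use it, read the suite log into a list of lines (for example with `open(path).read().splitlines()`) and pass the IDs of taskA, taskB, taskC, etc. at the affected cycle point. Any ID that comes back is one the log never mentions after the trigger, which would match the "no attempt to suicide taskC" symptom.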

Cheers,
Oliver