I have a workflow (full DA cycling workflow) that got stuck because I had missed a connection in my graph. After fixing it, I restarted, but got an error.
2023-05-15T01:21:43Z ERROR - ('20230314T0600Z', 'sy_um_fcst_ss_06', 'succeeded')
Traceback (most recent call last):
  File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 687, in start
    await self.configure()
  File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 476, in configure
    self._load_pool_from_db()
  File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 737, in _load_pool_from_db
    self.workflow_db_mgr.pri_dao.select_task_pool_for_restart(
  File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/rundb.py", line 873, in select_task_pool_for_restart
    callback(row_idx, list(row))
  File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/task_pool.py", line 502, in load_db_task_pool_for_restart
    itask_prereq.satisfied[key] = sat[key]
KeyError: ('20230314T0600Z', 'sy_um_fcst_ss_06', 'succeeded')
2023-05-15T01:21:43Z CRITICAL - Workflow shutting down - ('20230314T0600Z', 'sy_um_fcst_ss_06', 'succeeded')
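My reading of the traceback, which may well be wrong, is that on restart the prerequisites from the edited graph are looked up in the state stored in the database, and the newly added dependency has no stored entry. Here is a minimal Python sketch of that guess. It is not Cylc code: `restore_strict`, `restore_tolerant`, the dictionary layout, and the `some_other_task` key are all made up for illustration.

```python
# Hypothetical stand-ins for illustration only; these are NOT Cylc's real
# data structures or functions.

# State "saved" before the graph was edited. The key for the short-step
# task is absent because the old graph had no such dependency.
db_satisfied = {
    ("20230314T0600Z", "some_other_task", "succeeded"): True,  # made-up key
}

# Prerequisite keys derived from the edited graph.
graph_keys = [
    ("20230314T0600Z", "some_other_task", "succeeded"),
    ("20230314T0600Z", "sy_um_fcst_ss_06", "succeeded"),  # key from the log
]


def restore_strict(keys, saved):
    """Direct lookup: raises KeyError for any key missing from saved state."""
    return {key: saved[key] for key in keys}


def restore_tolerant(keys, saved):
    """Treat keys missing from saved state as unsatisfied and report them."""
    restored = {key: saved.get(key, False) for key in keys}
    missing = [key for key in keys if key not in saved]
    return restored, missing


try:
    restore_strict(graph_keys, db_satisfied)
except KeyError as exc:
    print("KeyError:", exc)

restored, missing = restore_tolerant(graph_keys, db_satisfied)
print("missing from saved state:", missing)
```

If this guess is right, the error is a mismatch between the edited graph and the saved task pool rather than corruption of the run database, but I have not confirmed that against the Cylc source.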
After some finagling, I managed to get the suite started again for long enough to trigger tasks, but it refused to recognise that dependencies were met and continue the current flow. I had to trigger the next cycle manually, much like in the thread "Continuing a run that completed" (Cylc / Cylc 8 Migration, Cylc Workflow Engine forum), except that in this case my run was nowhere near completed.
Q1. Is there any good way to fix/work around a key error like this?
I tried reproducing this in a simple suite, but it did not give an error. However, it also refused to recognise the fulfilled prerequisites and continue on. I’m not sure if this is a bug or a feature, or whether it has been fixed since 8.1.2.
Here is the example.
Original graph with mistake:
[scheduling]
    initial cycle point = 20230101T06
    final cycle point = 20230103T00
    initial cycle point constraints = T00, T06, T12, T18
    runahead limit = P3
    [[graph]]
        R1 = install => setup_6
        PT1H = """
            housekeep[-PT1H] => task_6?
            setup_6 => task_6? => housekeep
            task_6:fail? => task_6_ss
        """
If I force task_6 (e.g. a forecast task) to fail, task_6_ss (e.g. a shortstep forecast) runs successfully, but housekeep doesn’t run because its only prerequisite, task_6 succeeding, is unmet.
If I fix the graph as follows:
        PT1H = """
            housekeep[-PT1H] => task_6?
            setup_6 => task_6?
            task_6:fail? => task_6_ss
            task_6? | task_6_ss => housekeep
        """
and then reload or restart, housekeep should now be allowed to run. However, housekeep does not run and does not appear as an active task, so no further tasks run.
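To make my expectation concrete, here is a small Python sketch of how I would expect the housekeep prerequisite to evaluate under each graph. It is not Cylc code and says nothing about how Cylc stores or re-evaluates prerequisites; the function names and the `"task:output"` strings are made up for illustration.

```python
# Illustration only: plain boolean logic mirroring the two graphs above.
# The function names and "task:output" strings are hypothetical.


def housekeep_ready_original(outputs):
    """Original graph: task_6? => housekeep (needs task_6 to succeed)."""
    return "task_6:succeeded" in outputs


def housekeep_ready_fixed(outputs):
    """Fixed graph: task_6? | task_6_ss => housekeep (either success will do)."""
    return "task_6:succeeded" in outputs or "task_6_ss:succeeded" in outputs


# Outputs completed in my test cycle: task_6 was forced to fail and the
# short-step task then succeeded.
outputs = {"setup_6:succeeded", "task_6:failed", "task_6_ss:succeeded"}

print(housekeep_ready_original(outputs))  # False
print(housekeep_ready_fixed(outputs))     # True
```

By this logic housekeep's prerequisites are already satisfied under the fixed graph, which is why I expected it to be spawned and run after the reload or restart.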
Q2. Why does housekeep not recognise that it is now able to run, or why does it lose its status as an active task after changing the graph?