I have a workflow (a full DA cycling workflow) that got stuck because I had missed a connection in my graph. After fixing the graph, I restarted the workflow but got this error:
2023-05-15T01:21:43Z ERROR - ('20230314T0600Z', 'sy_um_fcst_ss_06', 'succeeded')
    Traceback (most recent call last):
      File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 687, in start
        await self.configure()
      File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 476, in configure
        self._load_pool_from_db()
      File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 737, in _load_pool_from_db
        self.workflow_db_mgr.pri_dao.select_task_pool_for_restart(
      File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/rundb.py", line 873, in select_task_pool_for_restart
        callback(row_idx, list(row))
      File "/g/data/dp9/cst565/conda_envs/productive/lib/python3.9/site-packages/cylc/flow/task_pool.py", line 502, in load_db_task_pool_for_restart
        itask_prereq.satisfied[key] = sat[key]
    KeyError: ('20230314T0600Z', 'sy_um_fcst_ss_06', 'succeeded')
2023-05-15T01:21:43Z CRITICAL - Workflow shutting down - ('20230314T0600Z', 'sy_um_fcst_ss_06', 'succeeded')
After some finagling, I managed to get the suite started again, long enough to trigger tasks. But my suite refused to recognise that dependencies were met and would not continue the current flow. I had to trigger the next cycle manually, much like in the thread "Continuing a run that completed" (Cylc 8 Migration category), except that in this case my run was nowhere near completed.
Q1. Is there any good way to fix/work around a key error like this?
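To show what I think the last traceback frame is doing, here is a minimal plain-Python sketch. It is not Cylc's code, and it rests on my assumption that the restart copies stored prerequisite state onto prerequisites built from the current graph. The task name sy_um_fcst_06 and the helper restore are made up for illustration; only the key in the error message comes from my log.

```python
# Plain-Python illustration of the failing pattern; NOT Cylc's actual code.
# Prerequisite keys have the form (cycle point, task name, output).

# Hypothetical: satisfaction state stored in the run database before the fix.
stored = {
    ("20230314T0600Z", "sy_um_fcst_06", "succeeded"): False,
}

# Hypothetical: prerequisite keys derived from the fixed graph on restart.
from_graph = {
    ("20230314T0600Z", "sy_um_fcst_06", "succeeded"): False,
    ("20230314T0600Z", "sy_um_fcst_ss_06", "succeeded"): False,
}


def restore(satisfied, sat):
    """Copy stored state onto the new prerequisite, key by key."""
    for key in satisfied:
        satisfied[key] = sat[key]  # KeyError if the graph gained a key


try:
    restore(from_graph, stored)
    missing = None
except KeyError as exc:
    missing = exc.args[0]

print(missing)
```

If that assumption holds, the missing key is exactly the dependency I added when fixing the graph, which would match the KeyError above.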
I tried reproducing this in a simple suite. It did not give the KeyError, but it also refused to recognise the fulfilled prerequisites and continue on. I’m not sure if this is a bug, or a feature, or if it has been fixed since 8.1.2.
Here is the example.
Original graph with mistake:
[scheduling]
    initial cycle point = 20230101T06
    final cycle point = 20230103T00
    initial cycle point constraints = T00, T06, T12, T18
    runahead limit = P3
    [[graph]]
        R1 = install => setup_6
        PT1H = """
            housekeep[-PT1H] => task_6?
            setup_6 => task_6? => housekeep
            task_6:fail? => task_6_ss
        """
When I force task_6 (e.g. a forecast task) to fail, task_6_ss (e.g. a shortstep forecast) runs successfully. But housekeep doesn’t run, because its only prerequisite is task_6 succeeding, and that is never met.
If I fix the graph
        PT1H = """
            housekeep[-PT1H] => task_6?
            setup_6 => task_6?
            task_6:fail? => task_6_ss
            task_6? | task_6_ss => housekeep
        """
And reload/restart, housekeep should now be allowed to run. However, housekeep does not run and does not appear as an active task, so no further tasks run.
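To spell out what I expect, here is the prerequisite logic for housekeep as I understand it from the two graphs, written as a small Python sketch. This is only my reading of the graph syntax, not how Cylc evaluates prerequisites internally; the function names and the outputs dictionary are made up for illustration.

```python
# Sketch of housekeep's prerequisite in each graph version; not Cylc code.

# Task outcomes in the cycle where task_6 was forced to fail.
outputs = {
    ("task_6", "succeeded"): False,
    ("task_6", "failed"): True,
    ("task_6_ss", "succeeded"): True,
}


def housekeep_ready_original(out):
    # Original graph: setup_6 => task_6? => housekeep
    return out[("task_6", "succeeded")]


def housekeep_ready_fixed(out):
    # Fixed graph: task_6? | task_6_ss => housekeep
    return out[("task_6", "succeeded")] or out[("task_6_ss", "succeeded")]


print(housekeep_ready_original(outputs))  # False
print(housekeep_ready_fixed(outputs))     # True
```

With the fixed graph and task_6_ss already succeeded, I would expect housekeep to be ready immediately after the reload/restart.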
Q2. Why does housekeep not recognise that it is now able to run, or why does it lose its status as an active task after changing the graph?
