Cylc broadcast is being cleared automatically

Two of us hit the same issue today: if you run cylc broadcast against a task that has already run and succeeded, the broadcast is cleared automatically, immediately after it is set.

I was using CYLC_VERSION=8.3.1 and my colleague was using CYLC_VERSION=8.3.3. We both first noticed it in the WUI, but we have confirmed that it also happens via the CLI. I don’t know how 8.2 behaves.

From the scheduler log:

2024-07-31T02:00:40Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20240327T0000Z'], mode=put_broadcast, namespaces=['ct_to_simpler'], settings=[{'environment': {'FORCE_RESEND': 'no'}}])
2024-07-31T02:00:40Z INFO - Broadcast set:
    + [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no
2024-07-31T02:00:40Z INFO - Broadcast cancelled:
    - [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no
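To check how long a broadcast actually survived, the "Broadcast set" and "Broadcast cancelled" entries can be paired up from the scheduler log. The following is only a sketch: the function names `broadcast_events` and `lifetimes` are made up for illustration, and the regular expressions assume the log layout shown in the excerpt above.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
#   <timestamp> INFO - Broadcast set:
#       + [<cycle>/<task>] [environment]NAME=value
HEADER = re.compile(r"^(\S+) INFO - Broadcast (set|cancelled):\s*$")
ENTRY = re.compile(r"^\s+[+-] (\[.+?)\s*$")


def broadcast_events(log_text):
    """Return (timestamp, action, setting) for each broadcast log entry."""
    events = []
    current = None  # (timestamp, action) of the header we are under, if any
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            stamp = datetime.strptime(header.group(1), "%Y-%m-%dT%H:%M:%SZ")
            current = (stamp, header.group(2))
            continue
        entry = ENTRY.match(line)
        if entry and current:
            events.append((current[0], current[1], entry.group(1)))
        else:
            current = None  # any other line ends the current block
    return events


def lifetimes(events):
    """Pair each 'set' with the next 'cancelled' of the same setting."""
    set_at = {}
    result = []
    for stamp, action, setting in events:
        if action == "set":
            set_at[setting] = stamp
        elif setting in set_at:
            seconds = (stamp - set_at.pop(setting)).total_seconds()
            result.append((setting, seconds))
    return result


SAMPLE = """\
2024-07-31T02:00:40Z INFO - Broadcast set:
    + [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no
2024-07-31T02:00:40Z INFO - Broadcast cancelled:
    - [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no
"""

for setting, seconds in lifetimes(broadcast_events(SAMPLE)):
    print(f"{setting} lived for {seconds:.0f} s")
```

For the excerpt above this reports a lifetime of 0 seconds, i.e. the broadcast was cancelled in the same second it was set.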

Pausing the workflow, triggering the task, and then broadcasting kept the broadcast in place until the task had run:

2024-07-31T02:04:48Z INFO - Pausing the workflow
2024-07-31T02:04:48Z INFO - Command "pause" actioned. ID=0ea0b0b8-5b94-4d06-b90b-81b43962e8a9
2024-07-31T02:05:06Z INFO - Command "force_trigger_tasks" received. ID=8055b39c-10e3-4ccf-97ac-ed3a9bdd5d17
    force_trigger_tasks(flow=['all'], flow_wait=False, tasks=['20240327T0000Z/ct_to_simpler'])
2024-07-31T02:05:07Z INFO - [20240327T0000Z/ct_to_simpler:waiting(runahead)] => waiting
2024-07-31T02:05:07Z INFO - [20240327T0000Z/ct_to_simpler:waiting] => waiting(queued)
2024-07-31T02:05:07Z INFO - Command "force_trigger_tasks" actioned. ID=8055b39c-10e3-4ccf-97ac-ed3a9bdd5d17
2024-07-31T02:06:07Z INFO - Command "broadcast" received.
    broadcast(cycle_points=['20240327T0000Z'], mode=put_broadcast, namespaces=['ct_to_simpler'], settings=[{'environment': {'FORCE_RESEND': 'no'}}])
2024-07-31T02:06:07Z INFO - Broadcast set:
    + [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no
2024-07-31T02:07:13Z INFO - Command "resume" received. ID=d74fcf91-6398-4317-b4e2-a251c290029c
    resume()
2024-07-31T02:07:14Z INFO - RESUMING the workflow now
2024-07-31T02:07:14Z INFO - Command "resume" actioned. ID=d74fcf91-6398-4317-b4e2-a251c290029c
2024-07-31T02:07:14Z INFO - [20240327T0000Z/ct_to_simpler:waiting(queued)] => waiting
2024-07-31T02:07:14Z INFO - [20240327T0000Z/ct_to_simpler:waiting] => preparing
2024-07-31T02:07:19Z INFO - [20240327T0000Z/ct_to_simpler/05:preparing] submitted to user-dm_calthunder_d:pbs[6195899]
2024-07-31T02:07:19Z INFO - [20240327T0000Z/ct_to_simpler/05:preparing] => submitted
2024-07-31T02:07:27Z INFO - [20240327T0000Z/ct_to_simpler/05:submitted] => running
2024-07-31T02:07:31Z INFO - [20240327T0000Z/ct_to_simpler/05:running] => succeeded
2024-07-31T02:07:31Z INFO - [20240327T0000Z/ct_archive:succeeded] already finished and completed (flows=1))
2024-07-31T02:07:32Z INFO - Broadcast cancelled:
    - [20240327T0000Z/ct_to_simpler] [environment]FORCE_RESEND=no
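For reference, the pause -> trigger -> broadcast -> resume sequence in the log above corresponds roughly to the CLI commands built below. This is a sketch, not a tested recipe: `workaround_commands` is a made-up helper, `my_workflow` is a placeholder workflow ID, and the commands are only printed, not run. It assumes the Cylc 8 CLI, where a paused workflow is resumed with `cylc play`.

```python
import shlex


def workaround_commands(workflow, point, task, env):
    """Build the CLI calls for pause -> trigger -> broadcast -> resume."""
    # Broadcast each environment setting to one task at one cycle point.
    broadcast = ["cylc", "broadcast", "-p", point, "-n", task]
    for name, value in env.items():
        broadcast += ["-s", f"[environment]{name}={value}"]
    broadcast.append(workflow)
    return [
        ["cylc", "pause", workflow],
        # Triggering while paused queues the task without submitting it.
        ["cylc", "trigger", f"{workflow}//{point}/{task}"],
        broadcast,
        ["cylc", "play", workflow],  # resumes the paused workflow
    ]


# "my_workflow" is a hypothetical workflow ID; the rest is from the log above.
for command in workaround_commands(
    "my_workflow", "20240327T0000Z", "ct_to_simpler", {"FORCE_RESEND": "no"}
):
    print(shlex.join(command))
```

Printing rather than executing keeps the sketch safe to run anywhere; `shlex.join` quotes the `[environment]...` argument so the output can be pasted into a shell.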

What were we trying to do? Say we generated some data and a later task found a problem with it, or the disk got corrupted, and we want to rerun the task with a slightly different setting, perhaps to delete the old data. We modify an environment variable in the runtime and run the task. Because we can see the task in the WUI and can broadcast to it via the CLI, we were under the impression that we could just broadcast and then trigger the task, but that does not work. We can’t see a way to trigger a new task with a modified runtime, so our only option appears to be pausing the whole suite. We did not test hold -> trigger -> broadcast -> release.

Is this a bug, or intended behaviour? Is there a way to do this better?

I think that’s a bug, although I haven’t time to investigate today.

To avoid causing (in effect) a memory leak, broadcasts are designed to clear automatically once the workflow has moved on past the point where they can affect upcoming tasks.

The new (Cylc 8) ability to trigger new flows in the past graph is probably subverting this mechanism, although it’s somewhat surprising that we haven’t run into it before when simply retriggering individual tasks in the past.

Note that in Cylc 8, being able to see the task in the WUI means only that your n-window (around active tasks) is wide enough to capture that part of the graph.

Pausing, triggering, and then broadcasting works because triggering the task promotes it to the active task pool, so it no longer looks (to the broadcast clearing system) as if the workflow has moved on beyond it.
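The clearing behaviour described above can be illustrated with a toy model. This is not Cylc's implementation: `ToyScheduler`, its methods, the `later_task` at 20240401T0000Z, and the rule "a broadcast expires when its cycle point is earlier than the earliest active cycle point" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Toy model of broadcast expiry; not how Cylc is actually written."""

    active: set = field(default_factory=set)        # {(cycle_point, task)}
    broadcasts: dict = field(default_factory=dict)  # {(cycle_point, task): env}

    def expire_broadcasts(self):
        """Drop broadcasts older than the earliest active cycle point."""
        if not self.active:
            return []
        # Cycle points are same-format ISO 8601 strings, so plain string
        # comparison orders them chronologically.
        cutoff = min(point for point, _ in self.active)
        expired = [key for key in self.broadcasts if key[0] < cutoff]
        for key in expired:
            del self.broadcasts[key]
        return expired

    def broadcast(self, point, task, env):
        self.broadcasts[(point, task)] = env
        return self.expire_broadcasts()

    def trigger(self, point, task):
        # Triggering promotes the task into the active pool.
        self.active.add((point, task))

    def finish(self, point, task):
        self.active.discard((point, task))
        return self.expire_broadcasts()


# The workflow has moved on: only a (hypothetical) later task is active.
sched = ToyScheduler(active={("20240401T0000Z", "later_task")})
old = ("20240327T0000Z", "ct_to_simpler")

print("broadcast only:", sched.broadcast(*old, {"FORCE_RESEND": "no"}))
sched.trigger(*old)
print("after trigger: ", sched.broadcast(*old, {"FORCE_RESEND": "no"}))
print("after finish:  ", sched.finish(*old))
```

In this model the first broadcast is expired as soon as it is set, as in the first log excerpt, while a broadcast made after triggering survives until the task finishes and leaves the active pool.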

Just to confirm, is this a bug or correct functionality?

Given what you are trying to achieve, which is perfectly reasonable, I would definitely consider this to be a bug. I'm hoping to put some thought into it later today. It might be as simple as making the broadcast clearing algorithm flow-aware.

Ok, thanks. One idea, although I feel like this would be hard given the current design, would be something more like a classic “edit run”: a one-off “edit runtime and trigger” option. It would edit the runtime, submit the task, and then somehow clear those runtime edits.

I’ve raised an issue on GitHub about this.

Thanks, and apologies for failing to get onto this yet - too much going on at once at the moment :grimacing: