Can someone please advise on these two sub-topics:
Suppose I have a workflow installed as workflow/runname with this graph:
A => B => C => D1 => D2 => D3, where D1-D3 are the 3 instances of task D.
To run it: cylc play workflow/runname, which runs all tasks.
Suppose task D2 failed. I fixed it, and I want to restart from the broken point.
What would the syntax of “cylc play” be to start from, say, task D2?
“cylc install” assumes that the previous workflow/runname does not exist, so I have to clean that directory before I issue “cylc install”. Is there any way to force cylc install to overwrite the previous workflow directory? If not, is there a “cylc clean” type of command to clean the previous workflow directory instead of “rm -rf”?
cylc play naturally restarts from wherever the workflow got to previously, but if there’s a failed task present you have to tell the scheduler what to do about it. Either:
manually retrigger the failed task after fixing it (cylc trigger)
or carry on as if it succeeded, by manually setting outputs (cylc set).
The procedure is the same even if the workflow was stopped or killed: it will revive in the same state, with the failed task still present.
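For example, a minimal sketch assuming the workflow is installed as myflow/run1 and the failed task is D2 in cycle 1 (adjust the IDs to your own workflow):

    cylc play myflow/run1            # restart the stopped scheduler first, if necessary
    cylc trigger myflow/run1//1/D2   # option 1: rerun the fixed task
    cylc set myflow/run1//1/D2       # option 2: mark its required outputs complete instead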
To be explicit, if D2 fails due to a fixable bug in it:
fix the bug (and install the fix with cylc reinstall, if it is in a workflow source file)
retrigger the failed task - it should now succeed so the flow can continue
(Note that if the bug is in flow.cylc rather than in a script, and you’re intervening in a running workflow rather than restarting a stopped one, you also need to run cylc reload to tell the running scheduler to update its configuration from the altered flow.cylc.)
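For example, a sketch of the whole intervention on a running workflow (hypothetical IDs again):

    # after fixing the bug in the workflow source directory:
    cylc reinstall myflow/run1        # copy the fix into the installed run directory
    cylc reload myflow/run1           # only needed if flow.cylc itself changed
    cylc trigger myflow/run1//1/D2    # rerun the failed task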
Note also we have shortcut commands for these processes:
cylc vip - validate, install, play
(run from scratch)
cylc vr - validate, reinstall, reload or play
(update a running workflow after changing the source directory)
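For example (run from the workflow source directory; the workflow ID is illustrative):

    cylc vip           # validate + install + play a new run of the current source directory
    cylc vr myflow     # validate + reinstall, then reload (if running) or play (if stopped)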
It sounds like you are using an explicit run name (e.g. cylc install --run-name=blah). If so, to rerun your workflow from scratch you must either:
install to a new run name, for the new run
or clean out the old run directory, to reuse the old run name
It’s a bit easier to use the default numbered runs instead of run names. Then you can install new runs as you like and it will automatically increment the run number.
Either way (run names or numbers) previous runs still exist (and you can even continue running them) until you clean them up.
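For example, with the default numbered runs (workflow name illustrative, assuming the source is in a configured source directory such as ~/cylc-src/myflow):

    cylc install myflow    # creates myflow/run1
    cylc install myflow    # creates myflow/run2, and so on
    cylc play myflow       # plays the latest run (via the runN symlink)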
If not, is there a “cylc clean” type of command to clean the previous workflow directory instead of “rm -rf”?
That’s exactly what cylc clean does - it removes run directories. But it’s better than rm -rf because:
it checks that the target workflows are not running at the time
it automatically cleans up symlinked run sub-directories (i.e., it follows the symlinks)
it automatically cleans up remote run directories too, for distributed workflows
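For example (hypothetical IDs):

    cylc clean myflow/run1    # remove a single run directory, including symlinked and remote parts
    cylc clean myflow         # remove the whole workflow, i.e. all of its runs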
We don’t support overwriting an existing run directory because it’s a dangerous practice - if you accidentally rerun a workflow from scratch in an existing run directory then you’ve wiped out the correct workflow state and contaminated its outputs.
If you really want to force it, you can manually remove the workflow .service directory, so that Cylc will not recognise it as an existing run directory anymore - not recommended!
Thanks.
I tried to restart from the failed task run_mod_2p5km_num2 using “cylc trigger”. Please clarify the argument list “id, cycle, task” with respect to the directory path of the task in the work directory.
(ENV) [bpabla68@cedar1 1]$ ls
run_mod_10km run_mod_2p5km_num1 run_mod_2p5km_num2 run_mod_2p5km_num3 run_prep_10km run_prep_2p5km
(ENV) [bpabla68@cedar1 1]$ pwd
/home/bpabla68/scratch/cylc-run/GM_workflowv1/GM_runnamev1/work/1
(ENV) [bpabla68@cedar1 1]$ cd /home/bpabla68/scratch/cylc-run
(ENV) [bpabla68@cedar1 cylc-run]$ cylc trigger GM_workflowv1/GM_runnamev1/1/run_mod_2p5km_num2
InputError: IDs must be tasks ... (I thought the ID was workflow/runname)
(ENV) [bpabla68@cedar1 cylc-run]$ cylc trigger GM_workflowv1/GM_runnamev1//1/run_mod_2p5km_num2
WorkflowStopped: GM_workflowv1/GM_runnamev1 is not running
(ENV) [bpabla68@cedar1 cylc-run]$ cylc play GM_workflowv1/GM_runnamev1
▪ ■ Cylc Workflow Engine 8.4.2
██ Copyright (C) 2008-2025 NIWA
▝▘ & British Crown (Met Office) & Contributors
INFO - Extracting job.sh to /scratch/bpabla68/cylc-run/GM_workflowv1/GM_runnamev1/.service/etc/job.sh
GM_workflowv1/GM_runnamev1: cedar1.int.cedar.computecanada.ca PID=109318
(ENV) [bpabla68@cedar1 cylc-run]$ squeue --me
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES CPUS TRES_PER_N MIN_MEM NODELIST (REASON)
(ENV) [bpabla68@cedar1 cylc-run]$ squeue --me
JOBID USER ACCOUNT NAME ST TIME_LEFT NODES CPUS TRES_PER_N MIN_MEM NODELIST (REASON)
(ENV) [bpabla68@cedar1 cylc-run]$ cylc trigger GM_workflowv1/GM_runnamev1//1/run_mod_2p5km_num2
Command queued
Are the above steps correct, or am I missing something?
@bpabla50 - Cylc has workflow IDs to uniquely identify workflows, and task IDs to uniquely identify tasks within workflows. The components of both workflow and task IDs are separated by / characters. When you need to target a specific task in a specific workflow, e.g. to use a command like cylc trigger, separate the workflow ID from the task ID with //.
This is all explained in the User Guide and in cylc help id in the terminal.
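So the general pattern, using your own workflow and task from above, is:

    cylc trigger <workflow-id>//<cycle>/<task>
    cylc trigger GM_workflowv1/GM_runnamev1//1/run_mod_2p5km_num2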
As @wxtim says, it looks as if your IDs were good once you corrected / to // in the second attempt, but the workflow was not running. You can’t trigger a task in a workflow unless the workflow is running.
Then you started up the workflow, and tried the trigger command again - good.
The response is “Command queued” because what you’ve done (with the trigger command) is tell the running scheduler to trigger that task in the workflow. The request gets queued to be actioned asynchronously inside the scheduler, ASAP but not instantly, so the result can’t be returned immediately to the command line.
To see the result - which should be that the task submitted a job to run - you have to observe the workflow in the UI, and/or watch the scheduler log, which records all workflow events as they happen.
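For example, from the command line (assuming the defaults; the web UI shows the same information graphically):

    cylc cat-log GM_workflowv1/GM_runnamev1    # view the scheduler log
    cylc tui GM_workflowv1/GM_runnamev1        # or monitor the workflow interactively in the terminal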