Hi Cylc team,
My question is directly related to, and an extension of, the discussion in this post.
I’m trying to do the following:
- Try to run the main forecast (with 3 retries; success and completion optional)
- If it fails, trigger a short-timestep forecast (success and completion compulsory)
- In the short-timestep task’s pre-script, check the status of the main forecast and set it to `expired`
- If all goes well, the workflow moves on. However, I need a way to possibly remove the main forecast from the graph entirely.
I’ve basically got the first 3 steps working OK, based on past work in our team and some updated commands in Cylc 8. However, there appear to be some challenges and confusing outputs from some of the Cylc commands:
- I’ve found `cylc workflow-state <workflow>` to be the most consistent and the cleanest to parse. However, it only seems to work on the scheduler VM and not on the HPC.
- `cylc dump <workflow> --tasks` was fairly easy to parse, but appears to show only tasks in the core active window (n=0). The help guide suggests it’s for the n=0 window.
- `cylc show <workflow>` was also easy to parse, and this is what I’m using at the moment. The help guide suggests it’s for the n-window, so is there a way to choose the `n` value? Is it basically 1?
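For reference, a minimal sketch of pulling one task’s state out of `cylc dump <workflow> --tasks` output, assuming the comma-separated field order seen in the sample dump line later in this post (name, cycle point, state, held, queued, runahead). That order is inferred from the sample, not from documentation. The helper name `dump_task_state` is hypothetical, and it reads the dump text on stdin so it can be tried without a running workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the state of one task from `cylc dump --tasks` output.
# Assumes each line looks like (inferred from a sample, not documented):
#   name, cycle-point, state, held-flag, queued-flag, runahead-flag
# Returns non-zero if the task is not listed (e.g. it is outside the n=0 window).
dump_task_state() {
    local name="$1" point="$2"
    awk -F', *' -v n="$name" -v p="$point" '
        $1 == n && $2 == p { print $3; found = 1; exit }
        END { exit !found }
    '
}

# Example usage inside a job (CYLC_WORKFLOW_ID is set in Cylc 8 job environments):
#   dump_task_state nq_um_fcst_18 20240915T1800Z \
#       < <(cylc dump "${CYLC_WORKFLOW_ID}" --tasks)
```

A non-zero return simply means no matching line was found, which is what you would expect for a task that has not yet entered the n=0 window.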
I am currently using `cylc show` to get the status of the main and alternate forecast tasks, similar to the snippet below. The main forecast task keeps the state in a local variable, `task_status`, and checks for success of the alternate task when someone tries to run it despite it possibly being in the `expired` state (I’m trying to guard against operator error).
{% set FCST_PRE_SCRIPT_TO_LIMIT_TO_ONE = '
# Find out if the alternative UM task has been at least submitted
local task_status
task_status=$(cylc show "${CYLC_SUITE_NAME}//${CYLC_TASK_CYCLE_POINT}/${ALTERNATIVE_TASK_NAME}" \
| grep "state" \
| cut -d":" -f2 \
| xargs)
case ${task_status} in
...
succeeded)
echo "WARNING: Alternative task has succeeded. Do not re-run. Removing from graph and aborting." | tee >(cat >&2)
# cylc remove "${CYLC_SUITE_NAME}//${CYLC_TASK_CYCLE_POINT}/${CYLC_TASK_NAME}"
# Initial attempts at this left the task in the `running` state in the graph, but the logs suggest it succeeded.
return 1
;;
esac
' %}
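A minimal sketch of the parse-and-decide step, under two assumptions: that `cylc show` prints the task state on a line whose key is exactly `state` (as in the sample output further down), and that the guard should block the main task while the alternate is active or has succeeded. Matching the key exactly avoids picking up other lines that merely contain the word "state". Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the pre-script check.

# Print the task state from `cylc show` output on stdin. Only a line whose key
# is exactly "state" is used, so other lines that merely contain the word
# (e.g. a task description) are ignored.
show_state() {
    awk -F': *' '$1 == "state" { print $2; exit }'
}

# Decide from the alternate task's state whether the main forecast should
# refuse to run. Returns 0 to block, 1 to allow.
# Note: "queued" is not a separate Cylc 8 task state; a queued task reports
# "waiting", and the queued flag appears as its own field in `cylc dump --tasks`.
alt_blocks_main() {
    case "$1" in
        preparing|submitted|running)
            echo "WARNING: alternative task is active (state: $1); not running main task." >&2
            return 0 ;;
        succeeded)
            echo "WARNING: alternative task has succeeded; not running main task." >&2
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example usage in a pre-script:
#   task_status=$(cylc show "${CYLC_WORKFLOW_ID}//${CYLC_TASK_CYCLE_POINT}/${ALTERNATIVE_TASK_NAME}" | show_state)
#   if alt_blocks_main "${task_status}"; then return 1; fi
```

Because a queued alternate reports `waiting`, this sketch lets it through; catching that case would need the queued flag from `cylc dump`. Also, the awk programs use single quotes, so pasting them into the single-quoted Jinja2 `{% set ... = '...' %}` string would end that string early. Keeping the helpers in a separate script avoids this (Cylc puts the workflow’s `bin/` directory on `PATH` for jobs).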
So, the attempt to remove the main task in its own pre-script (commented out above) didn’t work: the task sits indefinitely in the `running` state on the Web UI graph, even though its logs are generated. At the moment I’m preventing the main forecast task from running by printing error messages and returning 1 from its pre-script if the alternate task is queued, running, or has even succeeded. Here is a snippet of how I’m setting the main forecast task to expire in the alternate task’s pre-script.
{% set ALT_FCST_PRE_SCRIPT_TO_LIMIT_TO_ONE = FCST_PRE_SCRIPT_TO_LIMIT_TO_ONE ~ '
# The alternate task is running, so mark the normal one as expired so people do not accidentally run it
cylc set -o expired "${CYLC_SUITE_NAME}//${CYLC_TASK_CYCLE_POINT}/${CYLC_TASK_NAME/_REPLACEME_FCST_ALT/}"
' %}
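One thing worth guarding in the `cylc set` line: if the text being removed (here `_REPLACEME_FCST_ALT`, or whatever it is templated to) does not occur in `CYLC_TASK_NAME`, the `${var/pattern/}` expansion returns the name unchanged, so `cylc set` would target the alternate task itself. A minimal sketch of a stricter derivation, assuming the alternate task’s name is the main task’s name plus a fixed suffix; the helper `main_task_name` and the `_alt` suffix in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the main task name from the alternate task name
# by stripping a known suffix. "${var%"$suffix"}" removes the suffix only when
# it is at the end, unlike "${var/pattern/}", which removes the first match anywhere.
main_task_name() {
    local alt_name="$1" suffix="$2"
    local main_name="${alt_name%"$suffix"}"
    if [ "$main_name" = "$alt_name" ]; then
        # Suffix not found: fail rather than return the alternate task's own name.
        echo "ERROR: ${alt_name} does not end in ${suffix}" >&2
        return 1
    fi
    printf '%s\n' "$main_name"
}

# Example usage (task names and the "_alt" suffix are hypothetical):
#   main=$(main_task_name "${CYLC_TASK_NAME}" _alt) || return 1
#   cylc set -o expired "${CYLC_WORKFLOW_ID}//${CYLC_TASK_CYCLE_POINT}/${main}"
```

Failing loudly here means a naming mistake stops the pre-script instead of silently expiring the wrong task.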
Can you please clarify the use of `cylc dump` and `cylc show`, and which you’d recommend in such cases? I have found that when I have a `submit-failed` task on the current cycle and a `waiting` task on the next cycle, the submit-failed one shows up the same as the waiting one, with something like:
## From cylc dump
nq_um_fcst_18, 20240915T1800Z, waiting, not-held, not-queued, not-runahead
## From cylc show
state: waiting
output completion: incomplete
I suppose it’s possible to use a combination of the state and output completion values, but if I query `nq_um_fcst_19` (which has spawned, but whose cycle isn’t running) it shows the same `waiting` and `incomplete` status, so how does one differentiate them? `nq_um_fcst_18` is currently shown in purple in the web UI, suggesting its submission failed, but unfortunately I don’t see a way of picking this up.
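If you do combine the two fields, a minimal sketch that reads both from a single `cylc show` call (one scheduler query instead of two) might look like this. As noted above, it cannot by itself tell `nq_um_fcst_18` from `nq_um_fcst_19` when both report `waiting` and `incomplete`; it only shows the parsing. The helper name is hypothetical, and it assumes the `key: value` line format in the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract both "state" and "output completion" from one
# `cylc show` output on stdin, printed as "<state> <completion>".
# Missing fields are printed as "unknown".
show_state_and_completion() {
    awk -F': *' '
        $1 == "state"             { state = $2 }
        $1 == "output completion" { completion = $2 }
        END {
            if (state == "") state = "unknown"
            if (completion == "") completion = "unknown"
            print state, completion
        }
    '
}

# Example usage:
#   read -r state completion < <(cylc show "${CYLC_WORKFLOW_ID}//20240915T1800Z/nq_um_fcst_18" \
#       | show_state_and_completion)
```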
Finally, what would you recommend as the best approach to remove the main forecast task from the graph, given that I’m trying to guard against operators running tasks in the `failed` or `expired` state thinking they need to run? I am thinking it might be possible to use an `exit-script`, but I have no experience with this, the suite doesn’t currently use any, and the docs suggest they should be “fast”. I’m not sure whether a `cylc show` to determine success of the alternate task, followed by a `cylc remove` on the main forecast task, counts as fast enough or is bad practice.
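To make the question concrete, the proposed sequence (query the alternate task with `cylc show`, then `cylc remove` the main task if the alternate has succeeded) could be sketched as below. This only illustrates the steps described; it makes no claim that they are appropriate or fast enough for an `exit-script`. The function name and the `CYLC_BIN` override (used only to swap in a stub for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "check alternate, then remove main" sequence.
# CYLC_BIN (hypothetical) lets a stub replace the real `cylc` for testing.
remove_main_if_alt_succeeded() {
    local workflow="$1" point="$2" main="$3" alt="$4"
    local cylc="${CYLC_BIN:-cylc}"
    local alt_state
    # First scheduler query: read the alternate task's state.
    alt_state=$("$cylc" show "${workflow}//${point}/${alt}" \
        | awk -F': *' '$1 == "state" { print $2; exit }')
    if [ "$alt_state" = "succeeded" ]; then
        # Second scheduler query: remove the main task.
        "$cylc" remove "${workflow}//${point}/${main}"
    else
        echo "Alternate task state is ${alt_state:-unknown}; leaving ${main} alone." >&2
    fi
}

# Example usage (task names are illustrative):
#   remove_main_if_alt_succeeded "${CYLC_WORKFLOW_ID}" "${CYLC_TASK_CYCLE_POINT}" \
#       nq_um_fcst_18 "${ALTERNATIVE_TASK_NAME}"
```

Either way, the sequence costs two round-trips to the scheduler per invocation.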
Thanks in advance for your help and any guidance on this!