Cylc Tui and Web UI not detecting running jobs?

hey,

A few hours ago, several of my Cylc jobs either failed or went to submit-failed. However, the jobs themselves are still running and producing data.

Is there a way to get Cylc to ‘reconnect’ to the running jobs?

I’ve tried polling them, but this doesn’t do anything.

Thanks,

jonny

Polling a supposedly failed task should return it to running or succeeded status if it did not actually fail.

I suspect something has gone wrong on our system recently that is causing incorrect poll results. I will investigate tomorrow…


Thanks @hilary.j.oliver. As you mentioned in our local Teams chat, it seems to be the same issue that affected another NIWA user last week.

I think I have just run into the same issue on Maui. I have jobs that are still running and producing output but are shown as Failed in the Web UI. Interestingly, the box in the Workflows sidebar is still blue, but if I click on the running suite it shows Failed.


I haven’t been able to reproduce the problem here. If you think it’s reproducible for you, please get in touch with me directly. However, I suspect it is a symptom of recent Slurm problems on our HPC.

Thanks @hilary.j.oliver. My conversation with NeSI support suggests it’s probably caused by the Slurm controller issues. I’ll let you know if the problem persists once that is fixed.
