Dear Cylc,
I have found that, for some of my Cylc jobs, the Cylc GUI shows the status ‘submitted’ when, according to sacct, they are actually ‘failed’ (see the screenshot at https://www.tobymarthews.com/uploads/1/1/3/1/11315558/sacct_status.png).
I guess either they have failed (in which case the Cylc GUI is wrong) or they have not (in which case sacct is wrong). Can anyone help me resolve this? What am I missing?
Thanks very much for any comments.
Toby Marthews
It is possible that the submissions disappeared from the Slurm queue. Try polling the submissions using cylc poll.
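As a rough illustration of the polling suggestion, assuming Cylc 7 syntax (`cylc poll REG [TASK_GLOB ...]`, where `REG` is the registered suite name and tasks are written as `NAME.CYCLE`): the helper below is hypothetical and only prints the command rather than running it. It also assumes that the Slurm job name has the form `SUITE.TASK.CYCLE`, which should be checked against the suite name reported by `cylc scan`.

```shell
# Hypothetical helper: derive a `cylc poll` command from a Slurm job name.
# Assumption: the job name is SUITE.TASK.CYCLE. The suite name may itself
# contain dots, so the name is split from the right-hand end.
poll_command_for() {
    local job_name="$1"
    local cycle="${job_name##*.}"   # text after the last dot
    local rest="${job_name%.*}"     # everything before the last dot
    local task="${rest##*.}"        # last component of what remains
    local suite="${rest%.*}"        # whatever is left is the suite name
    printf 'cylc poll %s %s.%s\n' "$suite" "$task" "$cycle"
}

# Only prints the command; nothing is sent to a running suite.
poll_command_for "vn7.2_elevq.fcm_make_debug.1"
```

The point of the sketch is that the first argument to `cylc poll` is the suite name as registered with Cylc, not a Slurm job ID or a full job name. Running `cylc poll` with the suite name alone should poll all active tasks in that suite.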
In your job.err files you will find:
ERROR: cylc-7.8.1-dirty not found in /apps/jasmin/metomi
You appear to be using a very old, non-standard version of Cylc, which is not available on the compute nodes.
Thank you for the comments. I have restarted my example and managed to get the sacct status to change from “failed” to “pending”, but the jobs still never start, so I am in more or less the same position as before. New screenshot: https://www.tobymarthews.com/uploads/1/1/3/1/11315558/screenshot.png
- dpmatthews: I have upgraded to Cylc 7.8.12 (the latest installed by the managers of the platform I am using). I have requested an upgrade to Cylc 8.1.3, but it seems I am the first user to request this, so when they will do it I don’t know. I don’t have installation rights myself.
- oliver.sanders: Thank you for the advice. However, I don’t get any information from cylc poll (see below). Am I using this right? Could you please give an example?
[tmarthews@cylc1 ~]$ sacct --format=JobID%20,JobName%60,Partition,AllocCPUS%5,State%10,Submit%20,Start%20,End%20,Elapsed,ExitCode
JobID     JobName                                 Partition  Alloc  State    Submit               Start    End      Elapsed   ExitCode
50334743  vn7.2_elevq.fcm_make_debug.1            par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0
50334744  vn7.2_elevq.fcm_make_mpi.1              par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0
50334745  vn7.2_elevq.fcm_make_mpi_rivers-only.1  par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0
[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq.fcm_make_debug.1
Contact info not found for suite “vn.2_elevq.fcm_make_debug.1”, suite not running?
[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq.fcm_make_debug
Contact info not found for suite “vn.2_elevq.fcm_make_debug”, suite not running?
[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq
Contact info not found for suite “vn.2_elevq”, suite not running?
[tmarthews@cylc1 ~]$ cylc poll 50334745
Contact info not found for suite “50334745”, suite not running?
[tmarthews@cylc1 ~]$
I maintain the Cylc installation on JASMIN (and Cylc 8.1.4 is installed).
Your suite appears to be running OK - the fact that the jobs are queuing is not a Cylc issue.
I’ll send you an email.
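For anyone else whose jobs sit in the queue like this, Slurm itself can report why a job is pending. The sketch below is only illustrative: `pending_reasons` is a hypothetical helper, and the sample input uses made-up job IDs and reasons rather than output from this thread. It assumes input lines of the form `JOBID|STATE|REASON`, which `squeue -u "$USER" -h -o '%i|%T|%r'` should produce.

```shell
# Hypothetical helper: list pending jobs with the reason Slurm gives for each.
# Reads "JOBID|STATE|REASON" lines on stdin, for example from:
#   squeue -u "$USER" -h -o '%i|%T|%r'
pending_reasons() {
    awk -F'|' '$2 == "PENDING" { print $1 ": " $3 }'
}

# Made-up sample input standing in for real squeue output.
printf '%s\n' '101|PENDING|Priority' '102|PENDING|Resources' '103|RUNNING|None' \
    | pending_reasons
```

A reason such as `Priority` or `Resources` points to the scheduler and cluster load rather than to Cylc, which is consistent with the jobs simply waiting in the queue.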