Dear Cylc,
I have found that for some of my Cylc jobs the Cylc GUI shows the status ‘submitted’ when, according to sacct, they have actually ‘failed’ (see the screenshot at https://www.tobymarthews.com/uploads/1/1/3/1/11315558/sacct_status.png).
Either they have failed (in which case the Cylc GUI is wrong) or they have not (in which case sacct is wrong). Can anyone help me resolve this? What am I missing?
Thanks very much for any comments.
Toby Marthews
It is possible that the submissions disappeared from the Slurm queue. Try polling the submissions using cylc poll.
In your job.err files you will find:
ERROR: cylc-7.8.1-dirty not found in /apps/jasmin/metomi
You appear to be using a very old, non-standard version of Cylc that is not available on the compute nodes.
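As a rough sketch of where to look for that message: the snippet below builds the path to the latest job.err of one task and prints any ERROR lines in it. It assumes the default Cylc 7 run-directory layout under ~/cylc-run with the NN link to the most recent submission. The suite name vn7.2_elevq, the task fcm_make_debug and the cycle point 1 are guesses taken from the Slurm job names quoted elsewhere in this thread, and job_err_path is a hypothetical helper, not a Cylc command.

```shell
# Hypothetical helper: path to the latest job.err for one task,
# assuming the default Cylc 7 layout under ~/cylc-run.
job_err_path() {
    # $1 = suite name, $2 = cycle point, $3 = task name
    printf '%s\n' "$HOME/cylc-run/$1/log/job/$2/$3/NN/job.err"
}

# Suite, cycle point and task are assumptions inferred from the job names.
err_file="$(job_err_path vn7.2_elevq 1 fcm_make_debug)"
echo "$err_file"

# Print any ERROR lines, but only if the file exists on this machine.
if [ -f "$err_file" ]; then
    grep -n 'ERROR' "$err_file" || true
fi
```

Adjust the three arguments to match the suite and task you actually want to inspect.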
Thank you for the comments. I have restarted my example and managed to get the sacct status to change from “failed” to “pending”, but the jobs still never start, so I am in more or less the same position as before. New screenshot: https://www.tobymarthews.com/uploads/1/1/3/1/11315558/screenshot.png
-
dpmatthews: I have upgraded to Cylc 7.8.12 (the latest version installed by the managers of the platform I am using). I have requested an upgrade to Cylc 8.1.3, but it seems I am the first user to request this, so I don’t know when it will happen. I don’t have installation rights myself.
-
oliver.sanders: Thank you for the advice. However, I don’t get any information from cylc poll (see below). Am I using this right? Could you please give an example?
[tmarthews@cylc1 ~]$ sacct --format=JobID%20,JobName%60,Partition,AllocCPUS%5,State%10,Submit%20,Start%20,End%20,Elapsed,ExitCode
JobID     JobName                                 Partition  Alloc  State    Submit               Start    End      Elapsed   ExitCode
50334743  vn7.2_elevq.fcm_make_debug.1            par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0
50334744  vn7.2_elevq.fcm_make_mpi.1              par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0
50334745  vn7.2_elevq.fcm_make_mpi_rivers-only.1  par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0
[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq.fcm_make_debug.1
Contact info not found for suite "vn.2_elevq.fcm_make_debug.1", suite not running?
[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq.fcm_make_debug
Contact info not found for suite "vn.2_elevq.fcm_make_debug", suite not running?
[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq
Contact info not found for suite "vn.2_elevq", suite not running?
[tmarthews@cylc1 ~]$ cylc poll 50334745
Contact info not found for suite "50334745", suite not running?
[tmarthews@cylc1 ~]$
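For reference, in Cylc 7 the first argument to cylc poll is the suite name and any task IDs follow as separate arguments in NAME.CYCLE form, so every attempt above was read as a suite name. The attempts also use vn.2_elevq, while the job names in the sacct output start with vn7.2_elevq, which may be why no contact info was found. The sketch below splits one of those job names into its parts and prints the command that would result. It assumes the job name has the form SUITE.TASK.CYCLE and that the suite really is registered as vn7.2_elevq; cylc scan should list the suites that are actually running.

```shell
# Slurm job name copied from the sacct output above.
job_name="vn7.2_elevq.fcm_make_debug.1"

# Assumption: the job name is SUITE.TASK.CYCLE, and neither the task
# name nor the cycle point contains a dot.
cycle="${job_name##*.}"   # text after the last dot
rest="${job_name%.*}"     # job name with the cycle point removed
task="${rest##*.}"        # text after the last remaining dot
suite="${rest%.*}"        # everything before the task name

# Cylc 7 usage: cylc poll REG [TASK_GLOB ...]
poll_cmd="cylc poll ${suite} ${task}.${cycle}"
echo "$poll_cmd"
```

The snippet only prints the command; run the printed line yourself once the suite name has been confirmed.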
I maintain the Cylc installation on JASMIN (and Cylc 8.1.4 is installed).
Your suite appears to be running OK - the fact that the jobs are queuing is not a Cylc issue.
I’ll send you an email.