Jobs show as 'submitted' in the Cylc GUI but have actually failed

Dear Cylc,

I have found that, for some of my Cylc jobs, the Cylc GUI shows the status ‘submitted’ when according to sacct they have actually ‘failed’ (see the screenshot at https://www.tobymarthews.com/uploads/1/1/3/1/11315558/sacct_status.png ).

I guess either the jobs have failed (in which case the Cylc GUI is wrong) or they have not (in which case sacct is wrong). Can anyone help me resolve this? What am I missing?

Thanks very much for any comments.

Toby Marthews

It is possible that the submissions disappeared from the Slurm queue. Try polling them using cylc poll.

In your job.err files you will find:

ERROR: cylc-7.8.1-dirty not found in /apps/jasmin/metomi

You appear to be using a very old, non-standard version of Cylc which is not available on the compute nodes.

Thank you for the comments. I have restarted my example and managed to get the sacct status to change from “failed” to “pending”, but the jobs still never start, so I am in more or less the same position as before. New screenshot: https://www.tobymarthews.com/uploads/1/1/3/1/11315558/screenshot.png

  • dpmatthews: I have upgraded to Cylc 7.8.12 (the latest version installed by the managers of the platform I am using). I have requested an upgrade to Cylc 8.1.3, but it seems I am the first user to request this, so I don’t know when they will do it. I don’t have installation rights myself.

  • oliver.sanders: Thank you for the advice. However, I don’t get any information from cylc poll (see below). Am I using this right? Could you please give an example?

[tmarthews@cylc1 ~]$ sacct --format=JobID%20,JobName%60,Partition,AllocCPUS%5,State%10,Submit%20,Start%20,End%20,Elapsed,ExitCode
JobID     JobName                                 Partition  Alloc  State    Submit               Start    End      Elapsed   ExitCode
50334743  vn7.2_elevq.fcm_make_debug.1            par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0
50334744  vn7.2_elevq.fcm_make_mpi.1              par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0
50334745  vn7.2_elevq.fcm_make_mpi_rivers-only.1  par-multi  1      PENDING  2023-05-12T00:37:13  Unknown  Unknown  00:00:00  0:0

[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq.fcm_make_debug.1
Contact info not found for suite "vn.2_elevq.fcm_make_debug.1", suite not running?
[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq.fcm_make_debug
Contact info not found for suite "vn.2_elevq.fcm_make_debug", suite not running?
[tmarthews@cylc1 ~]$ cylc poll vn.2_elevq
Contact info not found for suite "vn.2_elevq", suite not running?
[tmarthews@cylc1 ~]$ cylc poll 50334745
Contact info not found for suite "50334745", suite not running?
[tmarthews@cylc1 ~]$

I maintain the Cylc installation on JASMIN, and Cylc 8.1.4 is already installed.
Your suite appears to be running OK - the fact that the jobs are queuing is not a Cylc issue.
I’ll send you an email.
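
On the cylc poll question, here is a minimal sketch. It assumes the Cylc 7 form of the command (cylc poll REG [TASK_GLOB ...], where REG is the registered suite name and tasks are written NAME.CYCLE_POINT), and it assumes the Slurm job names shown by sacct have the form SUITE.TASK.CYCLE_POINT. Under those assumptions the suite name and the task ID are separate arguments, and the suite name must match the running suite exactly: sacct shows vn7.2_elevq, whereas the commands above use vn.2_elevq. The variable names are illustrative only, and the snippet only prints the command rather than running it.

```shell
# Slurm job name as reported by sacct (assumed form: SUITE.TASK.CYCLE_POINT).
job_name='vn7.2_elevq.fcm_make_debug.1'

# Peel off the last two dot-separated fields; the suite name itself contains a dot.
cycle="${job_name##*.}"   # text after the last dot
rest="${job_name%.*}"     # everything before the last dot
task="${rest##*.}"        # task name
suite="${rest%.*}"        # remaining prefix is taken to be the suite name

# Cylc 7 form (assumed): cylc poll REG TASK.CYCLE_POINT
poll_cmd="cylc poll ${suite} ${task}.${cycle}"
echo "${poll_cmd}"
```

This prints cylc poll vn7.2_elevq fcm_make_debug.1, which would then be run on the host where the suite is running. If the "Contact info not found" message still appears with the correct suite name, cylc scan should show which suites are currently running.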