Currently trying to run a workflow on Cylc 8.2.4.
When it gets to a certain job, it hangs on the job without submitting. I get the following messages in the job-activity.log file:
[jobs-poll ret_code] 0
[jobs-poll out] 2024-05-20T10:48:35+12:00|20140101T0000Z/atmos_main/01|{"job_runner_name": "slurm", "job_id": "3987323", "job_runner_exit_polled": 0, "time_submit_exit": "2024-05-20T10:33:32+12:00"}
[jobs-poll ret_code] 0
[jobs-poll out] 2024-05-20T11:03:35+12:00|20140101T0000Z/atmos_main/01|{"job_runner_name": "slurm", "job_id": "3987323", "job_runner_exit_polled": 0, "time_submit_exit": "2024-05-20T10:33:32+12:00"}
[jobs-poll ret_code] 0
This message repeats every 15 minutes with a new timestamp; I’m assuming it tries to execute the job but fails? Any ideas what would be causing this issue?
I am also getting an error at the start of the job-activity.log file reading:
2024-05-19T22:33:32Z [STDERR] sbatch: error: plugin_load_from_file: dlopen(/opt/cray/pe/atp/libAtpSLaunch.so): /opt/cray/pe/atp/libAtpSLaunch.so: cannot open shared object file: No such file or directory
2024-05-19T22:33:32Z [STDERR] sbatch: error: spank: /opt/cray/pe/atp/libAtpSLaunch.so: Dlopen of plugin file failed
This error doesn’t happen every time I try to submit and run the workflow.
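In case it helps anyone reading the poll output, here is a minimal Python sketch that splits one of the [jobs-poll out] lines above into its parts. It assumes the layout seen in my log (timestamp, job path and a JSON payload separated by "|"); the function name parse_jobs_poll_line is just something I made up for illustration and is not part of Cylc.

```python
import json


def parse_jobs_poll_line(line):
    """Split a '[jobs-poll out]' line into (timestamp, job_path, status dict).

    Assumes the pipe-delimited layout seen in the log excerpt above;
    this is not an official Cylc parser.
    """
    prefix = "[jobs-poll out] "
    if line.startswith(prefix):
        line = line[len(prefix):]
    # Split on the first two pipes only; the third field is the JSON payload.
    timestamp, job_path, payload = line.split("|", 2)
    # Copy/paste through forum software can turn straight quotes into curly ones.
    payload = payload.replace("\u201c", '"').replace("\u201d", '"')
    return timestamp, job_path, json.loads(payload)


sample = (
    "[jobs-poll out] 2024-05-20T10:48:35+12:00|20140101T0000Z/atmos_main/01|"
    '{"job_runner_name": "slurm", "job_id": "3987323", '
    '"job_runner_exit_polled": 0, "time_submit_exit": "2024-05-20T10:33:32+12:00"}'
)

timestamp, job_path, status = parse_jobs_poll_line(sample)
print(job_path, status["job_runner_name"], status["job_id"])
```

Parsed this way, the line at least shows that the poll returned a Slurm job ID (3987323) and a time_submit_exit value for atmos_main.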