Hi Cylc Community,
I am using a Cylc-based model on the JASMIN HPC resource in the UK. JASMIN uses Slurm as its job queuing system, and I have a maximum walltime of 1 week. My runs often go over this limit.
It would be very useful to get an email whenever my job goes over the walltime limit. I have the following set up in my suite.rc file:
    script = "rose task-run --verbose"
    [[[events]]]
        mail events = submission failed, submission timeout, failed, timeout, succeeded
However, I get no email when the job aborts because it goes over the walltime. I just get an error like this:
slurmstepd: error: *** JOB 16250455 ON host627 CANCELLED AT 2022-09-28T00:37:45 DUE TO TIME LIMIT ***
2022-09-27T23:37:47Z CRITICAL - failed/EXIT
I feel sure I just need to add something to my mail events list above, but what? I didn’t find a list of possible events on the “Message Triggers” page of the Cylc 8.0.2 documentation, so I am a bit lost.
Could anyone help with this please?
Very many thanks!
Toby