How to get a mail event in the case of job abort

Hi Cylc Community,

I am using a Cylc-based model on the JASMIN HPC resource in the UK. JASMIN uses a queuing system for jobs (Slurm), and I have a maximum walltime of 1 week. My runs often go over this limit.

It would be very useful if I could get an email if my job goes over the walltime limit. I have the following set up in my suite.rc file:

    script = "rose task-run --verbose"
    [[[events]]]
        mail events = submission failed, submission timeout, failed, timeout, succeeded

However, I get no email when the job aborts because it goes over the walltime. I get an error like:

    slurmstepd: error: *** JOB 16250455 ON host627 CANCELLED AT 2022-09-28T00:37:45 DUE TO TIME LIMIT ***
    2022-09-27T23:37:47Z CRITICAL - failed/EXIT

I feel sure I just need to add something to my mail events list above, but what? I couldn’t find a list of possible events on the “Message Triggers” page of the Cylc 8.0.2 documentation, so I am a bit lost.

Could anyone help with this please?

Very many thanks!

Toby

The list of events (for Cylc 7) can be found here: https://cylc.github.io/cylc-doc/7.9.3/html/appendices/suiterc-config-ref.html#runtime-name-events-event-handler

I can’t see any problem with how you’ve configured this.
Have you checked that you receive emails sent from the command line on JASMIN?
e.g. try:

    echo test | mail -s test $USER

If that works then I suggest you try running in debug mode:

    rose suite-run -- --debug

You’ll then be able to see in the Cylc log file whether it has sent the email.

I’ve tried a simple test on JASMIN and it’s working for me.
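
A minimal test along these lines might look like the sketch below, in Cylc 7 suite.rc syntax. It is an illustration under assumptions, not the exact suite used for the test above: the task name `fail_me` and the address `user@example.com` are placeholders, and the task simply runs `false` so that it fails straight away.

```
[scheduling]
    [[dependencies]]
        # Single non-cycling task (hypothetical name).
        graph = fail_me
[runtime]
    [[fail_me]]
        # "false" exits with status 1, so the task fails immediately.
        script = false
        [[[events]]]
            mail events = failed
            # Placeholder address; replace with your own.
            mail to = user@example.com
```

If an email arrives for this task but not for the real one, the mail setup itself is probably fine and the difference lies in how the walltime abort is reported. Cylc may group event emails together, so it is worth allowing a few minutes before concluding that nothing was sent.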