Skip retries on specific message output

Hi!

I have a long-running task that often fails with random read errors. I need to fix this, but in the meantime I have simply made it retry up to 3 times, and in 99% of cases it succeeds on the second try.

However, this task can also fail for genuine reasons, when the input data is invalid. For this case I set up a specific exit code, plus an err-script that detects it and sends a custom message, which triggers an email to me.

The problem is that the email is only sent on the 4th run, which wastes resources and time. How can I make the job go straight to the failed state when this exit code/message is triggered?

Here’s a simplified version of the runtime section:

```
[[long_task]]
        script = """
            # this script might fail with random read errors (exit code 1),
            # or it might hang for no apparent reason (hence the time limit),
            # or it might fail on invalid data (exit code 66)
            python -m the.long.running.script
        """
        err-script = """
            if [ "${CYLC_TASK_USER_SCRIPT_EXITCODE}" == 66 ]; then
                cylc message -- "${CYLC_WORKFLOW_ID}" "${CYLC_TASK_JOB}" 'Invalid Data'
            fi
        """
        execution retry delays = 3*PT30M
        execution time limit = PT5H
        [[[events]]]
            mail events = invalid
        [[[outputs]]]
            invalid = Invalid Data
```
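For illustration, the behaviour being asked for (retry transient failures, but give up immediately on exit code 66) can be modelled in plain bash outside Cylc. This is only a sketch of the desired semantics, not how Cylc's `execution retry delays` work internally; `run_with_retries` is a hypothetical helper name, and the retry delays are left out.

```shell
#!/usr/bin/env bash
# Hypothetical model of "retry transient errors, fail fast on invalid data".
# Not Cylc code: in a workflow, Cylc handles retries via `execution retry delays`.

INVALID_DATA=66   # exit code reserved for invalid input (as in the question)

# run_with_retries MAX_RETRIES CMD [ARGS...]
# Runs CMD once, then retries it up to MAX_RETRIES more times on failure,
# but stops at once if CMD exits with $INVALID_DATA.
run_with_retries() {
    local max_retries=$1
    shift
    local attempt=0 rc
    while true; do
        attempt=$((attempt + 1))
        "$@"
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        if [ "$rc" -eq "$INVALID_DATA" ]; then
            # Non-retryable failure: report it immediately.
            echo "invalid data on attempt $attempt, not retrying"
            return "$rc"
        fi
        if [ "$attempt" -gt "$max_retries" ]; then
            # Retries exhausted: 1 initial run + max_retries retries.
            echo "failed after $attempt attempts"
            return "$rc"
        fi
    done
}
```

For example, `run_with_retries 3 python -m the.long.running.script` would make at most 4 attempts, the same count as `execution retry delays = 3*PT30M`, but an exit code of 66 would end it after the first.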

Thanks!

Hi,

Yes, we have had this idea ourselves: see cylc/cylc-flow issue #5652 on GitHub, "custom outputs: output specific retry delays".

But unfortunately, we haven’t found the time to implement this feature just yet.

Until we have this feature, you should be able to abort the retry chain by manually setting the task to failed:

```
        err-script = """
            if [ "${CYLC_TASK_USER_SCRIPT_EXITCODE}" == 66 ]; then
                cylc set "${CYLC_WORKFLOW_ID}//${CYLC_TASK_JOB}" --out=failed
            fi
        """
```
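One way to check the exit-code test before deploying it is to dry-run the err-script body locally with the `cylc` command stubbed out, so nothing is sent to a scheduler. This is a sketch for local testing only: the `cylc` stub function, the `err_script` wrapper and the values of `CYLC_WORKFLOW_ID` and `CYLC_TASK_JOB` are hypothetical and set by hand here.

```shell
#!/usr/bin/env bash
# Local dry run of the err-script branch; nothing talks to a real workflow.

# Stub: print the command instead of running the real `cylc` CLI.
# A shell function takes precedence over any `cylc` executable on PATH.
cylc() { echo "cylc $*"; }

# The err-script body from the answer, wrapped in a function so it can be
# exercised with different exit codes.
err_script() {
    if [ "${CYLC_TASK_USER_SCRIPT_EXITCODE}" == 66 ]; then
        cylc set "${CYLC_WORKFLOW_ID}//${CYLC_TASK_JOB}" --out=failed
    fi
}

# Hypothetical example values for the job environment.
CYLC_WORKFLOW_ID="my_workflow"
CYLC_TASK_JOB="1/long_task/01"

for code in 1 66; do
    CYLC_TASK_USER_SCRIPT_EXITCODE=$code
    echo "exit code $code: $(err_script)"
done
```

Only the exit code 66 iteration should print a `cylc set ... --out=failed` line; an ordinary read error (exit code 1) prints nothing after the colon, so those failures would still go through the normal retries.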