It's a complex scenario, but the short of it is that we have some custom PBS hooks written in ksh (with Bash-incompatible usage like `typeset`) for pre- and post-handling of jobs, and these rely on a custom ksh script being imported into jobs at runtime. Cylc jobs are Bash, so this breaks, and as a result the PBS exit handler reports the job as failed (due to the `typeset` errors reported by Bash) even though the job script itself succeeds. All our PBS reporting tools reflect the non-zero exit code. However, the Cylc task reports (correctly) a successful run.
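As an illustration of the kind of incompatibility involved, here is a minimal sketch. The `-Z` option is only an assumed example of a ksh-only `typeset` flag; the real hooks may use different ones.

```shell
# Assumed example: `typeset -Z3` (zero-padded width) is valid in ksh
# but is not an option Bash's typeset/declare builtin accepts.
bash -c 'typeset -Z3 n=7' 2>/dev/null
rc=$?

# A non-zero status here is what a wrapper or exit handler would pick up.
echo "Bash exit status for the ksh-style typeset: $rc"
```

The same line run under ksh would succeed, which is why the handlers only misbehave when the job shell is Bash.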
My attempt to describe this in stages:
- Cylc job script submitted
- PBS pre-handler works fine; prints header info which will appear in job.out
- Cylc job script completes - exit 0 ← Cylc sees this, and this is what gets logged by Cylc
- PBS post-handler fails - exit 1 ← PBS sees this, and this is what gets logged in PBS and printed in the job.out
How does Cylc get the job exit code from the job script completing (exit 0) if PBS thinks the job failed in the post-handler stage (exit 1)?
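To make the stages concrete, here is a minimal Bash sketch of the sequence as I understand it. The function names are hypothetical stand-ins, not real PBS or Cylc components, and it assumes the handlers simply run around the job with the last exit status being the one PBS records.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real components.

pre_handler() {
    # PBS pre-handler: prints header info that ends up in job.out.
    echo "=== job header (pre-handler) ==="
}

job_script() {
    # Stand-in for the Cylc job script; runs in a subshell and succeeds.
    ( echo "job body running"; exit 0 )
}

post_handler() {
    # Stand-in for the ksh post-handler failing under Bash.
    ( echo "post-handler: typeset error" >&2; exit 1 )
}

pre_handler
job_script
job_rc=$?          # status of the job script itself

post_handler 2>/dev/null
post_rc=$?         # status of the post-handler

final_rc=$post_rc  # assumed: the last status is what PBS reports

echo "job script exit code:  $job_rc"
echo "exit code seen by PBS: $final_rc"
```

In this sketch the job script's own status and the status of the whole wrapped run differ, which matches what the PBS tools and Cylc each report.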