OK, I was unable to replicate the issue with a minimal test suite (same conda env and same Cylc version) on another cluster, so it must be something else. Sorry!
Edit: The fact that the minimal test suite works on the alternate cluster but not on the original cluster further supports the theory that something about the original cluster isn’t playing nicely with Cylc 8. I’ll double-check that all the configs are the same.
Edit 2: Well, now it has started happening on the alternate cluster too, and I was able to replicate it with my minimal test suite. It seems that when PBS throws the following error, Cylc doesn’t know what to do with it:
qsub: request rejected as filter hook 'main_hook' encountered an exception. Please inform Admin
Also, it doesn’t appear to spawn or run the downstream dependent tasks either; the affected task just disappears from the graph. I tried running with debugging enabled but didn’t get any further details at the job level. The relevant flow log is below. It shows that Cylc does detect the submit failure, but then it removes the task, apparently treating it as a finished task proxy?
2023-11-09T18:51:09Z INFO - Command actioned: force_spawn_children(['20231110T0000Z/test_B'], outputs=['succeeded'], flow_num=None)
2023-11-09T18:51:09Z INFO - [20231110T0000Z/test_C waiting(queued) job:00 flows:none] => waiting
2023-11-09T18:51:09Z INFO - [20231110T0000Z/test_C waiting job:01 flows:none] => preparing
2023-11-09T18:51:09Z DEBUG - REMOTE INIT NOT REQUIRED for localhost
2023-11-09T18:51:09Z DEBUG - [20231110T0000Z/test_C preparing job:01 flows:none] host=localhost
2023-11-09T18:51:09Z DEBUG - ['jobs-submit', '--debug', '--utc-mode', '--path=/bin', '--path=/usr/bin', '--path=/usr/local/bin', '--path=/sbin', '--path=/usr/sbin', '--path=/usr/local/sbin', '--', '$HOME/cylc-run/test_8/run6/log/job'] ... # will invoke in batches, sizes=[1]
2023-11-09T18:51:09Z DEBUG - 20231110T0000Z/test_C -triggered off ['20231110T0000Z/test_B'] in flow none
2023-11-09T18:51:09Z DEBUG - ['cylc', 'jobs-submit', '--debug', '--utc-mode', '--path=/bin', '--path=/usr/bin', '--path=/usr/local/bin', '--path=/sbin', '--path=/usr/sbin', '--path=/usr/local/sbin', '--', '$HOME/cylc-run/test_8/run6/log/job', '20231110T0000Z/test_C/01']
2023-11-09T18:51:10Z DEBUG - [jobs-submit cmd] cylc jobs-submit --debug --utc-mode --path=/bin --path=/usr/bin --path=/usr/local/bin --path=/sbin --path=/usr/sbin --path=/usr/local/sbin -- '$HOME/cylc-run/test_8/run6/log/job' 20231110T0000Z/test_C/01
[jobs-submit ret_code] 0
[jobs-submit out]
[TASK JOB SUMMARY]2023-11-09T18:51:10Z|20231110T0000Z/test_C/01|32|None
[TASK JOB COMMAND]2023-11-09T18:51:10Z|20231110T0000Z/test_C/01|[STDERR] qsub: request rejected as filter hook 'main_hook' encountered an exception. Please inform Admin
2023-11-09T18:51:10Z ERROR - [jobs-submit cmd] cylc jobs-submit --debug --utc-mode --path=/bin --path=/usr/bin --path=/usr/local/bin --path=/sbin --path=/usr/sbin --path=/usr/local/sbin -- '$HOME/cylc-run/test_8/run6/log/job' 20231110T0000Z/test_C/01
[jobs-submit ret_code] 32
[jobs-submit out] 2023-11-09T18:51:10Z|20231110T0000Z/test_C/01|32|None
2023-11-09T18:51:10Z DEBUG - [20231110T0000Z/test_C preparing job:01 flows:none] (internal)submission failed at 2023-11-09T18:51:10Z
2023-11-09T18:51:10Z CRITICAL - [20231110T0000Z/test_C preparing job:01 flows:none] submission failed
2023-11-09T18:51:10Z INFO - [20231110T0000Z/test_C preparing job:01 flows:none] => submit-failed
2023-11-09T18:51:10Z DEBUG - [20231110T0000Z/test_C submit-failed job:01 flows:none] task proxy removed (finished)
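In case it helps anyone trying to spot the same pattern in their own scheduler log, here is a rough Python sketch that pairs each `=> submit-failed` transition with the `[STDERR]` text captured from the submission command. It is only an illustration: the regular expressions are written against the exact line shapes in the excerpt above and may not match other Cylc versions or log settings, and `find_submit_failures` and `SAMPLE_LOG` are names I made up, not part of Cylc.

```python
import re

# Assumption: patterns are derived only from the log excerpt in this post.
# State change line, e.g.
#   <time> INFO - [<cycle>/<task> preparing job:01 flows:none] => submit-failed
STATE_RE = re.compile(
    r"^(?P<time>\S+) INFO - \[(?P<task>\S+) .*\] => submit-failed\s*$"
)
# Stderr captured from the job submission command, e.g.
#   [TASK JOB COMMAND]<time>|<cycle>/<task>/<submit num>|[STDERR] <message>
STDERR_RE = re.compile(
    r"^\[TASK JOB COMMAND\](?P<time>[^|]+)\|(?P<job>[^|]+)\|\[STDERR\] (?P<msg>.*)$"
)


def find_submit_failures(lines):
    """Return (time, task, stderr_messages) for each submit-failed transition."""
    stderr_by_task = {}
    failures = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = STDERR_RE.match(line)
        if match:
            # Drop the trailing submit number ("/01") to get the task ID.
            task = match.group("job").rsplit("/", 1)[0]
            stderr_by_task.setdefault(task, []).append(match.group("msg"))
            continue
        match = STATE_RE.match(line)
        if match:
            task = match.group("task")
            failures.append(
                (match.group("time"), task, stderr_by_task.get(task, []))
            )
    return failures


# A few lines copied from the log excerpt above.
SAMPLE_LOG = [
    "[TASK JOB COMMAND]2023-11-09T18:51:10Z|20231110T0000Z/test_C/01|[STDERR] "
    "qsub: request rejected as filter hook 'main_hook' encountered an exception. "
    "Please inform Admin",
    "2023-11-09T18:51:10Z CRITICAL - [20231110T0000Z/test_C preparing job:01 "
    "flows:none] submission failed",
    "2023-11-09T18:51:10Z INFO - [20231110T0000Z/test_C preparing job:01 "
    "flows:none] => submit-failed",
    "2023-11-09T18:51:10Z DEBUG - [20231110T0000Z/test_C submit-failed job:01 "
    "flows:none] task proxy removed (finished)",
]

if __name__ == "__main__":
    for time, task, messages in find_submit_failures(SAMPLE_LOG):
        print(time, task, messages)
```

To run it on a real log, pass an open file handle instead of `SAMPLE_LOG`; the function only needs an iterable of lines.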