Hi there,
Using Cylc 8.1.4, I'm getting this intermittent error, which happens whether or not I include the run name (runN, or a specific run such as run128) in the workflow ID.
Any ideas as to what the issue might be?
This is only happening for one workflow; the other active workflows seem fine.
> cylc log u-cw073//1/model_monitor2 -f o -m t
Authentication failed.
> cylc log u-cw073/runN//1/model_monitor2 -f o -m t
2023-06-09T13:39:29+12:00 WARNING - You do not need to include runN in the workflow ID; Cylc will select the latest run if just the workflow name is used
Authentication failed.
> cylc log u-cw073/run128//1/model_monitor2 -f o -m t
Authentication failed.
Cheers,
Jonny
Hi Jonny,
That’s a mystery!
No authentication is needed to view job logs via the CLI. They’re just files on disk, so it’s down to filesystem permissions. The exact phrase “Authentication failed” doesn’t even appear in the cylc-flow codebase.
If you like, I can try to take a closer look with you on Monday.
Hilary
Thanks @hilary.j.oliver.
It seems it might be related to a problem on w-mauivlab02? I'm also getting consistent submission errors with this workflow.
> cylc log u-cw073
2023-06-12T04:23:46+12:00 INFO - Workflow: u-cw073/run136
2023-06-11T16:23:46Z INFO - Scheduler: url=tcp://w-cylc02.maui.niwa.co.nz:43077 pid=17917
2023-06-11T16:23:46Z INFO - Workflow publisher: url=tcp://w-cylc02.maui.niwa.co.nz:43036
2023-06-11T16:23:46Z INFO - Run: (re)start number=1, log rollover=1
2023-06-11T16:23:46Z INFO - Cylc version: 8.1.4
2023-06-11T16:23:46Z INFO - Run mode: live
2023-06-11T16:23:46Z INFO - Initial point: 1
2023-06-11T16:23:46Z INFO - Final point: 1
2023-06-11T16:23:46Z INFO - Cold start from 1
2023-06-11T16:23:46Z INFO - New flow: 1 (original flow from 1) 2023-06-12 04:23:46
2023-06-11T16:23:46Z INFO - [1/model_monitor2 waiting(runahead) job:00 flows:1] => waiting
2023-06-11T16:23:46Z INFO - [1/model_monitor2 waiting job:00 flows:1] => waiting(queued)
2023-06-11T16:23:46Z INFO - [1/model_monitor2 waiting(queued) job:00 flows:1] => waiting
2023-06-11T16:23:46Z INFO - [1/model_monitor2 waiting job:01 flows:1] => preparing
2023-06-11T16:23:48Z WARNING - platform: None - Could not connect to w-mauivlab02.
* w-mauivlab02 has been added to the list of unreachable hosts
* jobs-submit will retry if another host is available.
2023-06-11T16:23:48Z CRITICAL - [1/model_monitor2 preparing job:01 flows:1] submission failed
2023-06-11T16:23:48Z INFO - [1/model_monitor2 preparing job:01 flows:1] => submit-failed
2023-06-11T16:23:48Z ERROR - Incomplete tasks:
* 1/model_monitor2 did not complete required outputs: ['succeeded']
2023-06-11T16:23:48Z CRITICAL - Workflow stalled
2023-06-11T16:23:48Z WARNING - P1D stall timer starts NOW
2023-06-11T16:53:47Z INFO - Clearing bad hosts: {'w-mauivlab02'}
Aha, yes, this host seems to be the issue, @hilary.j.oliver:
> ssh w-mauivlab02
Authentication failed.
FYI, this is solved now, @hilary.j.oliver. It was an internal issue: we couldn't ssh to a particular host (w-mauivlab02).
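For anyone else hitting this: the scheduler reaches the job host (w-mauivlab02 here) over SSH, so one way to rule this out is to test SSH from the scheduler host non-interactively, so that it fails instead of prompting for a password. Below is a minimal sketch; the `check_ssh` function name is hypothetical, while `BatchMode` and `ConnectTimeout` are standard OpenSSH options that, as far as I know, match the default `ssh command` for Cylc 8 platforms.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cylc): check that SSH to a job host
# works without any interactive prompt.
#   BatchMode=yes   -> ssh fails instead of asking for a password
#   ConnectTimeout  -> don't hang for long on an unreachable host
check_ssh() {
    local host="$1"
    local timeout="${2:-10}"   # seconds; defaults to 10
    # Run a no-op remote command; success means auth and connection work.
    if ssh -o BatchMode=yes -o ConnectTimeout="$timeout" "$host" true; then
        echo "OK: can ssh to ${host}"
        return 0
    else
        echo "FAIL: cannot ssh to ${host} non-interactively" >&2
        return 1
    fi
}
```

Run it on the scheduler host (w-cylc02 in the log above), e.g. `check_ssh w-mauivlab02`. A FAIL here should match the "Could not connect to w-mauivlab02" warning in the scheduler log.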
That makes sense. Thanks for the update @jonnyhtw.