Job logs not showing in WUI

I have Cylc 8.3.1 installed with whichever Cylc UI Server was current with that release.

The Hub and UI Servers run on one host (the WUI host) and the schedulers run on other hosts. All hosts share a file system for home directories. The WUI runs under a local user (jupyterhub) and uses sudo to launch UI Servers. The UI server accounts can run commands through ssh without a password to the scheduler hosts. This bit works fine.

The cylc hub is installed into a local conda environment in the jupyterhub account’s home directory.

The global.cylc file is installed locally in /opt/cylc8/flow on all relevant hosts. It provides a wealth of platforms, most of which are not relevant to this question.

Each user has their own cylc wrapper and cylc environments. However, we also have a default cylc wrapper and default cylc environment, used for calls to cylc from the UI Servers to the scheduler hosts that don’t carry enough information for us to determine the correct wrapper to target. This means that the calls to cylc psutil resolve and return successfully.

However, we can’t see Cylc job logs in the UI server, even though they’re right there on the filesystem:

$ ls /home/${user}/cylc-run/${workflow}/log/job/${cycle_point}/${task}/*
[...]/01:
job  job-activity.log  job.err  job.out  job.status  job.xtrace

[...]/02:
job  job-activity.log  job.err  job.out  job.status  job.xtrace

[...]/03:
job  job-activity.log  job.err  job.out  job.status  job.xtrace

[...]/04:
job  job-activity.log  job.err  job.out  job.status  job.xtrace

[...]/NN:
job  job-activity.log  job.err  job.out  job.status  job.xtrace

When I tail the uiserver logs in /home/${user}/.cylc/uiserver/log/log I only see:

2024-08-09T09:04:17 INFO     jacintar: authorized to read

for this access. The hub logs also include:

[I 2024-08-09T09:04:17.643 CylcHubApp log:192] 200 POST /user/${user}/cylc/graphql (jacinta@::ffff:10.19.96.12) 586.03ms

Scheduler logs work just fine, and in the uiserver logs I see (edited):

2024-08-09T09:08:20 INFO     $ cylc cat-log --mode=tail --prepend-path ${user}/$workflow -f scheduler/01-start-01.log

The same appears in the Cylc hub logs.

I have read the documentation for cylc cat-log a few times, and I can’t work out what the correct command would be to list out all of the job logs for a given cycle point/job/job submission attempt.

This gives me the job output (edited):

$ ./cylc cat-log --mode=tail --prepend-path \~${user}/${workflow}/ -f job/${cycle_point}/${task}/04/job.out
# ${host}:/home/${user}/cylc-run/${workflow}/log/job/${cycle_point}/${task}/job.out
Workflow : ${workflow}
Job : ${cycle_point}/${task}/04 (try 4)
[...]

But this doesn’t give me a list of job files:

$ ./cylc cat-log --mode=list-dir --prepend-path \~${user}/${workflow}/ -f job/${cycle_point}/${task}/04
config/01-start-01.cylc
config/20240809T051442+0000-rose-suite.conf
config/flow-processed.cylc
install/01-install.log
scheduler/01-start-01.log

Neither explains why nothing appears in the user’s UI server logs or the hub logs when I try to access these through the UI server.

I’m not sure how a UI Server can successfully call cylc psutil to another host, and not be able to call cylc cat-log locally. I’m sure it’s a configuration issue on my side, but I can’t guess what. Any assistance gratefully accepted.

Each user has their own cylc wrapper and cylc environments.

That sounds a bit fishy, worth making sure all calls are going through the appropriate wrapper.

I’m not sure how a UI Server can successfully call cylc psutil to another host, and not be able to call cylc cat-log locally.

The cylc cat-log command is run locally, but it will re-invoke remotely as required to access logs so it is a similar situation to cylc psutil.

But this doesn’t give me a list of job files:

The command is slightly off, to list files, try:

cylc cat-log workflow//cycle/task/job --list-dir

(the -f and --mode=list-dir options are orthogonal)

FYI, Cylc UIServer 1.5.1 will introduce a fix that logs any cat-log errors when trying to list the log files. We can expedite this release if needed.

Your provided command didn’t work for me, but I did work out that I can list files with:

cylc cat-log -m list-dir workflow//cycle/job
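For anyone comparing the two views of the same logs, here is a minimal sketch of the listing command reported to work above next to the on-disk location shown by the earlier ls output. The helper names and the example values are hypothetical, and the last ID component is assumed to be the task name. The functions only print strings, so nothing is run against a real workflow.

```shell
# Sketch only: helper names and example values are hypothetical.

# Print (do not run) the cat-log listing command in the form reported to
# work in this thread. Assumes the IDs contain no spaces or shell
# metacharacters.
list_job_logs_cmd() {
    printf 'cylc cat-log -m list-dir %s//%s/%s\n' "$1" "$2" "$3"
}

# Print the job log directory on disk, following the layout in the ls
# output earlier in the thread. NN is the entry listed after the numbered
# submissions. The optional 4th argument overrides the run directory.
job_log_dir() {
    printf '%s/%s/log/job/%s/%s/NN\n' "${4:-$HOME/cylc-run}" "$1" "$2" "$3"
}

list_job_logs_cmd my-workflow 20240809T0000Z my_task
job_log_dir my-workflow 20240809T0000Z my_task
```

If the directory printed by the second helper exists but the command printed by the first one returns nothing, the problem is more likely in how cylc is being invoked than in the logs themselves.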

I look forward to this newer version. It is certainly a challenge to guess why it’s not working in certain circumstances.

To resolve the issue preventing our cylc logs from being found, I needed to ensure that the wrapper was explicitly called to populate our global.cylc file properly.

As per “Use full path to Cylc when submitting jobs” (cylc/cylc-flow pull request #6302, by ScottWales):

I think the fix may be to adapt the wrapper you are using to allow it to function correctly without being in the path.

This is one of many Cylc / Rose functionalities which will be broken by the cylc / rose commands not being present in $PATH.

The wrapper is in PATH in most of our situations, but the cylc conda environment’s bin directory is not (because the assumption was that the wrapper would find it, which it usually does).
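A quick way to see which cylc a given process would pick up is to walk its PATH and report the first match. This is a rough sketch under the assumption that the executable is named cylc in both the wrapper directory and the conda environment. The function name is hypothetical and this is not part of Cylc.

```shell
# Sketch only: first_cylc_on_path is a hypothetical helper, not a Cylc tool.
# Prints the first executable file named "cylc" found in a PATH-style,
# colon-separated string, and returns non-zero if there is none.
# The body runs in a subshell so that changing IFS does not leak out.
first_cylc_on_path() (
    IFS=:
    for dir in $1; do
        if [ -f "$dir/cylc" ] && [ -x "$dir/cylc" ]; then
            printf '%s\n' "$dir/cylc"
            return 0
        fi
    done
    return 1
)

# Example: report what the current shell would resolve.
first_cylc_on_path "$PATH" || echo "no cylc found on PATH"
```

Running the same check from the environment the UI Server actually uses (rather than an interactive login shell) is what matters here, since the two can differ.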

In the case of

  1. cylc-uiserver/cylc/uiserver/resolvers.py at 7b492b7b7909fdf6a9ef93990276a3cdbf2d3c4b · cylc/cylc-uiserver · GitHub
  2. cylc-uiserver/cylc/uiserver/resolvers.py at 7b492b7b7909fdf6a9ef93990276a3cdbf2d3c4b · cylc/cylc-uiserver · GitHub

the code found the Cylc Hub’s conda environment’s bin/cylc just fine. However, not calling the wrapper meant that the variables required to correctly populate the platforms provided in the global.cylc file were not properly set, so we couldn’t get log files because the relevant platform definition couldn’t be found.
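The effect described above can be reproduced with a toy setup. In this sketch, a stand-in "real" executable reports an environment variable and a stand-in wrapper exports that variable before handing over with exec. Everything here is an assumption made for illustration: the variable name DEMO_SITE_VAR, the directory layout, and both scripts are hypothetical and are not the actual Cylc wrapper or our site configuration.

```shell
# Toy demonstration only: none of these files are real Cylc components.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/env/bin" "$demo_dir/wrapper"

# Stand-in for the conda environment's bin/cylc: it just reports whether
# the (hypothetical) site variable is visible to it.
cat > "$demo_dir/env/bin/cylc" <<'EOF'
#!/bin/sh
echo "DEMO_SITE_VAR=${DEMO_SITE_VAR:-<unset>}"
EOF

# Stand-in for the site wrapper: it sets the variable, then hands over to
# the environment's executable with the original arguments.
cat > "$demo_dir/wrapper/cylc" <<EOF
#!/bin/sh
export DEMO_SITE_VAR=demo
exec "$demo_dir/env/bin/cylc" "\$@"
EOF

chmod +x "$demo_dir/env/bin/cylc" "$demo_dir/wrapper/cylc"

# Calling the environment's executable directly skips the wrapper's setup.
direct_out="$("$demo_dir/env/bin/cylc")"
wrapped_out="$("$demo_dir/wrapper/cylc")"

echo "direct:  $direct_out"
echo "wrapped: $wrapped_out"

rm -rf "$demo_dir"
```

The direct call sees the variable as unset while the wrapped call sees the value the wrapper exported, which mirrors how a configuration that depends on wrapper-provided variables can resolve differently depending on which executable was invoked.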

It might be a bug that Cylc gets hung up on the platform when the files it wants should be right there on disk. We’re still patching resolvers.py to remove force-remote because our WUI host can’t talk to the jobhost. We want it to find the files on disk.

We are exploring adding the path to Cylc’s conda environment bin/ to the wrapper. But that won’t fix this particular issue, which arose because the wrong cylc (the one in the conda environment’s bin directory, not the wrapper) was being called.

Are you inserting environment variables into your wrapper script for use in the global.cylc file?

Note, we implemented the ability to define environment variables in rose-suite.conf files for use in global.cylc files as requested.

I suspect we’ve complicated things by moving from specifying our localhost configuration via the [local] platform to the [localhost] platform. Unfortunately I haven’t had a chance to test this out, and we’re very close to a point where undoing that is likely to be “too late” and require significant rework etc from our model developers who are already under the pump to get their work into production.

I think you can chalk up this issue to that.