Following the platforms documentation, I’m trying to run a workflow from my computer, with tasks executing on a remote platform (let’s call it ada).
All my tasks execute with platform=ada, and global.cylc has
[platforms]
    [[ada]]
        hosts = ada
        install target = ada
        job runner = slurm
However, the first job fails at submission, and runN/log/scheduler/01-start-01.log reads:
2024-02-15T19:42:41Z INFO - Workflow: basic_cycle/run13
2024-02-15T19:42:41Z INFO - Scheduler: url=tcp://[xxx]:43032 pid=2024043
2024-02-15T19:42:41Z INFO - Workflow publisher: url=tcp://[xxx]:43043
2024-02-15T19:42:41Z INFO - Run: (re)start number=1, log rollover=1
2024-02-15T19:42:41Z INFO - Cylc version: 8.2.4
2024-02-15T19:42:41Z INFO - Run mode: live
2024-02-15T19:42:41Z INFO - Initial point: 19790101T0000Z
2024-02-15T19:42:41Z INFO - Final point: 19790105T0000Z
2024-02-15T19:42:41Z INFO - Cold start from 19790101T0000Z
2024-02-15T19:42:41Z INFO - New flow: 1 (original flow from 19790101T0000Z) 2024-02-15 20:42:41
2024-02-15T19:42:41Z INFO - [19790101T0000Z/init_simulation waiting(runahead) job:00 flows:1] => waiting
2024-02-15T19:42:41Z INFO - [19790101T0000Z/init_simulation waiting job:00 flows:1] => waiting(queued)
2024-02-15T19:42:41Z INFO - [19790101T0000Z/init_simulation waiting(queued) job:00 flows:1] => waiting
2024-02-15T19:42:41Z INFO - [19790101T0000Z/init_simulation waiting job:01 flows:1] => preparing
2024-02-15T19:42:41Z INFO - platform: ada - remote init (on ada)
2024-02-15T19:42:43Z ERROR - platform: ada - initialisation did not complete
COMMAND:
ssh -oBatchMode=yes -oConnectTimeout=10 ada env \
CYLC_VERSION=8.2.4 bash --login -c 'exec "$0" "$@"' cylc \
remote-init ada $HOME/cylc-run/basic_cycle/run13
RETURN CODE:
1
STDERR:
Traceback (most recent call last):
  File "/usr/share/cylc/bin/cylc-remote-init", line 43, in <module>
    from cylc.flow.option_parsers import CylcOptionParser as COP
  File "/usr/lib/python3/dist-packages/cylc/flow/option_parsers.py", line 27, in <module>
    from cylc.flow.loggingutil import CylcLogFormatter
  File "/usr/lib/python3/dist-packages/cylc/flow/loggingutil.py", line 37, in <module>
    from cylc.flow.wallclock import (get_current_time_string,
  File "/usr/lib/python3/dist-packages/cylc/flow/wallclock.py", line 23, in <module>
    from metomi.isodatetime.timezone import (
ModuleNotFoundError: No module named 'metomi'
It seems that for some reason, Cylc is loading metomi on the remote host. However, metomi is not a requirement of “base” Cylc, and isn’t used anywhere in my workflow.
Is this normal? If so, is there a way around it (Cylc is installed via a module by the HPC administrators on ada; I can’t modify it) [and can we update the docs]?
metomi-isodatetime is a core dependency of cylc-flow.
It will be pulled in from PyPI by pip install or from conda forge by conda install.
So your systems admins must have borked the installation!
> If so, is there a way around it (cylc is installed via a module by the HPC administrators on ada, I can’t modify it)
Depending on access to the internet, and allowed working practices etc., you could install Cylc yourself on your own user account (so long as you can do the same on the job platform too, if that’s on a different filesystem).
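For example, a user-level install might look something like this (the environment path and name are examples only; the pip step needs internet access or a local package mirror, and the environment must also be visible on the job platform’s filesystem):

```shell
# Sketch: installing Cylc into a personal virtual environment.
python3 -m venv "$HOME/cylc-8"     # create an isolated environment
. "$HOME/cylc-8/bin/activate"      # activate it
pip install cylc-flow              # pulls in metomi-isodatetime automatically
cylc version                       # sanity check the install
```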
> and can we update the docs
The docs are OK as-is, because metomi-isodatetime is a core dependency that will be handled automatically by pip or conda. Is that what you mean?
> metomi-isodatetime is a core dependency of cylc-flow.
My bad, I didn’t even consider this possibility, and thought that it was connected to metomi-rose.
Please disregard my previous comments about the docs ^^.
I think the issue might come from optional modules that require loading (e.g. module load cylc).
Is there a way to tell Cylc to run a command on the remote host right after ssh but before any other cylc command? Or is the only way to hack something into the .bashrc?
Isodatetime should be installed in the same environment as Cylc (it will be if installed by pip, conda or mamba), so you shouldn’t need to jump through any hoops once the environment has been activated (e.g. via module load, conda activate, etc.). Sounds like one to push back to the folks who deploy Cylc, as it looks like a broken installation.
That said, Cylc subcommands are issued via Bash login shells, so you can use your .bashrc / .profile / .bash_profile to configure things, e.g. set environment variables. Note that running subcommands before Cylc commands are issued won’t let you modify the environment of the parent process.
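For instance, a guarded module load in one of your login-shell startup files would be picked up by the login shells Cylc opens over ssh (the module name here is hypothetical, and which startup file is sourced depends on your shell setup):

```shell
# Hypothetical fragment for ~/.bashrc (or ~/.bash_profile / ~/.profile) on ada.
# Only attempt the load if the "module" command exists, and never let a
# failure abort the login shell:
if command -v module >/dev/null 2>&1; then
    module load cylc >/dev/null 2>&1 || true
fi
```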
I feel like there’s a misunderstanding here, and maybe I’m doing something wrong.
I am on a local machine A, where cylc is directly available.
I want to run the jobs on a slurm cluster B (on my platform ada), where cylc is only available after running module load cylc over there.
If I read correctly, you’re suggesting that Cylc should be directly available on cluster B. Is that correct?
We bundle a wrapper script with the cylc-flow package that also supports deploying multiple parallel versions of Cylc. See the installation docs for more details.
The Cylc wrapper script works with VirtualEnv, Conda and Mamba environments. It doesn’t work with module load out of the box, though I’m guessing the environment you’re activating via module load was actually created via pip, conda or mamba, so the module is probably an unnecessary intermediary.
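The core idea of the wrapper is just version selection: it maps the CYLC_VERSION environment variable (which the scheduler forwards over ssh) to an installed environment and hands over to that environment’s cylc. A minimal sketch of that selection logic, assuming a hypothetical ~/cylc-envs layout (the real wrapper shipped with cylc-flow is more complete):

```shell
# Sketch of a multi-version wrapper's selection logic.
# CYLC_HOME_ROOT and the directory layout are assumptions for illustration.
cylc_select() {
    root="${CYLC_HOME_ROOT:-$HOME/cylc-envs}"
    version="${CYLC_VERSION:-default}"
    printf '%s/%s/bin/cylc\n' "$root" "$version"
}
# A real wrapper would then hand over with: exec "$(cylc_select)" "$@"
```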