Hi there,
I am trying to get Cylc 8 to work on Archer2 and am struggling to get a simple hello-world workflow to run. The problem is that the home directory is not visible on the worker nodes (see also a similar question about Cylc 7 from last year). Following the instructions there, I set the following variables in my global init-script:
export DATADIR=/work/n02/n02/n02magi
export CYLC_WORKFLOW_RUN_DIR=${DATADIR}/cylc-run/${CYLC_WORKFLOW_NAME}
export CYLC_WORKFLOW_DEF_PATH=${CYLC_WORKFLOW_RUN_DIR}
Unfortunately, it is still looking for the script in the wrong place:
/var/spool/slurmd/job634423/slurm_script: line 67: /home/n02/n02/n02magi/cylc-run/hello-archer/.service/etc/job.sh: No such file or directory
/var/spool/slurmd/job634423/slurm_script: line 68: cylc__job__main: command not found
Any suggestions?
Thanks
magnus
Hello,
So long as your login and compute nodes share one common filesystem, it is possible to make this setup work using the symlink dirs configuration:
https://cylc.github.io/cylc-doc/8.0b2/html/reference/config/global.html#global.cylc[install][symlink%20dirs]
This is a platform configuration which allows you to move your cylc-run directory (or other workflow sub-directories) to other locations.
In your global.cylc, tell Cylc to move the workflow run dir onto the shared filesystem:
[install]
    [[symlink dirs]]
        [[[archer2]]]  # <= the archer2 install target
            run = /shared/fs/
When the first task in a workflow runs on a platform, Cylc will set up the symlink and install onto the shared filesystem:
$HOME/cylc-run/
    my-workflow -> /shared/fs/cylc-run/my-workflow
Cylc commands will follow this symlink.
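One way to check that this has happened is to resolve the run directory and confirm that it lands on the shared filesystem. The following is only a minimal sketch: the function name run_dir_on_shared_fs is made up for illustration, and it assumes a readlink that supports -f (as GNU coreutils does).

```shell
# Hypothetical helper, not part of Cylc.
# Usage: run_dir_on_shared_fs <run-dir> <shared-root>
# Prints the resolved location and succeeds if it lies under <shared-root>.
run_dir_on_shared_fs() {
    run_dir=$1
    shared_root=${2%/}   # tolerate a trailing slash on the root

    # readlink -f follows every symlink in the path and prints the real location
    target=$(readlink -f "$run_dir") || return 2

    case "$target" in
        "$shared_root"/*)
            printf '%s\n' "$target"
            ;;
        *)
            printf 'not on shared filesystem: %s\n' "$target" >&2
            return 1
            ;;
    esac
}
```

For example, run_dir_on_shared_fs "$HOME/cylc-run/my-workflow" /shared/fs should print a path under /shared/fs once the symlink is in place, and return a non-zero status if the workflow was installed directly into $HOME.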
I’ve added a cylc-doc issue to document this edge case: "platforms: platform configuration for compute nodes with no $HOME" (cylc/cylc-doc issue #315).
Let us know how you get on,
Oliver.
Thanks for the quick reply. Unfortunately it is not working yet. So I have:
n02magi@uan01:~> cat ~/.cylc/flow/global.cylc
[install]
    [[symlink dirs]]
        [[[archer2]]]
            run = /work/n02/n02/n02magi
[platforms]
    [[archer2]]
        job runner = slurm
        hosts = localhost
        install target = archer2
However, it still installs the workflow into $HOME.
OK, I’m not sure why that isn’t working. Some ideas:
- This global.cylc configuration needs to be on the host where you issue the cylc install and cylc play commands.
- You could try running cylc config on that same host to make sure the configuration is being picked up properly.
- Symlink dirs changes take effect when the workflow is first installed. If it has already been installed, you will need to remove it (in all locations) or create a fresh install to test the changes.
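The second check can be narrowed down by asking Cylc for just the relevant section. This is a small sketch, assuming the cylc command is on your PATH on that host; the wrapper function name is hypothetical, and -i selects a single item of the configuration.

```shell
# Hypothetical wrapper: print the [install][symlink dirs] section of the
# global configuration as Cylc sees it on this host.
show_symlink_dirs_config() {
    # Fail clearly if cylc is not available in this environment
    if ! command -v cylc >/dev/null 2>&1; then
        echo "cylc not found on PATH" >&2
        return 127
    fi
    cylc config -i '[install][symlink dirs]'
}
```

If the output does not show the run = ... line you expect for your install target, the global.cylc is probably not being read from where you think it is.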
I got it to install into the correct place using:
[install]
    [[symlink dirs]]
        [[[localhost]]]
            run = /work/n02/n02/n02magi
[platforms]
    [[archer2]]
        job runner = slurm
        hosts = localhost
        install target = localhost
        cylc path = /work/n02/n02/n02magi/bin/
However, it is trying to run:
. "$HOME/cylc-run/hello-archer/.service/etc/job.sh"
cylc__job__main
The variable HOME exists, but the filesystem it points to is just not visible. So I guess I need to modify how that script is generated.
"I got it to install into the correct place using"
Great. Sorry, yes, that should be [install][symlink dirs]localhost rather than archer2
if
"So I guess I need to modify how that script is generated."
You will not be able to modify this without editing the Cylc source code, because the global init-script (documented in the Cylc 7 issue) is run after this job file has been sourced.
I’ve taken a look: Cylc 7 hardcoded the job file location, whereas Cylc 8 uses the $HOME environment variable to locate it, which breaks your setup.
Should be a fairly simple fix, I’ll see if I can work something out.
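To illustrate why the $HOME lookup matters, here is a rough sketch that mirrors the job script line quoted earlier in the thread. The two function names are hypothetical and are not Cylc's own code; the only thing taken from the thread is the path layout $HOME/cylc-run/<workflow>/.service/etc/job.sh.

```shell
# Hypothetical helpers mirroring the quoted job script line:
#   . "$HOME/cylc-run/hello-archer/.service/etc/job.sh"
# The path is built from whatever $HOME is in the shell that runs the job.
job_file_path() {
    workflow=$1
    printf '%s\n' "$HOME/cylc-run/$workflow/.service/etc/job.sh"
}

# Report whether the job file would be found from the current environment.
job_file_visible() {
    path=$(job_file_path "$1")
    if [ -f "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
        return 1
    fi
}
```

Run on a compute node where the filesystem behind $HOME is not mounted, job_file_visible hello-archer would report the file as missing even though the workflow has been installed on the shared filesystem.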
Excellent, thanks. I’ll have a go at trying this out.
Unfortunately, we’ve found issues with this approach and are looking at alternatives; hopefully we will have something soon.
Let me know when there is something I can try out, or if I can help in any other way.
I just noticed the comment in the issue tracker. The use of the home directory for cylc-run has puzzled me. Why is that not configurable? That way it would be a lot easier to find the correct location.
The cylc-run directory used to be configurable at Cylc 7. On the surface this seems like a simple feature; however, the distributed nature of Cylc makes it surprisingly hard to implement, and the feature was never fully functional.
Hardcoding the path of the cylc-run directory has greatly simplified Cylc’s internal logic and should work for all Unix systems. Except, unfortunately, Archer2.
Note that many Unix tools, such as Bash and SSH, rely on paths relative to $HOME.
That said, bash and ssh can take command-line arguments (or I could source a script to set up an environment) and so work without $HOME. I guess I will ask the Archer2 team whether they would consider setting HOME to the work directory on the workers.
Is there a way I can inject a line into the submission script so that I can set the HOME environment variable?
Thinking about it…
I can set the HOME var using the Slurm option --export=HOME=/work/n02/n02/n02magi. The problem is that Cylc starts a login shell, which I guess loses this setting. Will a non-login shell help?
At present you can’t insert a line where you’d need to, but we’re hoping to propose a change which will allow this to work for you …
For completeness, I’ll note that while ~/cylc-run is now hardcoded as the top-level run directory, the new workflow installation process can be configured to automatically symlink sub-directories to other locations (in case anyone reading this thinks their home disk quota has to be big enough to handle all workflow output).
Having a standard location for the top level run directory not only simplifies Cylc internals, it’s also important for 3rd party tools that (for instance) need to view workflow outputs, or job logs, or databases.
OK, we think we have a solution using a combination of symlink dirs and global init-script.
If you’re able to try it out the proposed changes are here:
This should make it into the first Cylc 8 release candidate.