Cylc8 on archer2

Hi there,
I am trying to get cylc8 to work on archer2, and I am struggling to get a simple hello world cylc8 workflow to run. The problem is that the home directory is not visible on the worker nodes (see also a similar question about cylc7 from last year). Following the instructions, I set the following variables in my global init script:

export DATADIR=/work/n02/n02/n02magi
export CYLC_WORKFLOW_RUN_DIR=${DATADIR}/cylc-run/${CYLC_WORKFLOW_NAME}
export CYLC_WORKFLOW_DEF_PATH=${CYLC_WORKFLOW_RUN_DIR}

Unfortunately, it is still looking for the script in the wrong place:

/var/spool/slurmd/job634423/slurm_script: line 67: /home/n02/n02/n02magi/cylc-run/hello-archer/.service/etc/job.sh: No such file or directory
/var/spool/slurmd/job634423/slurm_script: line 68: cylc__job__main: command not found

Any suggestions?
Thanks
magnus

Hello,

So long as your login and compute nodes share one common filesystem, it is possible to make this setup work using the symlink dirs configuration.

https://cylc.github.io/cylc-doc/8.0b2/html/reference/config/global.html#global.cylc[install][symlink%20dirs]

This is a platform configuration which allows you to move your cylc-run (or other workflow sub directories) to other locations.

In your global.cylc, tell Cylc to move the workflow run dir onto the shared filesystem:

[install]
    [[symlink dirs]]
        [[[archer2]]]  #  <= the archer2 install target
            run = /shared/fs/

When the first task in a workflow runs on a platform, Cylc will set up the symlink and install onto the shared filesystem:

$HOME/cylc-run/
    my-workflow -> /shared/fs/cylc-run/my-workflow

Cylc commands will follow this symlink.
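
To make the layout above concrete, here is a minimal shell sketch that emulates it in a throwaway sandbox. It does not call Cylc at all; FAKE_HOME and SHARED_FS are stand-in names chosen for illustration, and the file contents are dummy data.

```shell
# Emulate the symlinked run directory layout inside a temporary sandbox.
# FAKE_HOME and SHARED_FS are illustrative stand-ins, not Cylc variables.
sandbox=$(mktemp -d)
FAKE_HOME="$sandbox/home"
SHARED_FS="$sandbox/shared/fs"

# The real run directory lives on the shared filesystem.
mkdir -p "$SHARED_FS/cylc-run/my-workflow" "$FAKE_HOME/cylc-run"
echo "hello" > "$SHARED_FS/cylc-run/my-workflow/flow.cylc"

# Under the home directory there is only a symlink pointing at it.
ln -s "$SHARED_FS/cylc-run/my-workflow" "$FAKE_HOME/cylc-run/my-workflow"

# Anything read through the home path actually comes from the shared filesystem.
link_target=$(readlink "$FAKE_HOME/cylc-run/my-workflow")
via_home=$(cat "$FAKE_HOME/cylc-run/my-workflow/flow.cylc")
echo "link target: $link_target"
echo "read via home path: $via_home"
```

The sketch only works because both paths are visible from where it runs. On a node where the home filesystem is not mounted, the symlink itself is unreachable.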

I’ve added a cylc-doc issue to document this edge case: "platforms: platform configuration for compute nodes with no $HOME" (Issue #315, cylc/cylc-doc on GitHub).

Let us know how you get on,
Oliver.

Thanks for the quick reply. Unfortunately it is not working yet. So I have:

n02magi@uan01:~> cat ~/.cylc/flow/global.cylc
[install]
   [[symlink dirs]]
      [[[archer2]]]  
         run = /work/n02/n02/n02magi
[platforms]
   [[archer2]]
      job runner = slurm
      hosts = localhost
      install target = archer2

However, it still installs the workflow into $HOME

Ok, not sure why that isn’t working, ideas:

  • This global.cylc configuration needs to be on the host where you issue the cylc install and cylc play commands.
  • You could try running cylc config on that same host to make sure the config is being picked up properly.
  • Symlink dirs changes take effect when the workflow is first installed. If it has already been installed you will need to remove it (in all locations) or create a fresh install to test the changes.
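
One quick way to tell whether an existing install picked up the symlink dirs setting is to check whether its run directory is a symlink. The sketch below uses a hypothetical helper, run_dir_status, which is not a Cylc command, and exercises it against a sandbox rather than a real cylc-run directory.

```shell
# Hypothetical helper (not part of Cylc): report whether a workflow's run
# directory under a given cylc-run directory is a symlink.
# Usage: run_dir_status <cylc-run directory> <workflow name>
run_dir_status() {
    run_dir="$1/$2"
    if [ -L "$run_dir" ]; then
        echo "symlink -> $(readlink "$run_dir")"
    elif [ -d "$run_dir" ]; then
        echo "plain directory (symlink dirs not applied)"
    else
        echo "not installed"
    fi
}

# Demo in a sandbox: one symlinked install, one plain install, one missing.
demo=$(mktemp -d)
mkdir -p "$demo/work/cylc-run/hello-archer" "$demo/home/cylc-run/old-install"
ln -s "$demo/work/cylc-run/hello-archer" "$demo/home/cylc-run/hello-archer"

run_dir_status "$demo/home/cylc-run" hello-archer
run_dir_status "$demo/home/cylc-run" old-install
run_dir_status "$demo/home/cylc-run" missing
```

On a real system the call would be run_dir_status "$HOME/cylc-run" hello-archer. A "plain directory" result suggests the workflow was installed before the configuration change and needs a fresh install.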

I got it to install into the correct place using

[install]
    [[symlink dirs]]
        [[[localhost]]]  
            run = /work/n02/n02/n02magi
[platforms]
    [[archer2]]
        job runner = slurm
        hosts = localhost
        install target = localhost
        cylc path = /work/n02/n02/n02magi/bin/

However it is trying to run

 . "$HOME/cylc-run/hello-archer/.service/etc/job.sh"
 cylc__job__main

The variable HOME exists but the file system is just not visible. So I guess I need to modify how that script is generated.

I got it to install into the correct place using

Great. Sorry, yes that should be [install][symlink dirs]localhost rather than archer2 if

So I guess I need to modify how that script is generated.

You will not be able to modify this without editing the Cylc source code as the global init-script (documented in the Cylc 7 issue) is run after this job file has been sourced.

I’ve taken a look: Cylc 7 hardcoded the job file location, whereas Cylc 8 uses the $HOME environment variable to locate it, which breaks your setup.
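
The failure mode can be reproduced without Cylc or Slurm. The sketch below is a simplified emulation, not the real generated job script: the job.sh here is a one-line stand-in, run_job_with_home is a hypothetical helper, and the sandbox "home" directory is deliberately never created to mimic a node where $HOME is set but not mounted.

```shell
# Sandbox: "work" is visible, "home" is deliberately never created,
# mimicking a compute node where $HOME is set but the filesystem is absent.
sandbox=$(mktemp -d)
visible_work="$sandbox/work"
missing_home="$sandbox/home"

# Stand-in for .service/etc/job.sh; the real file defines cylc__job__main.
mkdir -p "$visible_work/cylc-run/hello-archer/.service/etc"
cat > "$visible_work/cylc-run/hello-archer/.service/etc/job.sh" <<'EOF'
cylc__job__main() { echo "job main ran"; }
EOF

# Hypothetical helper: in a subshell, set HOME, source job.sh relative to it
# and call the entry point, i.e. the same two steps as the job script lines
# quoted earlier in the thread.
run_job_with_home() {
    (
        HOME=$1
        . "$HOME/cylc-run/hello-archer/.service/etc/job.sh" 2>/dev/null \
            && cylc__job__main
    ) || echo "failed: job.sh not found under $1"
}

run_job_with_home "$missing_home"   # sourcing fails, entry point never defined
run_job_with_home "$visible_work"   # same steps succeed from a visible path
```

The subshell keeps the HOME override and any sourcing failure from leaking into the calling shell. The second call only shows that the two steps work when the path resolves; whether HOME can actually be redirected like this inside a real job is a separate question.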

Should be a fairly simple fix, I’ll see if I can work something out.

Excellent, thanks. I’ll have a go at trying this out.

Unfortunately we’ve found issues with this approach and are looking at alternatives, hopefully will have something soon.

Let me know when there is something I can try out or if I can help in any other way

I just noticed the comment in the issue tracker. The use of the home directory for cylc-run has puzzled me. Why is that not configurable? That way it would be a lot easier to find the correct location.

The cylc-run directory used to be configurable at Cylc 7. On the surface, this seems like a simple feature. However, the distributed nature of Cylc makes it surprisingly hard to implement, and the feature was never fully functional.

Hardcoding the path of the cylc-run directory has greatly simplified Cylc’s internal logic and should work for all Unix systems. Except, unfortunately Archer2.

Note that many Unix tools, such as Bash and SSH, rely on paths relative to $HOME.

That said, bash and ssh can take command line arguments (or I could source a script to set up an environment) and so work without $HOME. I guess I will ask the archer2 team if they would consider setting HOME to the work directory on the workers.

Is there a way I can inject a line into the submission script so that I can set the HOME environment variable?

Thinking about it…
I can set the HOME var using the slurm option --export=HOME=/work/n02/n02/n02magi. The problem is that cylc starts a login shell which I guess loses this setting. Will a non-login shell help?

At present you can’t insert a line where you’d need to but we’re hoping to propose a change which will allow this to work for you …

For completeness, I’ll note that while ~/cylc-run is now hardcoded as the top level run directory, the new workflow installation process can be configured to automatically symlink sub-directories to other locations. (In case anyone reading this thinks their home disk quota has to be big enough to handle all workflow output).

Having a standard location for the top level run directory not only simplifies Cylc internals, it’s also important for 3rd party tools that (for instance) need to view workflow outputs, or job logs, or databases.

Ok, we think we have a solution using a combination of symlink dirs and global init-script.

If you’re able to try it out the proposed changes are here:

This should make it into the first Cylc 8 release candidate.
