Cylc workflow path variables do not resolve symlinks

I’ve noticed that in Cylc 8, variables such as $CYLC_WORKFLOW_RUN_DIR do not resolve symlinks that have been configured for the platform.

To double-check this with a simple example, I set up a global config file like this:

[install]
    [[symlink dirs]]
        [[[localhost]]]
            run = ~/scratch

And a workflow like this:

[scheduling]
    [[graph]]
        R1 = "test"

[runtime]
    [[test]]
        script = "echo $CYLC_WORKFLOW_RUN_DIR"

Cylc sets up the symlink correctly, but the job output then reports a path under ~ rather than ~/scratch.
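For anyone reproducing this, here is a minimal Python sketch of the layout I understand run = ~/scratch to produce: the real run directory lives under ~/scratch/cylc-run/<workflow>, and ~/cylc-run/<workflow> is a symlink to it. The helper install_symlinked_run_dir is hypothetical and only mimics that layout in a temporary directory; it is not how Cylc itself performs the installation.

```python
import os
import tempfile


def install_symlinked_run_dir(home: str, symlink_root: str, workflow_id: str) -> str:
    """Mimic the layout expected from `run = <symlink_root>` (illustration only).

    Creates the real directory <symlink_root>/cylc-run/<workflow_id> and a
    symlink <home>/cylc-run/<workflow_id> pointing at it; returns the link path.
    """
    real_dir = os.path.join(symlink_root, "cylc-run", workflow_id)
    os.makedirs(real_dir)
    link_parent = os.path.join(home, "cylc-run")
    os.makedirs(link_parent, exist_ok=True)
    link = os.path.join(link_parent, workflow_id)
    os.symlink(real_dir, link)
    return link


with tempfile.TemporaryDirectory() as tmp:
    home = os.path.join(tmp, "home")
    scratch = os.path.join(home, "scratch")  # stands in for ~/scratch
    run_dir = install_symlinked_run_dir(home, scratch, "demo")
    print(run_dir)                    # the path as written, under home/cylc-run
    print(os.path.realpath(run_dir))  # symlink resolved, under home/scratch/cylc-run
```

Both printed paths name the same directory; only the second has had the symlink resolved, which is the form the job output did not show.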

On the real system, the symlink is resolved when running via Slurm, but not when running a background task. Is this what’s supposed to happen, or have I missed something?

In Cylc 7, the equivalent $CYLC_SUITE_RUN_DIR (and related) variables always resolved the symlink. The change in behaviour causes some trouble for a few of our workflows; we can work around it, but I wanted to check how it is supposed to work.

I’m a little confused: if I ln -s /pathA /pathB, why does it matter which path I subsequently use?

Do you mean that ~/ isn’t symlinked to ~/scratch in a task run in the background, but is if it’s run by Slurm? If so, that’s a platform setup issue: [symlink dirs][localhost] is probably only setting up the installation for one platform. Does your SPICE platform share a filesystem with your local host?

The symlink is always set up correctly, but the variable $CYLC_WORKFLOW_RUN_DIR does not always resolve the symlink.

On the real system we have this in the config file (the rest is omitted):

[install]
    [[symlink dirs]]
        [[[archer2]]]
            run = $DATADIR

[platforms]
    [[archer2, ln0[1-4] ]]
        install target = archer2

Here, archer2 uses Slurm and ln0[1-4] run jobs in the background. For all of these platforms the symlink to $DATADIR is always set up correctly.

With the archer2 platform, $CYLC_WORKFLOW_RUN_DIR points to $DATADIR; with ln0[1-4], it points to $HOME.

With Cylc 7, the equivalent variable always points to $DATADIR.

Thanks - that clarifies the situation a little.

For more context, we can’t see the home file system from the batch nodes on archer2. In some of our workflows we run background jobs that set file paths for batch jobs using $CYLC_WORKFLOW_RUN_DIR, relying on this to resolve to $DATADIR and not $HOME.
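One possible workaround, sketched in Python under the assumption that the run directory symlink points into $DATADIR: resolve the symlink in the background task before writing any paths that batch jobs will read. batch_visible_run_dir is a hypothetical helper, not part of Cylc.

```python
import os
import tempfile
from typing import Mapping, Optional


def batch_visible_run_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return $CYLC_WORKFLOW_RUN_DIR with all symlinks resolved.

    Hypothetical helper: if the run directory under $HOME is a symlink into
    $DATADIR, the resolved path no longer goes through $HOME, so batch nodes
    that cannot see the home file system can still use it.
    """
    env = os.environ if env is None else env
    return os.path.realpath(env["CYLC_WORKFLOW_RUN_DIR"])


# Demonstration with a throwaway symlink standing in for the installed run dir.
with tempfile.TemporaryDirectory() as tmp:
    datadir_run = os.path.join(tmp, "work", "cylc-run", "demo")
    os.makedirs(datadir_run)
    home_run = os.path.join(tmp, "home-run-link")
    os.symlink(datadir_run, home_run)
    print(batch_visible_run_dir({"CYLC_WORKFLOW_RUN_DIR": home_run}))
```

The resolution has to happen on a node that can see $HOME (a login node here), because os.path.realpath follows the link through the home file system.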

Of course we can do this in a different way (e.g. run these setup tasks via Slurm), but I wanted to check how it was supposed to work, and that this was not due to an issue with our setup.

I’m still trying to work out whether Cylc is behaving correctly (this setup isn’t very close in style to any of our systems), but if login nodes and batch nodes don’t share a file system, they shouldn’t have a common install target, which is close to being a synonym for a file system. I’m fairly sure something will go wrong in that scenario.

For context, install targets are meant for the case where you have two login nodes and some compute nodes sharing a file system, so it doesn’t matter which login node you send installations to.

Yes, the file system setup on Archer2 is a bit different from what Cylc expects, and it does cause some trouble!

However, all of the login and batch nodes can see /work, which we set as $DATADIR. This is the main file system we use on Archer2.
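The install target point can be illustrated with a small Python model (hypothetical names and data, not Cylc configuration): group platforms by install target and intersect the file systems each platform can see. Using /home as an illustrative name for the home file system, and /work as described above, the only location common to every platform in the archer2 target is /work, which is why the run directory needs to be symlinked there.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical model, not Cylc configuration.
LOGIN_NODES = [f"ln0{i}" for i in range(1, 5)]
INSTALL_TARGET: Dict[str, str] = {"archer2": "archer2", **{ln: "archer2" for ln in LOGIN_NODES}}
# File systems each platform's nodes can see (/home is an illustrative name).
VISIBLE: Dict[str, Set[str]] = {"archer2": {"/work"}, **{ln: {"/home", "/work"} for ln in LOGIN_NODES}}


def common_filesystems(install_target: Dict[str, str],
                       visible: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each install target, return the file systems every member platform can see."""
    groups = defaultdict(list)
    for platform, target in install_target.items():
        groups[target].append(platform)
    return {target: set.intersection(*(visible[p] for p in platforms))
            for target, platforms in groups.items()}


print(common_filesystems(INSTALL_TARGET, VISIBLE))  # {'archer2': {'/work'}}
```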

I’m not certain, but I think this may be related to the workaround you use for the missing home directory: with Cylc 7 it is applied to all archer2 jobs, but with Cylc 8 I think I only configured it for the batch jobs that need it. So, unless you’ve changed the config, that may explain it. Feel free to contact me directly about this.

Thanks Dave, I think that explains it. It’s nothing to do with the symlink as such; it’s because we are overwriting $HOME, but only for batch jobs.
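To make that explanation concrete, here is a Python sketch assuming (as the discussion suggests, but not checked against Cylc’s source) that the job script builds $CYLC_WORKFLOW_RUN_DIR from $HOME. The paths and the $HOME override are hypothetical: the same template yields a $HOME path for background jobs and a /work path for batch jobs whose $HOME has been overwritten.

```python
from string import Template
from typing import Mapping

# Assumption: the job script builds the run directory from $HOME, roughly as
# "$HOME/cylc-run/<workflow id>". Not checked against Cylc's source.
RUN_DIR_TEMPLATE = Template("$HOME/cylc-run/$CYLC_WORKFLOW_ID")


def job_run_dir(env: Mapping[str, str]) -> str:
    """Expand the assumed run-dir template against a job's environment."""
    return RUN_DIR_TEMPLATE.substitute(env)


background_env = {"HOME": "/home/user", "CYLC_WORKFLOW_ID": "demo"}
# Hypothetical workaround applied to batch jobs only: $HOME overwritten to /work.
batch_env = dict(background_env, HOME="/work/user")

print(job_run_dir(background_env))  # /home/user/cylc-run/demo
print(job_run_dir(batch_env))       # /work/user/cylc-run/demo
```

Under this assumption Cylc never resolves the symlink in either case; the difference comes entirely from which $HOME each job sees.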