My problem: Symlinks aren’t resolved the same way when running workflows via the UI
Minimal example
Consider the following flow:
[scheduling]
    cycling mode = integer
    [[graph]]
        R1 = do_stuff
[runtime]
    [[do_stuff]]
        script = ". script_real_SYMLK.sh; error"
where the directory structure is
├── bin
│   └── script_real_SYMLK.sh -> ../../temp/script_real.sh
└── flow.cylc
What happens
When I run via cylc vip, everything works fine, i.e. script_real_SYMLK.sh runs, and the task fails on error.
Next, I go to the UI, click on the buttons to stop the workflow and then start it again. I then re-trigger the task manually from the UI. Now, the symlink doesn’t resolve anymore: script_real_SYMLK.sh: No such file or directory
Discussion
I suppose this may have to do with how the path is initialized via vip versus via running in the UI?
Hi,
cylc vip validates, installs and starts a workflow.
cylc play starts or restarts an installed workflow.
The UI play button calls the “cylc play” command (it uses the same functionality as the command line).
So what you’re doing is equivalent to:
$ cylc vip
# wait for task to fail
$ cylc stop <id>
$ cylc play <id>
Running cylc play shouldn’t mess with any symlinks (in an installed workflow); it just restarts the scheduler.
Maybe try restarting the workflow via the command line to rule out UI things. If the command line behaviour differs from the UI behaviour, then it’s worth checking whether the server where you’re running the cylc-ui has the same Cylc configuration as the one on which you’re running the cylc vip command (i.e. using the cylc config command). The server name should be displayed in the top-left of the UI.
I’ve replicated the issue using the cmdline, so this indeed isn’t UI-related. However, the original issue still stands.
To sum up the current state:
cylc vip # -> the symlink resolves
cylc stop flow/run1
cylc play flow/run1
cylc trigger flow/run1//1/do_stuff # -> the symlink doesn't resolve
Thanks for checking that. We’ll need a bit more information to investigate. Where is this symlink pointing from and to? And is this a symlink that was created automatically by Cylc using symlink dirs, or is this being created through some other process?
At what stage in your example above does the symlink become broken: cylc stop or cylc play?
Here’s the structure:
.
├── flow
│   ├── bin
│   │   └── script_real_SYMLK.sh ⇒ ../../script_real.sh
│   └── flow.cylc
└── script_real.sh
The symlink is created by hand using ln -s ../../script_real.sh ./script_real_SYMLK.sh
So this is a relative symlink which exists in the source directory. It is installed into the workflow’s run directory by cylc install.
Yes, that is correct as far as I understand.
Ah, I’ve just spotted the symlink target is in the directory above the flow.cylc file?
If so, then the target isn’t contained within the workflow source so will not be installed into the run directory by cylc install (the symlink will be installed but not the thing it’s pointing at)?
If so, I’m confused how this symlink could work with cylc vip?
Does this symlink resolve on the filesystem from any arbitrary location on the filesystem (note, Cylc may be configured to launch the scheduler on a different host to the one you ran the cylc play or cylc vip command on)?
I confirm that it’s indeed in the parent directory of flow.cylc.
I’m just as confused as you are as to why it works with cylc install in the first place, but it should be easy to replicate on your end.
Does this symlink resolve on the filesystem from any arbitrary location on the filesystem?
That depends on what exactly you are asking. Since it’s a local reference, it will work no matter from where the original symlink is called, e.g.
> ./flow/bin/script_real_SYMLK.sh # works fine
> cd flow
> ./bin/script_real_SYMLK.sh # works fine
However if you move the symlink around, it won’t work anymore
> mv bin/script_real_SYMLK.sh .
> ./script_real_SYMLK.sh
zsh: no such file or directory: ./script_real_SYMLK.sh
Cylc won’t move the symlink around. If the symlink resolves on the filesystem, but not in a Cylc job, it suggests the job is not seeing the same filesystem (i.e. is being run on a different host to the one you are using to inspect the filesystem).
Two ways this can happen:
- If Cylc was configured to run the job on another host, e.g. platform = <other-host>.
- If Cylc was configured to run the job “locally” but to run the scheduler on another host, e.g. platform = localhost (or platform undefined) AND global.cylc[run hosts]available = <other-host>, ...
Note, in case the job was configured to run on another host, that host could be “remote” (i.e. have a different file system), in which case Cylc will rsync the run directory across to the host, during which process the symlink could become broken. This is determined by the install target in the platform configuration in the Cylc global config.
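To check whether a link resolves on the filesystem a job actually sees, a small helper like the one below could be run on that host (or from inside the task script). This is only a sketch: check_link is a hypothetical name, not a Cylc command, and it relies on nothing beyond the standard test operators.

```shell
# check_link PATH: report whether PATH is a symlink and whether its target exists.
# NOTE: check_link is a hypothetical helper name, not part of Cylc.
check_link() {
    if [ ! -L "$1" ]; then
        # -L is true for any symlink, including a dangling one
        echo "not-a-symlink"
    elif [ -e "$1" ]; then
        # -e follows the link, so it is only true when the target exists
        echo "resolves"
    else
        echo "dangling"
    fi
}
```

For example, check_link ~/cylc-run/flow/run1/bin/script_real_SYMLK.sh would show whether the installed copy of the link points at anything.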
Using a relative symlink to pull in a resource from outside of the workflow’s source directory might work locally, but might not work for a distributed system. Usually, we either locate the required resources within the workflow’s source directory, or we use a task to install the resources from an external source at runtime, e.g:
[scheduling]
    initial cycle point = 2000
    [[graph]]
        R1 = install => run
        P1D = run[-P1D] => run
[runtime]
    [[install]]
        script = """
            cp "${SOURCE}" "${TARGET}"
        """
A common pattern is to locate the external resource in a version control system and use a git or svn command to install the resource.
In this case, I’m running the scheduler and the job on the same platform (my laptop).
To be very clear, I’m using the exact files and dir structure described above (which doesn’t contain platform instructions), and my global.cylc only contains information for unused platforms:
[platforms]
    [[ada]]
        hosts = ada
        install target = ada
        job runner = slurm
[install]
    [[symlink dirs]]
        [[[ada]]]
            run = $SCRATCH/
            log = $SCRATCH/
            share = $SCRATCH/
            work = $SCRATCH/
I’m sure this setup can easily be replicated on your end, and you can check in minutes that it indeed resolves the symlink the first time but not the second.
I’m happy to help but you haven’t yet provided me with enough information to replicate this issue. Perhaps you could create a quick example that I could try?
I don’t understand how what you’re trying to do is supposed to work. The cylc install command (which is run by cylc vip) will copy everything in the workflow’s source directory (the directory that contains the flow.cylc file) into the ~/cylc-run directory.
So if your workflow source directory looks like this:
├── flow
│   ├── bin
│   │   └── script_real_SYMLK.sh ⇒ ../../script_real.sh
│   └── flow.cylc
└── script_real.sh
Then how is the relative symlink ../../script_real.sh supposed to work when the workflow is installed into the ~/cylc-run directory? The file script_real.sh will not be copied across because it is not in the workflow’s source directory, so this relative symlink will be pointing nowhere.
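The effect can be reproduced without Cylc at all. The sketch below uses a plain cp that preserves symlinks as a stand-in for the install step (an assumption for illustration, not how cylc install is implemented) and throwaway directories under mktemp:

```shell
# Build a throwaway copy of the layout from this thread.
demo=$(mktemp -d)
mkdir -p "$demo/src/flow/bin" "$demo/run/flow"
echo 'echo Hello' > "$demo/src/script_real.sh"
ln -s ../../script_real.sh "$demo/src/flow/bin/script_real_SYMLK.sh"

# Stand-in for the install step (assumption): copy only the workflow
# directory, keeping symlinks as symlinks (-P) instead of following them.
cp -R -P "$demo/src/flow/." "$demo/run/flow/"

# In the source tree ../../script_real.sh exists; relative to the copy it does not.
[ -e "$demo/src/flow/bin/script_real_SYMLK.sh" ] && src_state=resolves || src_state=dangling
[ -e "$demo/run/flow/bin/script_real_SYMLK.sh" ] && run_state=resolves || run_state=dangling
echo "source: $src_state, installed copy: $run_state"
# prints: source: resolves, installed copy: dangling
```

The link text is copied verbatim, so it is re-interpreted relative to its new location, where nothing sits two directories up.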
Here’s my attempt to replicate your workflow:
# workflow source directory
.
|-- link-me
`-- workflow
    |-- bin
    |   `-- foo -> ../../link-me
    `-- flow.cylc
The symlink never resolves:
$ cylc install ./workflow
INSTALLED workflow/run1 from /.../workflow
$ cat ~/cylc-run/workflow/run1/bin/foo
cat: ../../cylc-run/workflow/run1/bin/foo: No such file or directory
$ cylc vip ./workflow
INSTALLED workflow/run2 from /.../workflow
...
$ cat ~/cylc-run/workflow/run2/bin/foo
cat: ../../cylc-run/workflow/run2/bin/foo: No such file or directory
Can you try with this example: https://we.tl/t-aN9kK7pxJK
After downloading:
> unzip example.zip && cd example/flow/bin
> rm script_real_SYMLK.sh && ln -s ../../script_real.sh script_real_SYMLK.sh
> cd .. && cylc vip
> # wait
> cylc log flow//1/do_stuff
workflow : flow/run1
Job : 1/do_stuff/01 (try 1)
User@Host: [...]
Hello # <-- the symlinked script ran fine
2024-03-08T16:59:33+01:00 INFO - started
It works fine on my end.
PS:
cat ~/cylc-run/workflow/run1/bin/foo
cat: ../../cylc-run/workflow/run1/bin/foo: No such file or directory
I agree with you on that! But somehow Cylc manages to run it, as seen in the job logs…
Unfortunately, I can’t access arbitrary zip files etc. from the network I’m on for security reasons. But I can access GitHub repositories or gists, or run bash commands to create the dirs, etc.
From what you’ve shown, the script has run, but this doesn’t necessarily mean that the symlink resolved. My best guess is that script_real.sh is in your $PATH so Cylc isn’t actually running it via the symlink.
I think it’s possible that cylc vip is running the scheduler process from inside the workflow’s source directory, but cylc play from inside the workflow’s run directory, which could explain it.
To confirm if the symlink resolves, try cat ~/cylc-run/example/runN/bin/script_real.sh.
Does this mean the symlink doesn’t resolve for you?
Absolutely! It doesn’t resolve on my end either. But still gets executed.
can’t access arbitrary zip files etc. from the network I’m on for security reasons. But I can access GitHub repositories or gists, or run bash commands to create the dirs, etc.
Let me turn it into a quick repository then
Absolutely! It doesn’t resolve on my end either. But still gets executed.
Right! I think that explains it!
When you run cylc vip the script runs, not because the symlink works, but because the script is in $PATH.
That wouldn’t explain why it does not run the second time though? (after stop, play, trigger)
Yes, I think that the cylc vip and cylc play commands are using different working directories. Or preserving $PATH differently somehow?
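One way to test both guesses would be to print, from inside the task script, the working directory and the file that the . (source) command would actually pick up. For a name without a slash, bash's . builtin searches each $PATH entry in order and, outside POSIX mode, falls back to the current directory. The helper below mimics that lookup as a rough sketch; find_sourced is a hypothetical name, not a Cylc or bash command.

```shell
# find_sourced NAME: print the first regular file a slash-free `. NAME`
# would read: each $PATH entry in order, then the current directory.
# NOTE: find_sourced is a hypothetical helper, written for illustration only.
find_sourced() (
    # The body runs in a subshell, so the IFS change does not leak out.
    IFS=:
    for dir in $PATH; do
        # An empty $PATH entry means the current directory.
        if [ -f "${dir:-.}/$1" ]; then
            echo "${dir:-.}/$1"
            exit 0
        fi
    done
    # Fallback used by bash when not in POSIX mode.
    if [ -f "./$1" ]; then
        echo "./$1"
        exit 0
    fi
    exit 1
)
```

Adding pwd and find_sourced script_real_SYMLK.sh to the task script, then comparing the job output after cylc vip with the output after stop, play and trigger, should show whether the working directory or $PATH differs between the two runs. Because -f follows symlinks, a dangling link in bin/ is skipped rather than reported.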