Symlink resolution when restarting a workflow

Try turning your relative symlink into an absolute symlink and it will work:

.
|-- link-me
`-- workflow
    |-- bin
    |   |-- bar -> /var/tmp/tmp.T8JCH73dFl/workflow/bin/../../link-me
    |   `-- foo -> ../../link-me
    `-- flow.cylc
$ cat ./workflow/flow.cylc
[scheduling]
    [[graph]]
        R1 = foo & bar

[runtime]
    [[foo]]
        script = foo
    [[bar]]
        script = bar

When I run this example, foo fails (the relative symlink does not resolve) and bar succeeds (the absolute symlink does resolve).

In your example the script and the symlink have the same name; in my example I’m using different names, so the foo script never works.
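For anyone wanting to do that conversion from the command line, here's a minimal sketch. It assumes bash and a readlink that supports -f (GNU coreutils does); the temporary directory and the variable names demo and target are mine, purely for illustration.

```shell
# Throwaway directory so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/workflow/bin"
echo 'echo Hello' > "$demo/link-me"

# Relative symlink, as in the "foo" case above.
ln -s ../../link-me "$demo/workflow/bin/foo"

# Resolve the link to a canonical absolute path, then re-create the
# link in place so that it stores that absolute path instead.
target=$(readlink -f "$demo/workflow/bin/foo")
ln -sf "$target" "$demo/workflow/bin/foo"

# Print the stored link text: it is now absolute, so it no longer
# depends on where the link itself lives.
readlink "$demo/workflow/bin/foo"
```

The trade-off is that an absolute link ties the workflow to that exact path on that filesystem.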

For the sake of clarity, here’s the script to replicate my case:

mkdir example && cd example
mkdir -p flow/bin
echo "echo Hello" > script_real.sh
chmod +x script_real.sh
cd flow/bin
ln -s ../../script_real.sh script_real_SYMLK.sh
cd ..
cat > flow.cylc <<eod
[scheduling]
    cycling mode = integer
    [[graph]]
        R1 = do_stuff
[runtime]
    [[do_stuff]]
        script=". script_real_SYMLK.sh; error"
eod

then

cd example/flow
cylc vip
# wait
cylc log flow//1/do_stuff

Well, that could indeed explain it then! But that’s very surprising for the user ^^

I tried running your example and it failed for me:

[example] $ cd $(mktemp -d)
[tmp.uZZvZTW10O] $ mkdir example && cd example
[example] $ mkdir -p flow/bin
[example] $ echo "echo Hello" > script_real.sh
[example] $ chmod +x script_real.sh
[example] $ cd flow/bin
[bin] $ ln -s ../../script_real.sh script_real_SYMLK.sh
[bin] $ cd ..
[flow] $ cat > flow.cylc <<eod
> [scheduling]
>     cycling mode = integer
>     [[graph]]
>         R1 = do_stuff
> [runtime]
>     [[do_stuff]]
>         script=". script_real_SYMLK.sh; error"
> eod
[flow] $ cylc vip -N
$ cylc validate /var/tmp/tmp.uZZvZTW10O/example/flow
Valid for cylc-8.2.4
$ cylc install /var/tmp/tmp.uZZvZTW10O/example/flow
INSTALLED flow/run1 from /var/tmp/tmp.uZZvZTW10O/example/flow
$ cylc play -N flow/run1

 ▪ ■  Cylc Workflow Engine 8.2.4
 ██   Copyright (C) 2008-2024 NIWA
▝▘    & British Crown (Met Office) & Contributors

...

2024-03-08T16:24:39Z INFO - [1/do_stuff running job:01 flows:1] => failed
2024-03-08T16:24:39Z WARNING - [1/do_stuff failed job:01 flows:1] did not complete required outputs: ['succeeded']
2024-03-08T16:24:39Z ERROR - Incomplete tasks:
      * 1/do_stuff did not complete required outputs: ['succeeded']
2024-03-08T16:24:39Z CRITICAL - Workflow stalled
2024-03-08T16:24:39Z WARNING - P3D stall timer starts NOW
$ cylc cat flow//1/do_stuff -f e
/.../cylc-run/flow/run1/log/job/1/do_stuff/01/job: line 42: script_real_SYMLK.sh: No such file or directory
2024-03-08T16:24:38Z CRITICAL - failed/ERR

Which makes sense because the symlink does not resolve:

$ cat ~/cylc-run/flow/run1/bin/script_real_SYMLK.sh 
cat: /.../cylc-run/flow/run1/bin/script_real_SYMLK.sh: No such file or directory

I don’t understand how this works for you. I suspect something in your .bashrc or .bash_profile is adding extra directories to your $PATH. Inspecting the environment may yield insights.
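One way to inspect it is a small bash helper like the one below. This is only a sketch: find_on_path is a name I made up (it is not a Cylc command), and it uses bash features (local, read -a). It lists every $PATH directory that contains an entry with the given name and says whether that entry resolves.

```shell
# List each directory on $PATH that holds an entry called NAME, and
# report whether it resolves (a dangling symlink does not).
find_on_path() {
    local name=$1 dir dirs
    # Split $PATH on ":" into an array.
    IFS=: read -r -a dirs <<< "$PATH"
    for dir in "${dirs[@]}"; do
        # An empty $PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -e "$dir/$name" ]; then
            echo "resolves: $dir/$name"
        elif [ -L "$dir/$name" ]; then
            echo "dangling: $dir/$name"
        fi
    done
}

# Example: where would the shell find the script from the reproducer?
find_on_path script_real_SYMLK.sh
```

Running the same function from an interactive shell and from inside the task's script would show whether the two environments see different copies of the script.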

After investigation, it looks like vip and play are using the same working directory, so it isn’t that. I’m not sure why this works for vip but not play; however, I don’t actually understand how this is working for you in the first place.

I just tried running it on another machine (my remote cluster), and I got the exact same result (the symlinked script runs).

To minimise uncertainty, I’m simply running a single script which is just a concatenation of the previous steps.

I have some interesting results!

> cylc vip . # Works fine
# versus
> cylc install .
> cylc play flow # ...SYMLK.sh: No such file or directory

Which explains why it didn’t work in your case.

Nope, this does not work for me either way around. Logically, this approach should not work, because the symlink does not resolve, as you confirmed.

To explain what cylc install is doing: we write and edit workflow definitions in the “source directory” (~/cylc-src by default). When we run cylc install, Cylc copies the workflow files into the “run directory” (~/cylc-run). This keeps your running workflows separate from their source, allowing you to continue working on your workflow source code without affecting any running installations of that workflow.

So if your ~/cylc-src directory looks like this:

~/cylc-src/
├── flow
│   ├── bin
│   │   └── script_real_SYMLK.sh ⇒ ../../script_real.sh
│   └── flow.cylc
└── script_real.sh

The relative symlink is pointing at ~/cylc-src/script_real.sh.

When you install the workflow your ~/cylc-run directory will look like this:

~/cylc-run/
└── flow/
    └── run1/   # "cylc install" copied this from "~/cylc-src/flow"
        ├── flow.cylc
        └── bin/
            └── script_real_SYMLK.sh ⇒ ../../script_real.sh

The relative symlink is now pointing at ~/cylc-run/flow/script_real.sh. Since this file does not exist, the symlink is broken.
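You can see the same effect without Cylc at all. The sketch below builds the two layouts inside a throwaway directory and copies the source across with cp -R -P, which keeps symlinks as symlinks. That cp call is only my stand-in for the copy that cylc install performs, and the src and run variables are illustrative names, not real Cylc settings.

```shell
# Stand-ins for ~/cylc-src and ~/cylc-run inside a throwaway directory.
demo=$(mktemp -d)
src="$demo/cylc-src"
run="$demo/cylc-run"

mkdir -p "$src/flow/bin" "$run/flow"
echo 'echo Hello' > "$src/script_real.sh"
ln -s ../../script_real.sh "$src/flow/bin/script_real_SYMLK.sh"

# Copy the source directory to run1, keeping symlinks as symlinks
# (an assumption standing in for what "cylc install" does).
cp -R -P "$src/flow" "$run/flow/run1"

# Report whether each copy of the link resolves.
for link in "$src/flow/bin/script_real_SYMLK.sh" \
            "$run/flow/run1/bin/script_real_SYMLK.sh"; do
    if [ -e "$link" ]; then
        echo "resolves: $link"
    else
        echo "broken:   $link"
    fi
done
```

The link text (../../script_real.sh) is identical in both places; only the directory it is resolved from has changed.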

How to work around this? If the script is only required by the one workflow, it can be moved into the workflow’s bin directory, where it will be copied across by cylc install. If the script is common to multiple workflows (and you want to avoid multiple copies, to make maintenance easier), you could put it in a central location, e.g. ~/bin, and ensure that directory is in your $PATH, e.g. by adding export PATH="$HOME/bin:$PATH" to your ~/.bashrc file.
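A minimal sketch of the central-location option, using a temporary directory in place of ~/bin so it is safe to try (the central variable is just an illustrative name). It relies on the fact that, in bash, "." searches $PATH when the file name contains no slash, which is what the script line in the reproducer depends on.

```shell
# Throwaway stand-in for a central location such as ~/bin.
central=$(mktemp -d)
echo 'echo Hello' > "$central/script_real.sh"
chmod +x "$central/script_real.sh"

# Put the central directory first on $PATH, as the ~/.bashrc line would.
export PATH="$central:$PATH"

# Sourcing by bare name now finds the script via $PATH, with no
# symlink involved.
. script_real.sh
```

With this approach the workflow's script setting would name script_real.sh directly, and no symlink needs to live inside the workflow's bin directory.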

At our site, when multiple workflows share common resources, we typically move these resources to a common version-controlled location so multiple users can access them. Workflows then install these resources (at a specified revision) into the workflow using an install task. These often use Rose file installation for convenience.