Use of CYLC_WORKFLOW_SHARE_DIR

Hi there, I have a problem with the use of CYLC_WORKFLOW_SHARE_DIR. In the runtime section of my flow.cylc I have the following (note that the workflow is executed on a remote machine, and that my global.cylc specifies paths that the work and share directories are symlinked to):

[ runtime ]

    [[ DATES ]]

        script = """
            ${CYLC_WORKFLOW_RUN_DIR}/env/init_dyn_vars.py
        """

The Python script calculates a number of environment variables and is then supposed to write them to a file in CYLC_WORKFLOW_SHARE_DIR:

import os

# Write out a file with the dynamic environment variables
output = os.path.join(os.environ['CYLC_WORKFLOW_SHARE_DIR'], 'dyn_env_vars.sh')
with open(output, "w") as f:
    f.write('#!/bin/bash\n')
    for k, v in dict_exp.items():
        f.write(f"export {k}='{v}'\n")

This file is never generated, however. What am I missing?
Thanx
Gaby

Hi,

We’re going to need a bit more information to understand why this example isn’t working as you expect it to.

Here’s a quick example showing how the share dir works:

[scheduling]
    [[graph]]
        R1 = one => two

[runtime]
    [[one]]
        # write to a file in the share dir in one task
        script = """
            echo 'Hello World!' > "${CYLC_WORKFLOW_SHARE_DIR}/message"
        """

    [[two]]
        # read from a file in the share dir from another task
        script = """
            cat "${CYLC_WORKFLOW_SHARE_DIR}/message"
        """
$ cylc vip -n myworkflow

$ # wait for the workflow to finish
$ cylc cat-log myworkflow//1/two
Workflow : myworkflow/run1
Job : 1/two/01 (try 1)
User@Host: me@localhost

Hello World!
2024-09-25T14:57:02+01:00 INFO - started
2024-09-25T14:57:05+01:00 INFO - succeeded

$ tree ~/cylc-run/myworkflow/runN/share
...myworkflow/runN/share
|-- cycle -> .../myworkflow/run1/share/cycle
`-- message

So, I’ve taken your example and run it. It runs fine locally, but when I specify the remote platform it does not.

Aha.

The share directory is specific to the platform you are using (unless your platform has been configured to use a shared filesystem).

Cylc doesn’t have any fancy logic to monitor the share directory, detect any newly created or modified files, then synchronise them to other platforms. You have to move the data yourself, e.g. using rsync or scp.
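
For example, here is a minimal sketch of that explicit copy (my-remote-host, the default ~/cylc-run layout on the remote host, and the fact that the remote run directory already exists are all assumptions about your setup):

[runtime]
    [[one]]
        # write to the local share dir, then copy the file across yourself
        script = """
            echo 'Hello World!' > "${CYLC_WORKFLOW_SHARE_DIR}/message"
            # my-remote-host and the remote ~/cylc-run layout are placeholders;
            # adjust to match your platform and symlink configuration
            rsync -a \
                "${CYLC_WORKFLOW_SHARE_DIR}/message" \
                "my-remote-host:cylc-run/${CYLC_WORKFLOW_ID}/share/"
        """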

Got it, so it may make sense to switch to the work directory instead for my use case.

The work directory is local too so that won’t work either (without an explicit rsync/scp).

One option that is sometimes used for cases like this is to make use of Cylc’s automatic installation functionality. When the first task runs on a remote platform, Cylc will copy across various files that the remote jobs might need, e.g. the bin/ directory.

You can see what files Cylc will install by default here (note that the work/ and share/ directories can’t be added to the install list).

Here’s an example that puts the file into the lib/ directory (which gets installed by default):

[scheduling]
    [[graph]]
        R1 = one => two

[runtime]
    [[one]]
        script = """
            echo 'Hello World!' > "${CYLC_WORKFLOW_RUN_DIR}/lib/message"
        """

    [[two]]
        script = """
            cat "${CYLC_WORKFLOW_RUN_DIR}/lib/message"
        """
        platform = my-remote-host

Taking a look at your script above, it looks like you are trying to get one task to write out a file that sets a bunch of environment variables for other tasks to use?

If so, you might want to look into broadcasts as a possible solution. Broadcasts are implemented as messages to the Cylc scheduler, rather than as files on the filesystem. As a result no files or synchronisation are required.

Here’s an example:

[scheduling]
    [[graph]]
        R1 = configure => run

[runtime]
    [[configure]]
        script = """
            cylc broadcast \
                "${CYLC_WORKFLOW_ID}" \
                -p "${CYLC_TASK_CYCLE_POINT}" \
                -n 'TASK_GROUP' \
                -s "[environment]NUMBER=$(( RANDOM ))" \
                -s "[environment]DATE=$(date)" \
                -s "[environment]FOO=BAR"
        """

    [[TASK_GROUP]]

    [[run]]
        inherit = TASK_GROUP
        script = """
            echo "NUMBER=$NUMBER"
            echo "DATE=$DATE"
            echo "FOO=$FOO"
        """
        platform = myremoteplatform

Any tasks which inherit from TASK_GROUP will have access to these environment variables.

Note, the configure task will need to run before the TASK_GROUP tasks.

Yes, your example with lib/ works for me, but the same modifications in my code do not… It does not look like the script is being run at all.
Here’s my flow.cylc in its entirety:

#!Jinja2

[meta]
    title = "GLO12V4 prototype cylc workflow"
description = """
        Prototype to reproduce a GLO12V4 ease pipeline
        as a cylc workflow. No archiving or storage tasks to begin with
    """

{% from "d2j" import d2j %}
{% set CYCLE_START_DATE = '20201001' %}
{% set CYCLE_END_DATE = '20201002' %}
{% set JUL_START_DATE = d2j(CYCLE_START_DATE) %}
{% set JUL_END_DATE = d2j(CYCLE_END_DATE) %}
{% from "datetime" import datetime as dt %}
{% set LCYCLE = (dt.strptime(CYCLE_END_DATE, '%Y%m%d').date() - dt.strptime(CYCLE_START_DATE, '%Y%m%d').date()).days -1 %}

[scheduler]
    install = env/, script/, config/

[task parameters]
    recup = obs, atmf, bdy, statics, postfiles, assimparam
    archi = mkoutputdir, mkdirstorage

[ scheduling ]

    initial cycle point = {{ CYCLE_START_DATE }}
    final cycle point = {{ CYCLE_END_DATE }}

    [[ graph ]]

        P1D = """ recup<recup> & archi<archi> => recup_mfiles => run_model"""

%include './env/environment.cylc'

[ runtime ]

    [[ DATES ]]

        script = """
            ${CYLC_WORKFLOW_RUN_DIR}/env/init_dyn_vars.py
        """

        [[[ environment ]]]
            MOI_julstart = {{ JUL_START_DATE }}
            MOI_julstop = {{ JUL_END_DATE }}
            MOI_lcycle = {{ LCYCLE }}
            MOI_dstart = {{ CYCLE_START_DATE }}
            MOI_dstop = {{ CYCLE_END_DATE }}

    [[ TRANSFERT ]]
        [[[ directives ]]]
            --partition = transfert
            --nodes = 1
            --ntasks = 1
            --time = 0:10:00

    [[ NORMAL ]]
        [[[ directives ]]]
            --partition = normal256
            --nodes = 1
            --ntasks = 128
            --time = 0:30:00
            --mem = 247000

    [[ PREP ]]
        inherit = DATES, TRANSFERT
        platform = belenos
        env-script = """
            conda activate glo12_ease
        """

    [[ recup<recup> ]]
        inherit = PREP

        script = ${CYLC_WORKFLOW_RUN_DIR}/script/prep/recup_${CYLC_TASK_PARAM_recup}.sh

    [[ archi<archi> ]]
        inherit = PREP, DATES

        script = ${CYLC_WORKFLOW_RUN_DIR}/script/prep/archi_${CYLC_TASK_PARAM_archi}.sh

        [[[ directives ]]]
            --time = 0:30:00

        [[[ environment ]]]
          MOI_ensemble_start = ${MOI_model_ensemble_start}
          MOI_ensemble_end = ${MOI_model_ensemble_end}
          MOI_tagcycle = R${MOI_dstop}M${MOI_ensemble_start}_${MOI_ensemble_end}

    [[ MODEL_RUN ]]
        platform = belenos

        env-script = """
            module purge
            export MODULEPATH=/home/ext/mr/smer/soniv/SAVE/modulefiles:$MODULEPATH
            module load gcc/9.2.0 intel/2018.5.274 intelmpi/2018.5.274 phdf5/1.8.18 netcdf_par/4.7.1_V2 xios-trunk_rev2134
            # include machine-dependent file with specific env vars for the mpich implementation
            # and a function to execute the specific mpich commands (mpirun, aprun, srun, etc.).
            # variable HOST is defined in the suite definition file and is the name of the
            # host running the parallel job
            . ${CYLC_WORKFLOW_RUN_DIR}/env/mpich_belenos.sh
        """

    [[ MODEL_RUN_CONDA ]]
        platform = belenos
        env-script = """
            module purge
            export MODULEPATH=/home/ext/mr/smer/soniv/SAVE/modulefiles:$MODULEPATH
            module load gcc/9.2.0 intel/2018.5.274 intelmpi/2018.5.274 phdf5/1.8.18 netcdf_par/4.7.1_V2 xios-trunk_rev2134
            # include machine-dependent file with specific env vars for the mpich implementation
            # and a function to execute the specific mpich commands (mpirun, aprun, srun, etc.).
            # variable HOST is defined in the suite definition file and is the name of the
            # host running the parallel job
            . ${CYLC_WORKFLOW_RUN_DIR}/env/mpich_belenos.sh
            conda activate glo12_ease
        """

    [[ recup_mfiles ]]
        inherit = PREP
        script = "${CYLC_WORKFLOW_RUN_DIR}/script/model_run/recup_mfiles.sh"

        [[[ environment ]]]
            MOI_model_freqout = 1
            MOI_dir_tmprun = ${MOI_dir_calc_tmp}/TMPRUN/${MOI_ENSMEMBER_DIR}
            MOI_DIR_CALCU_PARAM = ${MOI_dir_calc_param}
            MOI_dirout_modelsshbudget = ${MOI_dir_calc_tmp}/MODEL_SSH/${MOI_TYPERUN}/${MOI_ENSMEMBER_DIR}

    [[ run_model ]]
        inherit = DATES, MODEL_RUN_CONDA, NORMAL

        script = '${CYLC_WORKFLOW_RUN_DIR}/script/model_run/model_run.sh'

        [[[ directives ]]]
            --nodes = 20
            --time = 0:20:00

        [[[ environment ]]]
            # Directory paths
            MOI_ioserver_program = xios_server.exe
            MOI_model_program = /home/ext/mr/smer/ruggierog/TOOLS/ease_lib_51b653ddfc/branch_4.2_nemoi_1_stochastic_perturbations/cfgs/iORCA025_ICE/BLD/bin/nemo.exe
            MOI_model_procs_node = 125
            MOI_model_ntasks = 2500
            MOI_ioserver_procs_node = 3
            MOI_ioserver_ntasks = 42

Which script? Why do you think this is the case?


Note this line from above:

When the first task runs on a remote platform, Cylc will copy across various files that the remote jobs might need, e.g. the bin/ directory.

If you are relying on Cylc’s remote installation to install your file, then you need to ensure that your file exists before Cylc attempts to install the workflow onto the remote platform. This happens when the first task is submitted to that platform.
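
For example, here is a sketch along those lines (the task names, the platform name my-remote-host, and the assumption that init_dyn_vars.py is changed to write dyn_env_vars.sh into lib/, which is installed by default, are all mine):

[scheduling]
    [[graph]]
        # make_env_file runs locally, so the file exists in the run directory
        # before the first remote task submits and triggers the remote install
        R1 = make_env_file => remote_task

[runtime]
    [[make_env_file]]
        script = """
            # assumed: the script now writes into lib/ rather than share/
            ${CYLC_WORKFLOW_RUN_DIR}/env/init_dyn_vars.py
        """

    [[remote_task]]
        platform = my-remote-host
        script = """
            . "${CYLC_WORKFLOW_RUN_DIR}/lib/dyn_env_vars.sh"
        """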


Have you had a look at the “broadcast” solution above? If this fits your requirement it’s a much cleaner solution than writing the variables to a file.

Broadcast doesn’t work in this case because all the env vars produced by the script are computed in a non-trivial manner.

I think what I need to do is add an initial remote task that runs init_dyn_vars.py to create the file. It is only needed by a couple of tasks. I suppose I could also run it as part of the tasks themselves, but I want to avoid duplication if I can.

I don’t think the complexity of the computation should pose a barrier to using cylc broadcast.

You can even build the cylc broadcast command from within your Python script if it helps, e.g.:

# untested
import os
from subprocess import call

workflow_id = os.environ['CYLC_WORKFLOW_ID']
cmd = ['cylc', 'broadcast', workflow_id]

# compute_env() stands in for however your script derives (key, value) pairs;
# the -p / -n options could be appended in the same way as in the example above
for key, value in compute_env():
    cmd.extend(['-s', f'[environment]{key}={value}'])

call(cmd)

Alternatively, the cylc broadcast command can read “broadcast files”, which use Cylc’s configuration format, very similar in shape to the Bash environment file your script currently generates.

Example broadcast file:

[environment]
  AAA=42
  BBB=true
  CCC=abcdef

Then issue the broadcast like so:

cylc broadcast <workflow> -p <cycle> -n <namespace> -F broadcast_file.cylc
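
For example, here is a sketch of how this could slot into your workflow (it assumes init_dyn_vars.py is modified to write the [environment] format above to a file called dyn_env_vars.cylc in the task’s work directory, and it uses your DATES family as the broadcast target):

[runtime]
    [[configure]]
        script = """
            # assumed: the script now writes Cylc-format [environment] settings
            # to ./dyn_env_vars.cylc instead of Bash export lines
            ${CYLC_WORKFLOW_RUN_DIR}/env/init_dyn_vars.py
            cylc broadcast \
                "${CYLC_WORKFLOW_ID}" \
                -p "${CYLC_TASK_CYCLE_POINT}" \
                -n 'DATES' \
                -F dyn_env_vars.cylc
        """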

Thanx for all these ideas. Once we have this workflow working, I plan to do some serious refactoring to leverage as many of Cylc’s capabilities as possible, including broadcasting. Right now we need to get a working workflow to test against the shell-script-driven operational one, and the devil is in the environment variables…