Dealing with a long list of environment variables

Hello there, I am a bit stuck on the question of how to handle long lists of environment variables.
I am porting some existing suites that currently run as a ginormous shell script, which has an equally ginormous list of environment variables that go with it. What would be the best way to handle this in Cylc, so that the variables get passed on to the execution environment (usually Slurm)? Listing all the vars in the flow.cylc file is unthinkable in this situation.
Thanx!
Gaby

Can I clarify:

  1. Are you converting a workflow run by a shell script to a Cylc workflow for the first time?
  2. Where are these environment variables stored at the moment?
  3. Do you want to store them in a separate file?
  4. Do you have Rose installed?

Hello @wxtim, in answer to your questions:

  1. Yes, I am doing this for the first time. I worked with Cylc 7 a while ago, when I was at NIWA, but it was a very simple suite.
  2. The env variables are a mix of variables set in the script and variables sourced from a file appropriately called "environment.sh". The shell script then creates other scripts on the fly, which are submitted to Slurm. :face_vomiting:
  3. Yes, because it's such a huge list, I'd like to store them in a separate file.
  4. I don’t have rose installed, but I’ve been considering it.

Thanks for that information.

> The shell script then creates on the fly other scripts that are submitted to slurm. :face_vomiting:

The fun of workflows implemented in shell scripts!

The most amusing ones submit themselves to the queue in an endless chain of submissions.

Cylc Task Environment Variables

In Cylc you can define environment variables like this:

```
# flow.cylc

[runtime]
    [[mytask]]
        script = my-script
        # these env-vars will be available to "my-script" when it runs
        [[[environment]]]
            ENV_VAR_1 = value-1
            ENV_VAR_2 = value-2
            ENV_VAR_3 = value-3
            # and so on
```

You can set environment variables for groups of tasks, or for all tasks, by defining them in the appropriate "family" (root is the family every task belongs to), which helps cut down on duplication, e.g.:

```
# flow.cylc

[runtime]
    [[root]]
        # these env vars will be available to all tasks:
        [[[environment]]]
            ENV_VAR_1 = value-1
            ENV_VAR_2 = value-2
            ENV_VAR_3 = value-3
            # and so on

    [[mytask1]]
        script = my-script-1
    [[mytask2]]
        script = my-script-2
```

When workflow configuration files get large, they can be broken down into smaller parts using include files, e.g.:

```
# flow.cylc

%include environment.cylc

[runtime]
    [[mytask]]
        script = my-script
```

```
# environment.cylc

[runtime]
    [[root]]
        # these env vars will be available to all tasks:
        [[[environment]]]
            ENV_VAR_1 = value-1
            ENV_VAR_2 = value-2
            ENV_VAR_3 = value-3
            # and so on
```
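
With a very long environment.sh, copying every variable into environment.cylc by hand is tedious. Below is a minimal sketch of a conversion helper, assuming bash and an environment.sh made up mostly of one-per-line "export NAME=value" (or plain "NAME=value") statements. The function name env_sh_to_cylc is hypothetical; it is not part of Cylc or Rose. It writes the variables under [runtime][root][environment], so every task inherits them, as in the environment.cylc include file above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cylc or Rose): convert simple shell
# assignments into a Cylc include file that sets [runtime][root][environment].
# Reads environment.sh-style text on stdin, writes Cylc config on stdout.
env_sh_to_cylc() {
    local line
    echo '[runtime]'
    echo '    [[root]]'
    echo '        [[[environment]]]'
    # "|| [[ -n $line ]]" also handles a final line with no trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        # Keep only plain "[export] NAME=value" lines (leading spaces allowed).
        # Comments, blank lines, functions, conditionals etc. are skipped and
        # must be ported by hand.
        if [[ $line =~ ^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then
            # BASH_REMATCH[2] is the variable name, BASH_REMATCH[3] its value.
            printf '            %s = %s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
        fi
    done
}
```

To use it, source the file containing the function, then run env_sh_to_cylc < environment.sh > environment.cylc. Values are copied verbatim, so review the result by hand, particularly quoting, $VAR references and command substitutions, which may not behave identically once moved into a Cylc [environment] section.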

You can also keep using a script to define environment variables, if desired, by sourcing it in env-script:

```
[runtime]
    [[mytask]]
        env-script = source environment.sh
        script = my-script
```
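
If you keep environment.sh, note that env-script runs in the job script's own shell before script does, so a plain assignment is visible to shell commands in script, but only exported variables are inherited by a separate program such as my-script. A minimal sketch of such a file, with hypothetical variable names and values:

```shell
# environment.sh (hypothetical example values)
# "export" makes each variable visible to programs launched from the task's
# script, e.g. my-script.
export ENV_VAR_1=value-1
export ENV_VAR_2=value-2
export ENV_VAR_3=value-3

# Not exported: seen by the job's own shell only, NOT by my-script.
LOCAL_ONLY=debug
```

The job also has to be able to find the file from its working directory. One option, assuming environment.sh is installed along with the rest of the workflow, is to refer to it via the run directory, e.g. env-script = source "$CYLC_WORKFLOW_RUN_DIR/environment.sh".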

Rose Applications

Configuring environment variables like this works fine for simple tasks, but as the complexity of the problem grows, you might end up with large numbers of environment variables, as well as other “resources” such as:

  • Namelists (Fortran).
  • Static files.
  • External resources (e.g. in Git / SVN repos).
  • CLI arguments.
  • etc.

At this point, you might consider bundling all this up into an "application". Rose is a tool that can do this, letting you bundle those dangling resources and variables into a single, isolated unit.

A simple Rose application might look like this:

```
# apps/mytask/rose-app.conf

[command]
default=my-script

[env]
ENV_VAR_1 = value-1
ENV_VAR_2 = value-2
ENV_VAR_3 = value-3
```

It can be run by itself like so:

```
$ rose app-run -C /path/to/app
```

Or run by a Cylc task like so:

```
[runtime]
    [[mytask]]
        # this will look for an app in "apps/mytask"
        script = rose task-run
```

Everything in a Rose application can be assigned metadata as desired, e.g. a title & description. Environment variables can be assigned a type as well as rules which govern what values the variable can be set to.

Rose applications are "configurations". Rose provides a GUI called rose edit, which makes use of this metadata and can be used to edit these configurations. Note that the rose edit GUI is currently only available in the legacy Rose 2019 release, but it is not required. Porting work for the GUI will begin soon.

So, using a Rose application allows you to pull environment variables (and other things) out of the flow.cylc file and into a dedicated file for that particular task.

Rose is entirely optional, but has good integration with Cylc. Cylc workflows which use Rose applications will often also provide a rose-suite.conf file to configure the workflow itself; this is also entirely optional.

For more information, this Rose tutorial takes you through the process of turning an over-complex Cylc task into a Rose application.

Installing Rose

Via conda/mamba:

```
conda install cylc-flow=8.3 cylc-rose
```

Via pip:

```
pip install cylc-flow=='8.3.*' cylc-rose
```

If you are interested in installing the legacy rose edit GUI, the installation instructions are here. It is possible to use it in combination with the newer version of Rose by using a wrapper script to manage the environments (I can provide further details if desired).


Thanx Oliver, all this is very helpful. I will def. need to use rose, as - indeed - there is a nightmarish list of other resources that are used by this suite. Plugging away at it a little bit at a time :0)
Unfortunately, due to the restrictions imposed on us by our HPC provider, we are unable to use any GUI or browser UI solutions :rage:. cylc-tui is magnificent for our situation. I have also managed (I think) to bypass this for the cylc-uiserver hub, but I can't claim victory until I manage to install it properly with the right permissions and test it.

:tada: indeed it is!