Hello there, I am a bit stuck on the question of how to handle long lists of environment variables.
I am porting some existing suites that currently run as a ginormous shell script, which has an equally ginormous list of environment variables to go with it. What would be the best way to handle this in cylc, so that the variables get passed on to the execution environment (usually slurm)? Listing all the vars in the flow script is unthinkable in this situation.
Thanx!
Gaby
Can I clarify:
- Are you converting a workflow run by a shell script to a Cylc workflow for the first time?
- Where are these environment variables stored at the moment?
- Do you want to store them in a separate file?
- Do you have Rose installed?
Hello @wxtim, in answer to your questions:
- Yes, I am doing this for the first time. I worked with cylc (7) a while ago when I was at NIWA, but it was a very simple suite.
- The env variables are a mix of variables set in the script and variables sourced from a file appropriately called “environment.sh”. The shell script then creates other scripts on the fly, which are submitted to slurm.
- Yes, because it’s such a huge list I’d like to store them in a separate file.
- I don’t have rose installed, but I’ve been considering it.
Thanks for that information.
> The shell script then creates on the fly other scripts that are submitted to slurm.
The fun of workflows implemented in shell scripts!
The most amusing ones submit themselves to the queue in an endless chain of submissions.
Cylc Task Environment Variables
In Cylc you can define environment variables like this:
# flow.cylc
[runtime]
    [[mytask]]
        script = my-script
        # these env-vars will be available to "my-script" when it runs
        [[[environment]]]
            ENV_VAR_1 = value-1
            ENV_VAR_2 = value-2
            ENV_VAR_3 = value-3
            # and so on
You can set environment variables for a group of tasks, or for all tasks, by using the appropriate “family”, which helps to cut down on duplication, e.g.:
# flow.cylc
[runtime]
    [[root]]
        # these env vars will be available to all tasks:
        [[[environment]]]
            ENV_VAR_1 = value-1
            ENV_VAR_2 = value-2
            ENV_VAR_3 = value-3
            # and so on
    [[mytask1]]
        script = my-script-1
    [[mytask2]]
        script = my-script-2
When workflow configuration files get large, they can be broken down into smaller parts using include files, e.g.:
# flow.cylc
%include environment.cylc
[runtime]
    [[mytask]]
        script = my-script

# environment.cylc
[runtime]
    [[root]]
        # these env vars will be available to all tasks:
        [[[environment]]]
            ENV_VAR_1 = value-1
            ENV_VAR_2 = value-2
            ENV_VAR_3 = value-3
            # and so on
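With a huge existing environment.sh, an include file like the one above need not be typed by hand. The following is only a rough sketch: `sh_env_to_cylc` is a hypothetical helper (not a Cylc command), and it assumes one plain `NAME=value` or `export NAME=value` assignment per line. Anything else (conditionals, loops, multi-line values) is skipped, so the output should be reviewed before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Cylc: print a Cylc include file built
# from the simple assignments found in a shell environment file.
# Assumption: one "NAME=value" or "export NAME=value" per line; every
# other line is silently ignored.
sh_env_to_cylc() {
    echo '[runtime]'
    echo '    [[root]]'
    echo '        [[[environment]]]'
    # Keep only assignment lines and rewrite them as "NAME = value".
    sed -nE 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$/            \2 = \3/p' "$1"
}
```

It could be used as `sh_env_to_cylc environment.sh > environment.cylc`. Values that rely on shell logic elsewhere in the script will still need manual attention.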
You can also continue to use a script to define the environment variables if desired:
[runtime]
    [[mytask]]
        env-script = source environment.sh
        script = my-script
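One thing to watch for when sourcing an existing file: variables assigned without `export` are visible in the sourcing shell but not in any child process it launches. If environment.sh mixes both styles, bash's `set -a` (allexport) can be switched on around the `source`. The sketch below is illustrative only; `load_env` is a made-up name and the temporary file stands in for the real environment.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source a file so that every variable it assigns
# is exported, whether or not the file itself uses "export".
load_env() {
    set -a        # allexport: mark all subsequently assigned variables for export
    source "$1"
    set +a        # switch allexport off again
}

# Stand-in for the real environment.sh (made-up contents).
demo_env=$(mktemp)
printf 'PLAIN_VAR=plain\nexport EXPORTED_VAR=exported\n' > "$demo_env"

load_env "$demo_env"
# A child process now sees both variables.
bash -c 'echo "$PLAIN_VAR $EXPORTED_VAR"'   # prints: plain exported
rm -f "$demo_env"
```

In a task, the same idea might be written as `env-script = set -a; source $CYLC_WORKFLOW_RUN_DIR/environment.sh; set +a`, assuming environment.sh is installed alongside flow.cylc. That line is untested, so treat it as a starting point.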
Rose Applications
Configuring environment variables like this works fine for simple tasks, but as the complexity of the problem grows, you might end up with large numbers of environment variables, as well as other “resources” such as:
- Namelists (Fortran).
- Static files.
- External resources (e.g. in Git / SVN repos).
- CLI arguments.
- etc.
At this point, you might consider bundling all this up into an “application”. Rose is a tool which can do all that, enabling you to bundle up those dangling resources and variables into a single isolated unit.
A simple Rose application might look like this:
# apps/mytask/rose-app.conf
[command]
default=my-script
[env]
ENV_VAR_1 = value-1
ENV_VAR_2 = value-2
ENV_VAR_3 = value-3
It can be run by itself like so:
$ rose app-run -C /path/to/app
Or run by a Cylc task like so:
[runtime]
    [[mytask]]
        # this will look for an app in "apps/mytask"
        script = rose task-run
Everything in a Rose application can be assigned metadata as desired, e.g. a title & description. Environment variables can be assigned a type as well as rules which govern what values the variable can be set to.
Rose applications are “configurations”. Rose provides a GUI called rose edit, which makes use of this metadata and can be used to edit these configurations. Note, the rose edit GUI is currently only available in the legacy Rose 2019 release, but is not required. Porting work for the GUI will begin soon.
So, using a Rose application allows you to pull environment variables (and other things) out of the flow.cylc file and into a dedicated file for that particular task.
Rose is entirely optional, but has good integration with Cylc. Cylc workflows which use Rose applications will often also provide a rose-suite.conf file to configure the workflow itself; this too is entirely optional.
For more information, this Rose tutorial takes you through the process of turning an over-complex Cylc task into a Rose application.
Installing Rose
Via conda/mamba:
conda install cylc-flow=8.3 cylc-rose
Via pip:
pip install cylc-flow=='8.3.*' cylc-rose
If you are interested in installing the legacy rose edit GUI, the installation instructions are here. It is possible to use this in combination with the newer version of Rose by using a wrapper script to manage the environments (I can provide further details if desired).
Thanx Oliver, all this is very helpful. I will def. need to use rose, as - indeed - there is a nightmarish list of other resources that are used by this suite. Plugging away at it a little bit at a time :0)
Unfortunately, due to the restrictions imposed on us by our HPC provider, we are unable to use any GUI or browser UI solutions, so cylc-tui is magnificent for our situation. I have also managed (I think) to bypass this for the cylc-uiserver hub, but I can’t claim victory until I manage to install it properly with the right permissions and test it.
indeed it is!