Suite and task environment variables

OK, for topical reasons, I’m wondering if I should start another thread . . .

So, I’ve been checking out the subsuite approach. I would like to make it work, because it does seem less crufty than using parametrized tasks. When using simple stubs for the tasks in the subsuite, everything works wonderfully. But the effort to create the tasks for the subsuite is a bear: the subsuite approach is much more elegant at the suite definition level, but it’s proving really challenging at the task script level.

The problem is that the (Python-based) task scripts for the parent suite need to use a great many functions which are common among several different scripts; so they’re contained in utility modules and imported from those modules by the various task scripts. The task scripts for the tasks in the subsuite will need these functions too. So I either have to have two copies of them – one in the utility modules for the parent suite, and a second copy for the subsuite’s tasks – OR, I need the subsuite task scripts to import from the parent suite’s utility modules. The former is a bad idea because these functions get tweaked a lot, and having functions duplicated just makes it likely that some change won’t get propagated between them. OK, I can do the latter: I can have the subsuite’s tasks import functions from the parent suite’s utility modules. But the problem is that the utility modules, in turn, depend heavily upon the environment. When the subsuite tasks import from these modules, many of the environment variables that the utility modules read to set their constants won’t be defined. There are ways around that, too; but this starts to become epicycles after a while. So it’s not clear anymore what the cleanest approach would be.

The only way I know of to pass portions of the environment from a parent task to a subsuite it kicks off is through the “--set” switch in the “cylc run” command – either by specifying the environment variables individually, or by constructing a JSON string – and then using the Jinja2 variables so created to set new environment variables in the subsuite’s definition file. Is there any other way for a subsuite to inherit environment from the parent?

Done, as I think others will be interested in how suite environment affects (or not) task jobs.

Here’s a simple example to test environment inheritance.
~/suites/main/suite.rc

[scheduling]
  [[dependencies]]
    graph = "echo"
[runtime]
  [[echo]]
    script = """
      echo "NAME is ${NAME:-alice}"

      rm -rf ~/cylc-run/sub
      cylc register sub ~/suites/main/sub
      cylc run --no-detach sub
    """

and ~/suites/main/sub/suite.rc

[scheduling]
  [[dependencies]]
    graph = "echo"
[runtime]
  [[echo]]
    script = "echo NAME is ${NAME:-alice}"

If I export NAME=bob in my terminal and then run the main suite with cylc-7.8.3 (your version), both the main and sub-suite tasks print “NAME is alice”, which shows that even background jobs do not inherit the suite environment.
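Since your task scripts are Python-based, you could run the same check from Python instead of bash. This is only a minimal sketch: name_or_default is a made-up helper name, and it mimics the ${NAME:-alice} expansion above by falling back to the default when NAME is unset or empty.

```python
import os


def name_or_default(env=None, default="alice"):
    """Mimic ${NAME:-alice}: use the default when NAME is unset or empty."""
    # Accept an explicit mapping so the function can be tested without
    # touching the real process environment.
    env = os.environ if env is None else env
    return env.get("NAME") or default


if __name__ == "__main__":
    # Same message format as the echo in the suite.rc examples above.
    print("NAME is %s" % name_or_default())
```

Calling this as the task script in both suites should tell you whether a variable exported in your terminal actually reaches the job.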

(Incidentally, changes to the job submission mechanism mean Cylc 8 background tasks do see the job submission environment by default, but we recommend using a new config item to prevent that).

So, I’m a bit surprised that your main- and sub-suite tasks don’t have exactly the same problem, i.e., how do your main-suite tasks even see the environment that the modules need? Perhaps you are submitting tasks to PBS (e.g.) with a “copy environment” directive … but then presumably you could run the sub-suite launcher task, and the sub-suite tasks, in the same way, and the environment would get copied all the way through?

Even if the suite environment were inherited by local jobs in this way, I wouldn’t recommend relying on that. You’d have a suite that will break if you try to submit jobs to a different platform or batch system. There are other ways to do it, e.g. put your module environment config in a bash file, install that with the suite, and have all your tasks source it at run time. Or use Jinja2 (try the --set-file option, instead of --set).
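To make the --set-file route a bit more concrete, here is a rough sketch of what the sub-suite launcher task could do in Python. The function names and the file name are invented for illustration, and it assumes the set-file format is one NAME=VALUE pair per line (check "cylc run --help" on your version). The sub-suite's suite.rc would still need a #!Jinja2 first line and entries along the lines of NAME = {{ NAME }} in a task environment section to turn the Jinja2 variables back into environment variables.

```python
import os


def write_set_file(path, names, env=None):
    """Write NAME=VALUE lines for the chosen variables; return the names written."""
    env = os.environ if env is None else env
    written = []
    with open(path, "w") as handle:
        for name in names:
            if name not in env:
                continue  # leave unset variables out rather than guessing a value
            value = env[name]
            if "\n" in value:
                # Assumed format is one pair per line, so multi-line values
                # cannot be represented.
                raise ValueError("%s contains a newline" % name)
            handle.write("%s=%s\n" % (name, value))
            written.append(name)
    return written


def sub_suite_command(suite_name, set_file):
    """Build (but do not run) the command a launcher task might use."""
    return ["cylc", "run", "--no-detach", "--set-file=" + set_file, suite_name]
```

For example, write_set_file("sub-vars.txt", ["NAME"]) followed by subprocess.check_call(sub_suite_command("sub", "sub-vars.txt")) would replace the bare "cylc run --no-detach sub" line in the launcher task above. Listing the variables explicitly keeps the sub-suite's inputs visible, rather than depending on whatever the job environment happens to contain.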

On where to put your Python modules - we recommend (e.g. see the Suite Design Guide) that suites should be as self-contained as possible, to isolate them from changes to the external environment. If you have more than one suite using the same scripts or modules that are still in development, it is very easy to make changes for one suite that inadvertently break another.

Avoiding this generally means installing files (scripts, Python modules, etc.) into the suite run directory. You can still edit those files in-place if you need to, or better, deliberately re-install new versions of them, but otherwise the suite is safe from unintended changes. It probably is reasonable for a sub-suite to share the same files as its main-suite, in which case you could install your module library into the main suite run-directory, and pass the location to the sub-suite via Jinja2 inputs.
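On the sub-suite side, the task scripts then only need to know where the library was installed. A minimal sketch, assuming the sub-suite's suite.rc exports the location passed in via Jinja2 as an environment variable; SHARED_LIB_DIR and add_shared_lib are hypothetical names, not anything Cylc defines:

```python
import os
import sys


def add_shared_lib(env=None, var="SHARED_LIB_DIR"):
    """Put the shared module directory at the front of sys.path and return it."""
    env = os.environ if env is None else env
    lib_dir = env.get(var)
    if not lib_dir:
        # Fail loudly: a missing location should not silently fall back to
        # some other copy of the modules.
        raise RuntimeError("%s is not set in the task environment" % var)
    if not os.path.isdir(lib_dir):
        raise RuntimeError("%s does not exist: %s" % (var, lib_dir))
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir
```

Each sub-suite task script would call add_shared_lib() before importing from the utility modules, so both suites use the single installed copy.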

(Cylc 8 has built-in support for workflow installation from source- to run-directory; at Cylc 7, rose suite-run does it, for Rose users).

Hilary