Using environment variables that contain shell metacharacters

I want to pass some YAML (which in turn contains some HTML) into a task through an environment variable. However, when I try, the job fails to run because the embedded " characters break the job shell script.

The relevant bit of my flow.cylc is reproduced below, and breaks on the " in the HTML bit.

[runtime]
    [[run_cset_recipe]]
        [[[environment]]]
        CSET_RECIPE = """
title: Plot Instant Air Temperature
description: |
  <p>Extracts and plots the instantaneous 1.5m air temperature from a file and writes
  it to a new one.</p>
  <p>These descriptions are HTML, so you can do things like <b>bold</b>,
  <i>italics</i>, <span style="color: red; font-family: cursive;">and more</span>.
  </p>
... More YAML ...
"""

Although the value can be triple-quoted in the flow.cylc file, once it is written into the cylc__job__inst__user_env() function in the job script, the embedded double quotes break the assignment.

How can I safely pass in an environment variable that might contain shell quoting characters? I’m using Cylc 8.

If you look at the job script, the problem is that Cylc just assigns the value of the environment config item, verbatim, to an environment variable, wrapped in double quotes. So I think you just need to escape the double quotes inside the value. The fact that it is a multiline value doesn’t make any difference.

[scheduling]
   [[graph]]
      R1 = foo
[runtime]
   [[foo]]
      script = "echo $MYVAR"
      [[[environment]]]
          MYVAR = "The \"quick\" brown fox"

Run this, then:

$ cylc log -f j fro//1/foo
...
cylc__job__inst__user_env() {
    # TASK RUNTIME ENVIRONMENT:
    export MYVAR
    MYVAR="The \"quick\" brown fox"
...
}

$ cylc log fro//1/foo
Workflow : fro/run8
Job : 1/foo/01 (try 1)
User@Host: oliverh@NIWA-1022450

2023-06-21T10:17:18+12:00 INFO - started
The "quick" brown fox   # <-------------------------------- yay!
2023-06-21T10:17:18+12:00 INFO - succeeded
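
The quoting behaviour described in this answer can be checked without running a workflow. The sketch below uses Python's shlex module, which only approximates POSIX shell word splitting and is not the real job script. The MYVAR assignments are made-up examples modelled on the HTML in the question.

```python
import shlex

# Unescaped inner quotes: they close the string early, so the value
# falls apart into several shell words.
broken = 'MYVAR="<span style="color: red;">text</span>"'

# Backslash-escaped inner quotes: the whole value stays one word.
fixed = 'MYVAR="<span style=\\"color: red;\\">text</span>"'

broken_words = shlex.split(broken)
fixed_words = shlex.split(fixed)

print(len(broken_words))  # more than one word: the assignment is split
print(fixed_words)        # a single word with the inner quotes intact
```

In a real job script the stray words would be run as commands or cause a syntax error, which matches the failure reported in the question.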

You could potentially use a Jinja2 global/filter to handle this e.g:

[[[environment]]]
   MYVAR = """{{ escape_quotes('''
title: Plot Instant Air Temperature
description: |
  <p>Extracts and plots the instantaneous 1.5m air temperature from a file and writes
  it to a new one.</p>
  <p>These descriptions are HTML, so you can do things like <b>bold</b>,
  <i>italics</i>, <span style="color: red; font-family: cursive;">and more</span>.
  </p>
... More YAML ...
   ''') }}"""

# jinja2Globals/escape_quotes.py

def escape_quotes(string): ...
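
The body of escape_quotes is elided in the post above. A minimal sketch of what it might look like follows, assuming the value must reach the task literally, so every character that is special inside a double-quoted shell string (backslash, double quote, dollar sign, backtick) gets a backslash prefix. Escaping the dollar sign and backtick is an assumption that goes beyond the thread, which only discusses double quotes. Drop them from the tuple if the value should still expand shell variables.

```python
# Hypothetical body for the escape_quotes stub; not taken from Cylc itself.
def escape_quotes(string):
    """Backslash-escape characters that are special inside a
    double-quoted shell string."""
    # The backslash must be handled first, otherwise the backslashes
    # added for the other characters would themselves be escaped again.
    for char in ('\\', '"', '$', '`'):
        string = string.replace(char, '\\' + char)
    return string


print(escape_quotes('<span style="color: red;">and more</span>'))
```

For the HTML line in the question, this turns style="color: red;" into style=\"color: red;\", the same form as the manually escaped MYVAR example earlier in the thread.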

That’s an interesting way to do it. For now I’ve just manually escaped the few cases I had, as Hilary suggested. However, I would really like a less manual approach, because in my use case end users will write their own YAML in separate Cylc files that get included into the main workflow.

What are you trying to do with this YAML? It looks like documentation for the task.

If so we have a better solution on the way…

The YAML is a CSET recipe file, which is essentially a list of python functions and parameters for them. It also has a little bit of documentation that can appear on the output, though to be honest we are thinking of moving that out of there.

Most recipe files shouldn’t contain quotes, which are the problem here, but I would really like this to be robust, as quotes are valid syntax and these are user-facing config files.

The idea is that users define their particular use case in a cylc file that gets included into the main flow.cylc, as illustrated below.

For task metadata (title, description, etc.), you’ll probably be better off using the Cylc task [meta] section (which allows arbitrary keys if needed). This will become visible in the GUI in the future; you can assume Markdown and probably reStructuredText formats will be supported. Sadly, I can’t give a timeframe for this work at this point.

For task configuration, environment variables are always going to be a limitation for this approach, e.g. no syntax highlighting or validation, potential interaction with special shell characters, etc. Normally this sort of configuration is rolled up into a standalone application such as a Rose application. These applications can then be developed, validated and tested in isolation from the larger workflow, and can use whatever configuration formats they like. Rose doesn’t support YAML, but your choice is not restricted to Rose, and YAML files can be included with Rose applications, just not edited with rose edit. For example, a good option for these YAML configs might be to define a JSON schema for the config; you could then use tools such as a VS Code plugin to assist with developing these configurations. (Note: you could get Rose to perform this validation, which could be useful if you’re already invested in Rose for other purposes.)

I’m guessing the convenience of being able to define the application and workflow configuration in the same file has guided this approach, which makes sense, but the screenshot above suggests you are also using Rose applications, so the logic is already split between multiple files?