Using environment variables that contain shell metacharacters

jfrost-mo · June 20, 2023, 3:56pm

I want to pass some YAML (which itself contains some HTML) into a task through an environment variable. However, when I try, the job fails to run because the double quote (") characters inside the value break the job shell script.

The relevant bit of my flow.cylc is reproduced below; it breaks on the " characters in the HTML.

[runtime]
    [[run_cset_recipe]]
        [[[environment]]]
            CSET_RECIPE = """
title: Plot Instant Air Temperature
description: |
  <p>Extracts and plots the instantaneous 1.5m air temperature from a file and writes
  it to a new one.</p>
  <p>These descriptions are HTML, so you can do things like <b>bold</b>,
  <i>italics</i>, <span style="color: red; font-family: cursive;">and more</span>.
  </p>
... More YAML ...
"""

While the value can be triple-quoted in the flow.cylc file, it breaks once it is written into the cylc__job__inst__user_env() function in the job script.

How can I safely pass in an environment variable that might contain shell quoting characters? I’m using Cylc 8.

hilary.j.oliver · June 20, 2023, 10:21pm

If you look at the job script, you can see the problem: Cylc assigns the value of the environment config item, verbatim, to an environment variable, wrapped in double quotes. So I think you just need to escape the double quotes inside the value. The fact that it is a multi-line value doesn’t make any difference.

[scheduling]
   [[graph]]
      R1 = foo
[runtime]
   [[foo]]
      script = "echo $MYVAR"
      [[[environment]]]
          MYVAR = "The \"quick\" brown fox"

Run this, then:

$ cylc log -f j fro//1/foo
...
cylc__job__inst__user_env() {
    # TASK RUNTIME ENVIRONMENT:
    export MYVAR
    MYVAR="The \"quick\" brown fox"
...
}

$ cylc log fro//1/foo
Workflow : fro/run8
Job : 1/foo/01 (try 1)
User@Host: oliverh@NIWA-1022450

2023-06-21T10:17:18+12:00 INFO - started
The "quick" brown fox   # <-------------------------------- yay!
2023-06-21T10:17:18+12:00 INFO - succeeded
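To see why the unescaped HTML breaks, here is a small Python sketch. It assumes, as described above, that Cylc writes the value verbatim between double quotes. The helpers job_env_line and escape_double_quotes are hypothetical (not part of Cylc), and Python's shlex module is only a stand-in for the shell's word splitting: it approximates POSIX quoting rules, it is not bash.

```python
import shlex


def job_env_line(name: str, value: str) -> str:
    """Mimic the assignment written into cylc__job__inst__user_env():
    the configured value, verbatim, wrapped in double quotes."""
    return f'{name}="{value}"'


def escape_double_quotes(value: str) -> str:
    """Escape backslashes first, then double quotes, so the value
    survives being wrapped in a double-quoted shell string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


value = '<span style="color: red; font-family: cursive;">and more</span>'

# Unescaped: the inner quotes close the outer string early, so the line
# splits into several words (in bash, ';' and '>' would also act as
# command separators and redirections).
naive = shlex.split(job_env_line("CSET_RECIPE", value))
print(len(naive))  # prints 4

# Escaped: the assignment stays a single word and the quotes survive.
safe = shlex.split(job_env_line("CSET_RECIPE", escape_double_quotes(value)))
print(safe)
```

With the quotes escaped, the assignment stays a single word. Note that $ and backticks are still live inside double quotes, so the shell will expand them just as it does for any other [environment] value.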

oliver.sanders · June 21, 2023, 1:58pm

You could potentially use a Jinja2 global or filter to handle this, e.g.:

[[[environment]]]
   MYVAR = """{{ escape_quotes('''
title: Plot Instant Air Temperature
description: |
  <p>Extracts and plots the instantaneous 1.5m air temperature from a file and writes
  it to a new one.</p>
  <p>These descriptions are HTML, so you can do things like <b>bold</b>,
  <i>italics</i>, <span style="color: red; font-family: cursive;">and more</span>.
  </p>
... More YAML ...
   ''') }}"""

# Jinja2Globals/escape_quotes.py

def escape_quotes(string): ...
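The body of escape_quotes is left out above. One possible body, offered only as a sketch: it assumes Cylc's convention that a custom Jinja2 global is a function defined in a module of the same name under Jinja2Globals/, and the literal keyword is a hypothetical addition for values whose $ and backticks must not be expanded by the shell.

```python
# Jinja2Globals/escape_quotes.py  (hypothetical body; the post elides it)


def escape_quotes(string, literal=False):
    """Escape a value for the double-quoted assignment in the job script.

    Backslashes are escaped first, so the backslashes added in front of
    quotes are not doubled afterwards. By default '$' and '`' are left
    alone, so shell expansion behaves as for any other [environment]
    value; pass literal=True (hypothetical option) to escape them too.
    """
    escaped = string.replace("\\", "\\\\").replace('"', '\\"')
    if literal:
        escaped = escaped.replace("$", "\\$").replace("`", "\\`")
    return escaped


if __name__ == "__main__":
    print(escape_quotes('<span style="color: red;">and more</span>'))
```

Escaping backslashes before quotes matters: doing it the other way round would double the backslash just inserted in front of each quote.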

jfrost-mo · June 21, 2023, 2:12pm

That’s an interesting way to do it. For now I’ve just manually escaped the few cases I had, as Hilary suggested, but I would really like a less manual approach: in my use case, end users will write their own YAML inside separate Cylc files that get included into the main workflow.

oliver.sanders · June 21, 2023, 4:50pm

What are you trying to do with this YAML? It looks like documentation for the task.

If so we have a better solution on the way…

jfrost-mo · June 21, 2023, 4:56pm

The YAML is a CSET recipe file, which is essentially a list of Python functions and their parameters. It also contains a little documentation that can appear on the output, though we are thinking of moving that out of there.

Most recipe files shouldn’t contain quotes, which are the problem here, but I would really like this to be robust, as quotes are valid syntax and these are user-facing configs.

The idea is that users define their particular use case in a cylc file that gets included into the main flow.cylc, as illustrated below.

oliver.sanders · June 22, 2023, 9:22am

For task metadata (title, description, etc.), you’ll probably be better off using the Cylc task [meta] section, which allows arbitrary keys if needed. This will become visible in the GUI in the future, and you can assume Markdown (and probably reStructuredText) formats will be supported. Sadly, I can’t give a timeframe for this work at this point.

For task configuration, environment variables are always going to be a limitation for this approach: no syntax highlighting or validation, potential interaction with special shell characters, etc. Normally this sort of configuration is rolled up into a standalone application, such as a Rose application. These applications can then be developed, validated and tested in isolation from the larger workflow, and can use whatever configuration formats they desire.

Rose doesn’t support YAML, but you aren’t restricted to Rose, and YAML files can still be included in Rose applications, just not edited with rose edit. For example, a good option for these YAML configs might be to define a JSON schema for them; you could then use tools such as a VSCode plugin to assist with developing these configurations. (Note that you could get Rose to perform this validation, which could be useful if you’re already invested in Rose for other purposes.)
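Following the standalone-application idea, the recipe can live in its own file while the environment variable carries only its path, which sidesteps shell quoting of the recipe text entirely. A minimal sketch, assuming a hypothetical CSET_RECIPE_FILE variable (the real CSET interface may differ) and a temporary directory standing in for an application directory:

```python
import os
import tempfile
from pathlib import Path

RECIPE = '''\
title: Plot Instant Air Temperature
description: |
  <p><span style="color: red; font-family: cursive;">and more</span></p>
'''


def write_recipe(directory: Path, name: str, text: str) -> Path:
    """Store the recipe as a standalone file instead of inlining it."""
    path = directory / f"{name}.yaml"
    path.write_text(text, encoding="utf-8")
    return path


def load_recipe_from_env(var: str = "CSET_RECIPE_FILE") -> str:
    """Task side: the variable holds only a path (hypothetical name),
    so the recipe's quotes never pass through the shell."""
    return Path(os.environ[var]).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = write_recipe(Path(tmp), "air_temperature", RECIPE)
    os.environ["CSET_RECIPE_FILE"] = str(path)
    loaded = load_recipe_from_env()

print(loaded == RECIPE)
```

The trade-off is that the recipe is no longer defined in the same file as the workflow, but it can then be validated and tested on its own.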

I’m guessing the convenience of being able to define the application and workflow configuration in the same file has guided this approach, which makes sense, but the screenshot above suggests you are also using Rose applications, so the logic is already split between multiple files?
