EmPy -- Import Shell Environment Variables

Hi there, I’m hoping to write some Cylc templates with EmPy that import shell environment variables from a configuration file and use them throughout. I’ve noticed a minor issue with my approach so far, and I’d like to check with the community whether:

  1. my approach is a correct, or at least reasonable, way to achieve this goal, and
  2. if so, whether I should submit a bug report about the Cylc EmPy integration.

My Cylc environment is installed with Micromamba from the following YAML definition file:

name: cylc-8.3
channels:
  - conda-forge
dependencies:
  - python =3.9.19
  - cylc-flow =8.3
  - cylc-uiserver
  - cylc-rose
  - metomi-rose

As a basic example, suppose I have a shell configuration file, configure_cylc.sh, written as

# Root directory of Cylc installation
export CYLC_ROOT="${HOME}/cylc"
export PATH="${CYLC_ROOT}:${PATH}"

# Location of Micromamba cylc environment
export CYLC_HOME_ROOT_ALT="${CYLC_ROOT}/Micromamba/envs"

# Cylc environment name
export CYLC_ENV_NAME="cylc-8.3"

# Set Cylc global.cylc configuration path to template
export CYLC_CONF_PATH="${CYLC_ROOT}"

which I source to set up my Cylc environment and, e.g., to provide additional environment variables to my global.cylc file and workflow templates.
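Since EmPy expressions are plain Python, anything this script exports is visible to a template through os.environ, provided the script was sourced in the same shell that runs cylc. Below is a minimal sketch, assuming a hypothetical helper named require_env (it is not part of Cylc or EmPy). It fails with a clearer message than a bare KeyError when the script was not sourced:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper (not a Cylc or EmPy API): raises a descriptive
    error if the variable is missing, e.g. because configure_cylc.sh
    was not sourced before running cylc.
    """
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(
            f"{name} is not set; was configure_cylc.sh sourced first?"
        ) from None


if __name__ == "__main__":
    # Simulate what `source configure_cylc.sh` would export.
    os.environ["CYLC_ROOT"] = os.path.expanduser("~/cylc")
    print(require_env("CYLC_ROOT") + "/cylc-src")
```

In a template, the same lookup would be written inline, e.g. @(os.environ['CYLC_ROOT']) after @{ import os }.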

My test-case workflow is defined as follows

#!EmPy
@{ CYC_STRT = '2022-12-23T00' }
@{ CYC_STOP = '2022-12-24T00' }
@{ CYC_INC = 'PT24H' }
@{ ENS_MAX = 2 }

[scheduler]
    UTC mode = True
    allow implicit tasks = True
[scheduling]
    initial cycle point = @(CYC_STRT)
    final cycle point = @(CYC_STOP)
    [[graph]]
        @(CYC_INC) = """
        @[ for mem in range(0,ENS_MAX) ]
            @{ idx = str(mem).zfill(2) }
            test_@(idx)
        @[ end for ]
        """

[runtime]
    @[ for mem in range(0,ENS_MAX) ]
        @{ idx = str(mem).zfill(2) }
        [[test_@(idx)]]
            platform = localhost
            script = echo "Test @(idx) at cycle point ${CYLC_TASK_CYCLE_POINT}"
    @[ end for ]
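For readers less familiar with EmPy, the two loops in this template are ordinary Python. The sketch below reproduces in plain Python the task names and runtime sections they should generate. It assumes ENS_MAX = 2 as in the template, and the function names are hypothetical. The real EmPy output will also contain leftover indentation and blank lines where the markup was; those should be harmless in a Cylc graph string or runtime section.

```python
from typing import List

# Mirrors the @{ ENS_MAX = 2 } assignment at the top of the template.
ENS_MAX = 2


def member_ids(ens_max: int) -> List[str]:
    """Zero-padded member indices, as produced by str(mem).zfill(2)."""
    return [str(mem).zfill(2) for mem in range(0, ens_max)]


def render_graph(ens_max: int) -> str:
    """Task names emitted by the loop inside [[graph]], one per line."""
    return "\n".join(f"test_{idx}" for idx in member_ids(ens_max))


def render_runtime(ens_max: int) -> str:
    """Task sections emitted by the loop inside [runtime]."""
    blocks = []
    for idx in member_ids(ens_max):
        blocks.append(
            f"    [[test_{idx}]]\n"
            f"        platform = localhost\n"
            # Doubled braces yield a literal ${CYLC_TASK_CYCLE_POINT},
            # which the job script (not the template) expands at run time.
            f'        script = echo "Test {idx} at cycle point '
            f'${{CYLC_TASK_CYCLE_POINT}}"'
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_graph(ENS_MAX))
    print(render_runtime(ENS_MAX))
```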

I have defined my global.cylc file in two ways: one raises an error and the other works fine. The workflow completes without issue when global.cylc makes no reference to configure_cylc.sh:

[platforms]
    [[localhost]]
        hosts = localhost
        install target = localhost
        
[ install ]
    source dirs = ~/cylc/cylc-src

However, when I reference a variable from configure_cylc.sh through the Python os module in an EmPy template,

#!EmPy

@{ import os }

[platforms]
    [[localhost]]
        hosts = localhost
        install target = localhost

[ install ]
    source dirs = @(os.environ['CYLC_ROOT'])/cylc-src

I get the following traceback when I run cylc play test-case:

[cgrudzien@login01 cylc]$ cylc play test-case

 ▪ ■  Cylc Workflow Engine 8.3.2
 ██   Copyright (C) 2008-2024 NIWA
▝▘    & British Crown (Met Office) & Contributors

INFO - Extracting job.sh to /home/cgrudzien/cylc-run/test-case/run1/.service/etc/job.sh
--- Logging error ---
Traceback (most recent call last):
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/loggingutil.py", line 167, in emit
    self.do_rollover()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/loggingutil.py", line 234, in do_rollover
    os.dup2(self.stream.fileno(), sys.stdout.fileno())
AttributeError: 'ProxyFile' object has no attribute 'fileno'
Call stack:
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/bin/cylc", line 10, in <module>
    sys.exit(main())
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/scripts/cylc.py", line 703, in main
    execute_cmd(command, *cmd_args)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/scripts/cylc.py", line 334, in execute_cmd
    entry_point.load()(*args)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/terminal.py", line 282, in wrapper
    wrapped_function(*wrapped_args, **wrapped_kwargs)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/scheduler_cli.py", line 661, in play
    return asyncio.run(scheduler_cli(options, id_))
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/scheduler_cli.py", line 435, in scheduler_cli
    ret = asyncio.run(
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/scheduler_cli.py", line 644, in _run
    await scheduler.run()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 752, in run
    await self.start()
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 741, in start
    await self.configure(params)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 422, in configure
    LOG.info(f"Workflow: {self.workflow}")
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/logging/__init__.py", line 1446, in info
    self._log(INFO, msg, args, **kwargs)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/logging/__init__.py", line 1589, in _log
    self.handle(record)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/logging/__init__.py", line 1599, in handle
    self.callHandlers(record)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/logging/__init__.py", line 1661, in callHandlers
    hdlr.handle(record)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/logging/__init__.py", line 952, in handle
    self.emit(record)
  File "/home/cgrudzien/cylc/Micromamba/envs/cylc-8.3/lib/python3.9/site-packages/cylc/flow/loggingutil.py", line 174, in emit
    self.handleError(record)
Message: 'Workflow: test-case/run1'
Arguments: ()
test-case/run1: login01.expanse.sdsc.edu PID=2495246

This error doesn’t stop the workflow from running, and it appears to be caused by EmPy replacing sys.stdout with a proxy object (the ProxyFile in the traceback), which has no fileno attribute. I’ve tried to suppress it by, e.g., exporting the EMPY_NO_PROXY option, but that hasn’t solved the issue. So, again, my questions are: is this the correct way to access a shell environment variable in an EmPy template and, if so, should this become a bug report for the Cylc EmPy integration?
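The traceback suggests the mechanism: do_rollover calls sys.stdout.fileno() so it can os.dup2 the log stream onto stdout, and the ProxyFile installed as sys.stdout has no fileno attribute. Below is a minimal sketch of that failure mode. It assumes a simplified stand-in class, not EmPy’s actual implementation, and a hypothetical stdout_fd_or_none helper that reports a missing descriptor instead of raising:

```python
import io
import sys


class ProxyFile:
    """Simplified, hypothetical stand-in for EmPy 3's stdout proxy.

    It forwards write() and flush() to the real stream but, like the
    object in the traceback, has no fileno() method.
    """

    def __init__(self, target):
        self._target = target

    def write(self, data):
        return self._target.write(data)

    def flush(self):
        self._target.flush()


def stdout_fd_or_none(stdout):
    """Return the OS-level file descriptor behind `stdout`, or None.

    Code such as os.dup2(stream.fileno(), sys.stdout.fileno()) needs a
    real descriptor; a pure-Python proxy cannot provide one.
    """
    try:
        return stdout.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return None


if __name__ == "__main__":
    real = sys.stdout
    sys.stdout = ProxyFile(real)  # roughly what EmPy 3 does while rendering
    try:
        print("printing still works through the proxy")
        print("fileno available:", stdout_fd_or_none(sys.stdout) is not None)
    finally:
        sys.stdout = real  # always restore the original stream
```

Writing through the proxy works, which is why the scheduler keeps running. Only code that needs the underlying descriptor fails.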

Thanks for your consideration.

Cheers,
Colin

Hi @cgrudz

Thanks for that bug report!

After some investigation, I can confirm you’re right about the root cause: EmPy replaces sys.stdout.

In my tests, simply adding #!EmPy to my flow.cylc causes the traceback at start-up, but only in detaching (i.e. daemon) mode. After that, the scheduler runs fine, but any stdout from the scheduler (normally only used by developers for debugging) is not redirected to the log; it still appears in the terminal.

So the problem has nothing to do with accessing environment variables via EmPy; the only relevant difference is that your “good” global config doesn’t use EmPy at all.

It’s surprising that no one has noticed this before, but I suspect that’s because:

  • our automated tests typically run in non-detaching mode
  • most Cylc users use Jinja2 templating, not EmPy, for historical reasons (although every time I look at EmPy I think it might well be a better option: it’s basically just Python)

Anyhow, Cylc has EmPy pinned to version 3.4 for some reason. I’ve just tested EmPy 4.1, and it does not exhibit the problem :tada:

You can force an EmPy upgrade with pip, but unfortunately version 4 requires a small code change to Cylc itself due to a change in the EmPy API. Here’s the Cylc code change:

Support EmPy version 4 by hjoliver · Pull Request #6248 · cylc/cylc-flow · GitHub
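Since the fix depends on which EmPy series is installed, it may help to check the version in the Cylc environment before upgrading. Below is a minimal sketch using the standard library’s importlib.metadata (Python 3.8+). It assumes the PyPI distribution name is empy, and the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def empy_major_version(ver: str) -> int:
    """Major component of a version string such as '4.1'."""
    return int(ver.split(".")[0])


def installed_empy_version():
    """Installed EmPy version string, or None if EmPy is not installed.

    Assumes the distribution is published on PyPI under the name 'empy'.
    """
    try:
        return version("empy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ver = installed_empy_version()
    if ver is None:
        print("EmPy is not installed in this environment")
    elif empy_major_version(ver) >= 4:
        print(f"EmPy {ver}: needs a Cylc release that supports the EmPy 4 API")
    else:
        print(f"EmPy {ver}: 3.x series, which replaces sys.stdout with a proxy")
```

Run it with the Python interpreter from the cylc-8.3 environment so that it inspects the same packages the scheduler uses.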

I guess your options now are:

  • live with the traceback until the next Cylc release (soon, no doubt)
  • or attempt to upgrade your current installation of Cylc for EmPy 4

(If you want to try the latter, just ask if you need any help with the process.)

@hilary.j.oliver thanks for your rapid and detailed reply!

In my tests, simply adding #!EmPy to my flow.cylc causes the traceback at start-up, but only in detaching (i.e. daemon) mode.

This is an interesting point, and something I can definitely live with until EmPy 4.x is fully supported. In fact, I have noticed that the error doesn’t arise in my example workflow when I use EmPy only in the flow.cylc file and not in the global.cylc file. Specifically, with the global.cylc file defined as

[platforms]
    [[localhost]]
        hosts = localhost
        install target = localhost
        
[ install ]
    source dirs = ~/cylc/cylc-src

and with my flow.cylc file defined as

#!EmPy
@{ import os }
@{ CYC_STRT = '2022-12-23T00' }
@{ CYC_STOP = '2022-12-24T00' }
@{ CYC_INC = 'PT24H' }
@{ ENS_MAX = 2 }

@{ test_name = 'variable_scope' }
@{ test_case = 'empy'}
@{ test_conf = test_case + '/' + test_name }

[scheduler]
    UTC mode = True
    allow implicit tasks = True
[scheduling]
    initial cycle point = @(CYC_STRT)
    final cycle point = @(CYC_STOP)
    [[graph]]
        @(CYC_INC) = """
        @[ for mem in range(0,ENS_MAX) ]
            @{ idx = str(mem).zfill(2) }
            test_@(idx)
        @[ end for ]
        """

[runtime]
    @[ for mem in range(0,ENS_MAX) ]
        @{ idx = str(mem).zfill(2) }
        [[test_@(idx)]]
            platform = localhost
            script = cd @(os.environ['CYLC_ROOT']); echo "Test @(test_conf) index @(idx) at cycle point ${CYLC_TASK_CYCLE_POINT}"
    @[ end for ]

this runs without error:

[cgrudzien@login01 cylc]$ cylc play test-case-empy

 ▪ ■  Cylc Workflow Engine 8.3.2
 ██   Copyright (C) 2008-2024 NIWA
▝▘    & British Crown (Met Office) & Contributors

INFO - Extracting job.sh to /home/cgrudzien/cylc-run/test-case-empy/run2/.service/etc/job.sh
test-case-empy/run2: login01.expanse.sdsc.edu PID=3270613

If the bug only affects detaching (i.e. daemon) mode, and I can currently use, e.g., Jinja2 for global.cylc and EmPy for flow.cylc without error messages, this will suit my needs for now. However, I’d definitely appreciate any consideration of first-tier EmPy integration: many of my team members and colleagues are Python users who would find it familiar, and it would let us use a single templating package.

Cheers,
Colin