Module load/swap in pre-script breaks python module import in flow.cylc

hi there,

in the workflows at niwa we need to manually assign the Slurm account code inside the workflow definition, for example…

#!Jinja2
{% from 'hpc_project' import get_hpc_project %}  # <---- import the function
.
.
.
[runtime]
  [[root]]
    script = sleep 10
    [[[directives]]]
       --account = {{ get_hpc_project() }}  # <---- find & print project code!

this has always worked fine for me and others but i have just found out that, somehow, module load or module swap commands are breaking this.

for example, in one task with this particular pre-script…

pre-script = """
        module swap NeSI NIWA
        module load PrgEnv-cray craype-x86-skylake cray-netcdf FCM
        module swap PrgEnv-cray PrgEnv-intel/6.0.10
        module swap intel intel/19.1.3.304
        module swap cray-netcdf cray-netcdf/4.9.0.3
        module swap FCM FCM/2019.09.0
        module load eccodes/2.8.0-CrayIntel-23.02-19
        module load shumlib/2018.06.1-CrayIntel-23.02-19-no-openmp
        """

i get this error in the job.err immediately after it starts running…

[FAIL] hpc_project
[FAIL] File /home/williamsjh/cylc-run/u-df773/run37/flow.cylc
[FAIL]   #!jinja2
[FAIL]
[FAIL]   {% from 'hpc_project' import get_hpc_project %}        <-- TemplateNotFound
2024-04-30T08:18:07Z CRITICAL - failed/ERR

but with no pre-script it starts to run fine. in addition, the cylc view -j [workflow] does indeed show the correct account code!

@hilary.j.oliver fyi this is the issue we were having just before easter.

it seems as though the module commands are somehow resetting the environment so that the get_hpc_project thing can’t be imported.

any ideas appreciated!

thanks

jonny

If this is going to job.err, then I presume you are running subworkflows (i.e. one workflow which runs another, or even itself as a job)? Also, those [FAIL] lines suggest Rose may be involved.

You will need to provide the whole script (and Rose app if present) for us to diagnose.

But yes, module commands do change the environment, so you might want to try running these module commands in a terminal and see whether you can still import this Python function.

However, also note that changing the environment that the Cylc scheduler is run in does not change the environment that jobs are run in [1], so if this is a subworkflow example, I’m not sure what these module loads are achieving.

[1] Except a few niche examples which you shouldn’t rely on.
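
For the terminal check suggested above, a minimal Python sketch along these lines could be run once before and once after the module commands. It assumes the module is called hpc_project (taken from the import line in the original post). The helper names locate_module and describe_import are made up for illustration. It only asks the current Python interpreter where it would import the module from, which may not be the same Python that Cylc itself runs under.

```python
import importlib.util
import os


def locate_module(name):
    """Return where Python would import `name` from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for some broken or half-initialised modules; treat as "not found".
        return None
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "<namespace package>"


def describe_import(name):
    """Build a short report: where `name` resolves, plus the current PYTHONPATH."""
    location = locate_module(name)
    status = f"found at {location}" if location else "NOT importable"
    pythonpath = os.environ.get("PYTHONPATH", "(unset)")
    return f"{name}: {status}\nPYTHONPATH: {pythonpath}"


# "hpc_project" is the module named in the original post.
print(describe_import("hpc_project"))
```

If the report changes from "found at" to "NOT importable" after the module swap/load lines, the environment change is the likely culprit, and comparing the two PYTHONPATH lines should show what was dropped.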

Pretty sure there are no sub-workflows involved here. But I can see why you asked if there were.

@jonnyhtw - your job.err indicates that your flow.cylc is being parsed again in the task job, on the job platform. That’s not supposed to happen! It gets parsed by the scheduler once, at start-up, not by the jobs, so environment module commands in job scripts cannot have any effect on that. So it looks to me like your task is doing something it should not be doing. I’ll take a look with you on-site tomorrow if possible.

If you’re running a Rose application, it might be worth checking the cylc-rose version (run cylc version --long).

An issue was fixed in version 1.3.3 which could potentially cause something like this: cylc/cylc-rose pull request #302 on GitHub, "platforms: use processed workflow config" (by oliver-sanders).
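
To make that version check a bit more mechanical, here is a rough Python sketch that compares the installed cylc-rose version against 1.3.3. It assumes Python 3.8 or later, that it is run in the same Python environment Cylc is installed in, and that the package is registered under the distribution name cylc-rose. release_tuple, predates_fix and check_cylc_rose are illustrative helpers, and the comparison ignores pre-release suffixes.

```python
import re
from importlib import metadata

FIXED_IN = "1.3.3"  # version named in the post above


def release_tuple(version):
    """Leading numeric components of a version string, e.g. '1.3.3' -> (1, 3, 3)."""
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def predates_fix(installed, fixed_in=FIXED_IN):
    """True if `installed` is older than the release that carried the fix."""
    return release_tuple(installed) < release_tuple(fixed_in)


def check_cylc_rose():
    """Report the installed cylc-rose version relative to the fixed release."""
    try:
        installed = metadata.version("cylc-rose")
    except metadata.PackageNotFoundError:
        return "cylc-rose is not installed in this Python environment"
    if predates_fix(installed):
        return f"cylc-rose {installed} is older than {FIXED_IN}: an upgrade may help"
    return f"cylc-rose {installed} is at or beyond {FIXED_IN}"


print(check_cylc_rose())
```

This is no substitute for cylc version --long, which reports the versions Cylc itself sees; it only saves comparing the numbers by eye.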

Ah, that rings a bell 🤦. I even commented on the bug fix PR that it fixed exactly this problem. Maybe it was another user, not @jonnyhtw, that hit the bug back then.

thank you very much @oliver.sanders and @hilary.j.oliver! sounds like it should be a relatively straightforward fix with any luck

i’ve spoken to @hilary.j.oliver offline so will report back when i’ve had a go!

all the best

it was another user, but i had hand-edited the code to get round this so that i could concentrate on something more pressing at the time, which meant i’d kinda side-stepped it!
