Hi there,
I have been struggling with a problem for quite some time, and since I can’t find my way out, let’s see if you can help me.
In our workflow, one of the components is a bash script running a Fortran executable. I see this strange behavior: when I submit the script on its own, it takes 1h; if I launch it with Cylc (and identical Slurm parameters) it takes ~4h.
I have been investigating this for months (literally), and I have arrived at the conclusion that the only difference is in the Cylc job shebang, which includes the -l option:
#!/bin/bash -l
[...]
cylc__job__inst__script() {
# SCRIPT:
my_script.sh
}
I removed this manually, and re-submitted the cylc job, and suddenly timing goes back to normal.
I saw that in the platform config file there is a parameter you can set
use login shell
Path: global.cylc[platforms][<platform name>]use login shell
Type: boolean
Default: True
But this is not what I need, since it decides whether to use a login shell for remote command invocation, while what I need is to change the shebang in the job submission script.
While this is clearly not a Cylc bug, but a problem on our side (the ‘-l’ opens a login shell, sourcing several env files, and there must be something wrong with one of them), I am wondering why you use this option, and whether there is any way of disabling it.
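For reference, here is a minimal sketch of one way to see which files the login shell actually reads. It assumes bash's xtrace output with a custom PS4 prompt; the log name trace.log and the helper name sourced_files are made up for illustration, and the PS4 trick may not work if a profile file resets PS4 or if the shell runs as root.

```shell
#!/bin/bash
# Sketch: list the files a login shell reads, based on a bash xtrace log.
# Assumed way to produce the log (trace.log is a hypothetical file name):
#   env PS4='+ ${BASH_SOURCE}:${LINENO}: ' bash -l -x -c true 2> trace.log

sourced_files() {
    # $1: xtrace log whose lines look like "+ FILE:LINE: command"
    # (nested source levels repeat the leading "+").
    # Lines without a file name (e.g. the -c command itself) are skipped.
    sed -n 's/^++* \([^: ][^:]*\):[0-9][0-9]*: .*/\1/p' "$1" | LC_ALL=C sort -u
}
```

Running sourced_files trace.log should then print one file path per line, which at least narrows down where the extra environment comes from.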
Many thanks in advance,
Stella
Hi,
We use the -l option for job submission as well as for some of the subcommands which Cylc launches.
Some Cylc deployments depend on users being able to set environment variables in their shell profile files (in the extreme, they might be adding Cylc into the $PATH so that the cylc command can be found when called).
It also helps us to avoid situations where a command works for a user when run in a terminal, but fails when run by Cylc (by making the two equivalent).
There is no option to turn -l off (it’s hardcoded), sorry.
To help with debugging, you can generate a summary of the differences between the login and non-login environments by running these commands:
$ env -i HOME="$HOME" bash -c 'env' > no-login
$ env -i HOME="$HOME" bash -l -c 'env' > login
$ diff no-login login
Then try adding the changes in one by one until the script slows down, to determine the problematic entry. Though at an hour a run, that’ll take a while.
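The one-by-one approach could be sketched roughly as below. This is only an illustration, not something Cylc provides: the helper names changed_vars and run_with_var are made up, the files no-login and login are the ones produced by the commands above, and the parsing assumes one NAME=value per line (multi-line values would confuse it).

```shell
#!/bin/bash
# Sketch: find variables that the login shell adds or changes, then run a
# command with just one of them applied on top of the current environment.

changed_vars() {
    # $1: non-login env dump, $2: login env dump
    # Print the names of variables that are new or different in the login dump.
    LC_ALL=C sort "$1" > "$1.sorted"
    LC_ALL=C sort "$2" > "$2.sorted"
    # comm -13 keeps only the lines unique to the second (login) file.
    LC_ALL=C comm -13 "$1.sorted" "$2.sorted" | cut -d= -f1 | LC_ALL=C sort -u
    rm -f "$1.sorted" "$2.sorted"
}

run_with_var() {
    # $1: variable name, $2: login env dump, remaining args: command to run
    name="$1"
    dump="$2"
    shift 2
    # Take the login shell's value for this one variable.
    value=$(sed -n "s/^${name}=//p" "$dump" | head -n 1)
    env "${name}=${value}" "$@"
}

# Possible usage (my_script.sh is the script from the question):
#   for v in $(changed_vars no-login login); do
#       echo "== $v"
#       run_with_var "$v" login ./my_script.sh
#   done
```

If the slowdown only appears with a combination of variables, the same helpers could be used to apply half of the list at a time rather than a single entry.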
Cheers,
Oliver
Hi Oliver,
thanks for your answer!
Unfortunately, I have already gone through every env diff you can imagine (three months is a long time), and of course there are differences; I just can’t pin down which one is the culprit (probably because it’s a mix of them).
Many thanks anyway and have nice holidays,
Stella
Hi @sparonuz
@oliver.sanders has pretty much said it all, but just to add: you could try disabling the option by tweaking the Cylc source code slightly. Let us know if you need advice on how to do that.
It is possible that we could make it optional, although we’d need to think through the implications (Oliver has already given some reasons why we have it).
Also, probably stating the obvious, but if you can generate a minimal job script that works both ways (good and bad) without Cylc being involved at all, that might help isolate the problem.
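A minimal pair of job scripts could look something like the sketch below. The helper write_variant and the file names are invented for illustration; my_script.sh is the script from the original post, and any Slurm directives your site needs would have to be added to both variants.

```shell
#!/bin/bash
# Sketch: write two job scripts that differ only in the shebang line,
# so the login-shell effect can be timed without Cylc involved.

write_variant() {
    # $1: output path, $2: shebang line, $3: command to run in the job
    printf '%s\n%s\n' "$2" "$3" > "$1"
    chmod +x "$1"
}

# Possible usage (file names are hypothetical):
#   write_variant job-login.sh   '#!/bin/bash -l' 'my_script.sh'
#   write_variant job-nologin.sh '#!/bin/bash'    'my_script.sh'
#   sbatch job-login.sh
#   sbatch job-nologin.sh
```

If the two jobs show the same 1h versus ~4h gap, the problem is reproduced outside Cylc and the login variant can be trimmed down further.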
Hilary