Hi all,
Nooby query on expected behaviour of a script’s stdout, and how this interacts with the parent cylc task’s own stdout, as I’m running into something which looks like the two streams are getting slightly mixed up.
Have had a quick look through cylc docs / tutorial / discourse, but haven’t spotted anything - do please point me in the right direction if I’ve missed something!
I’ve got a workaround, but wondering if there’s anything else clever I could be doing. Or indeed if having something “clever” is something which could be an enhancement / documentation improvement.
Details below - happy to provide any more needed!
Thanks,
Edmund
Goal
I’m basically trying to look at how the contents of a model directory get changed as various stages of the model get run. This is for debugging - to isolate the stage where something gets created in the wrong place, and hence work out why.
Setup
CYLC_VERSION=7.8.3
ROSE_VERSION=2019.01.2
I’m tackling my goal by running a (parametrised) task locally (submitting via background) - this runs a tree
command on the model directory - a (read-only) NFS export of the HPC where the model is run.
Specifically, my suite.rc contains a family and task along following lines:
[[LOCAL_JOB]]
# Settings for job to be submitted on local machine
[[[remote]]]
# set current host (where "rose suite-run" cmd was invoked)
host = {{ ROSE_ORIG_HOST }}
[[[job]]]
batch system = background
[[model<stage>_list]]
inherit = LOCAL_JOB
script = """
set -e
echo "Listing after ${CYLC_TASK_PARAM_stage} stage:"
tree -Difanqpugs --dirsfirst ${LIST_DIR}
"""
[[[environment]]]
LIST_DIR = "{{ HPC_HOME_NFS }}/{{ MODEL_DIR }}"
Result
As I’ve not directed the script’s stdout anywhere separate, so this is going to the job.out.
In the job.out, this results in something like the following, where the cylc task’s <utctimenow> INFO - started
message has got mixed up (interleaved) with the tree
output from the task’s script:
Suite : mi-su173
Task Job : 20190501T0000Z/model_copy_list/01 (try 1)
User@Host: me@mypc
Listing after copy stage:
/hpc/path/to/model
[drwxr-xr-x me users 4096 Jun 4 2019] /hpc/path/to/model
<snip ~50 lines>
[-rw-r--r-- me users 34497 Jun 2 2019] /hpc/path/to/model/src/code.f90
[-rw-r--r-- me users 1158 Jun 2 2019] /hpc2020-03-07T14:10:45Z INFO - started
/path/to/model/src/Makefile
[-rw-r--r-- me users 1906 Jun 2 2019] /hpc/path/to/model/src/morecode.f90
<snip loads>
500 directories, 5000 files
2020-03-07T14:10:49Z INFO - succeeded
I’m guessing this mixing of the tree output with the task messages is because the two stdouts (task and script) aren’t explicitly {un}buffered or otherwise handled to avoid this sort of mixing.
And the size of the tree stdout here (/some other property of tree/background/other) may also contribute.
Workaround 0:
I was fine with this tree output being in the job.out: I’m just doing a visual inspection (and don’t expect to need to do this much in future - just as I debug this case), so could ignore (irrelevant) task message timestamp differences when I come to review diffs later.
I can definitely be lazy and do the same “just ignore it” approach with this stdout “collision” - inspect visually / hand-edit if really needed (there aren’t that many stages to diff).
Workaround 1:
I’m probably going to go for a slightly better approach - put my script = """<as above>"""
contents into a list.sh
script in the suite’s bin
dir, and within that list.sh
direct the tree output into its own file*.
This to better separate concerns: effectively treat the tree output as “data” worth separating from stdout (as it basically is for my purposes). A viewpoint with helpful side-effect that it avoids this sort of collision!
* and in process expand out all those opaque tree flags!
Questions
What I’m wondering is if there are alternative approaches I could have used?
E.g. could I redirect the script stdout within the [[model<stage>_list]]
task in suite.rc
? Something like:
script = list.sh 1> ${CYLC_TASK_WORK_DIR}\list.stdout 2>${CYLC_TASK_WORK_DIR}\list.stderr
Can’t see anything in docs/tuts indicating this sort of script stdout/err redirection within suite.rc is possible - I may be missing something though.
The nearest I can see is the note covering PBS stdout in https://cylc.github.io/doc/built-sphinx-single/index.html#task-stdout-and-stderr-logs, but don’t think this applies to my (background-submitted) job here, unless I’m (again) missing something?
Alternatively, is there some cylc-level setting I can set to handle the buffering of the script/task stdout, such that if script is writing to stdout, I can avoid these sorts of collisions between the script and task stdouts? E.g. to flush the task’s own stdout, so that the <utctimenow> INFO - started
message always comes before* the script’s stdout in the job.out?
* or after - am not that fussed as long as it’s deterministic!
Or is there another standard way of handling this (other than my workarounds!) which I’ve missed?
Or is this something which isn’t currently provided / is but isn’t documented?