I am launching a climate model monitoring suite from a Rose suite ‘post-script’ (in the ‘housekeeping task’) at the end of each cycle point…
post-script = "cd /my/suite/directory ; rose suite-run --no-gcontrol"
This works fine, but the monitoring suite doesn’t show up in my cylc scan output, even though it is being launched from the correct machine.
This also means that if the monitoring suite is still running from the previous cycle point, it gets started again when it shouldn’t be.
Any ideas on…
1- why this might be?
2- how to only launch the monitoring suite if it isn’t already running?
Thanks a lot!
I’ll post how I did it in a sec…
I put the commands that I wanted to run in a Bash script…
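#!/bin/bash
# The contact file only exists while my-suite is running
FILE="$HOME/cylc-run/my-suite/.service/contact"
if [ ! -f "$FILE" ]; then
    echo "$FILE doesn't exist, running monitoring suite my-suite"
    cd /my/suite/directory && rose suite-run --no-gcontrol
else
    echo "$FILE exists and so my-suite is already running!"
fi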
… and then called it from the post-script as follows…
post-script = "cat /the/script/above |ssh me@hostname-of-the-machine"
The script uses the fact that ~/cylc-run/my-suite/.service/contact only exists while the suite is running.
Took a couple of goes but I got there in the end!
A couple of comments:
You’re running a “sub-suite”, i.e. a suite launched from inside another suite’s task.
If the sub-suite is supposed to be launched in every cycle, the usual thing to do is start it with cylc run --no-detach. This stops the suite server program from daemonising (detaching from the parent process, sort of), so the launching task remains in the running state for as long as the sub-suite is running.
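A minimal sketch of a launching-task script for that approach follows. It assumes the sub-suite is already installed and registered under a name (my-suite in this thread) and that cylc is on the PATH where the task runs. The function name is hypothetical.

```bash
#!/bin/bash
# Sketch only: run a sub-suite in the foreground from a parent-suite task.
# Assumption: the sub-suite is already registered under the name passed in.
set -eu

run_subsuite_foreground() {
    local suite="$1"
    # --no-detach stops the suite server program daemonising, so this call
    # blocks until the sub-suite shuts down and the parent task stays
    # "running" for the whole sub-suite run.
    cylc run --no-detach "$suite"
}
```

Calling run_subsuite_foreground my-suite as the task's script ties the task's lifetime to the sub-suite's. Because of set -e, the task also fails if cylc run itself exits with an error.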
However, if you’re just launching a one-off monitoring suite and checking in every cycle whether it is still alive (relaunching it if not), then a normal detaching suite is fine. In that case the top-suite task will only be running while it launches the sub-suite, not for the duration of the sub-suite run. But I’d still ask: why do you need to check that the sub-suite is running in every cycle? Is there something wrong with it that makes it die unexpectedly?
It is also very strange that the sub-suite is not seen by cylc scan. Are you sure it really is running when it isn’t seen?
Finally, you have found a valid way of determining whether a suite is running (checking for a contact file), but it isn’t entirely foolproof: if the suite got killed, it might leave its contact file behind. Better to use cylc ping my-suite, which returns a success status if my-suite is running and otherwise fails with an error message.
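Here is a minimal sketch of the contact-file script reworked to use cylc ping instead. SUITE and SUITE_DIR reuse the placeholder names from this thread, and launch_if_not_running is a hypothetical function name. The sketch relies only on cylc ping exiting successfully when the suite is running, as described above.

```bash
#!/bin/bash
# Sketch only: relaunch the monitoring suite unless "cylc ping" says it is up.
# SUITE and SUITE_DIR are the placeholder names used in this thread.
SUITE="my-suite"
SUITE_DIR="/my/suite/directory"

launch_if_not_running() {
    # cylc ping succeeds only if the suite server program responds, so a
    # stale contact file left by a killed suite does not fool this check.
    if cylc ping "$SUITE" >/dev/null 2>&1; then
        echo "$SUITE is already running, not relaunching"
    else
        echo "$SUITE is not running, launching it"
        # Subshell, so the caller's working directory is left unchanged.
        (cd "$SUITE_DIR" && rose suite-run --no-gcontrol)
    fi
}
```

To use it, call launch_if_not_running at the end of the script that is piped over ssh from the post-script, in place of the contact-file test.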
p.s. the sub-suite may not show up in cylc scan if you are somehow running it on an “illegal” host (not one of the designated cylc nodes on the HPC).
cylc scan only looks at configured hosts by default, but you should still see it with an explicit cylc scan HOSTNAME.
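The explicit-host check can be scripted as shown below. This is a minimal sketch with a hypothetical function name. It assumes, without confirmation here, that each line of cylc scan output begins with the suite name, so check that against the output of your cylc version.

```bash
#!/bin/bash
# Sketch only: check whether a suite appears in "cylc scan" output for one
# explicitly named host (e.g. a host that is not a configured scan host).
# Assumption: each line of the scan output starts with the suite name.

suite_seen_on_host() {
    local host="$1" suite="$2"
    # Exact match on the first field, so "my-suite" does not match "my-suite-2".
    # The pipeline's exit status is awk's: 0 if found, 1 otherwise.
    cylc scan "$host" 2>/dev/null |
        awk -v s="$suite" '$1 == s { found = 1 } END { exit !found }'
}
```

For example, suite_seen_on_host HOSTNAME my-suite && echo "visible" reports whether my-suite was listed for that host.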
Thanks for all the additional info!