cylc-flow 8.4.2 pyh3a29b38_0 conda-forge
cylc-flow-base 8.4.2 pyh707e725_0 conda-forge
cylc-ui 1.0.0 ha770c72_0 conda-forge
Hi cylc wizards,
This is not a problem with cylc whatsoever, but this community is best equipped to answer it. Consider 3 failure modes:
1. cylc suite stops;
2. cylc daemon dies;
3. machine dies.
For #1 the solution is within cylc, and cylc has great capabilities to set notification triggers to catch these events.
For #3, “the machine went down and nobody noticed” is an edge case unless you are using disposable VMs in a cloud environment, and in any case it’s easy to make or find a dashboard that pings the machine.
I have been having issues in the #2 category. I don’t know the root causes, and they aren’t necessarily important. What matters is that the cylc daemon can and does stop running, and when it does, cylc produces no notifications. So how can we notify someone to intervene when this happens?
I’ve heard of people implementing solutions using `cylc ping` commands in cron jobs, which sounds clunky but is better than nothing.
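For concreteness, here is a minimal sketch of what that cron approach might look like. It assumes `cylc ping WORKFLOW_ID` exits with a non-zero status when the scheduler is not reachable. The workflow ID `my-workflow`, the function names, and the alert mechanism (printing to stderr and letting cron mail the output) are all placeholders of mine, not anything cylc prescribes.

```python
#!/usr/bin/env python3
"""Watchdog sketch: report when `cylc ping` fails for a workflow."""
import subprocess
import sys

WORKFLOW_ID = "my-workflow"  # hypothetical workflow ID; replace with your own


def workflow_is_running(workflow_id, run=subprocess.run, timeout=60):
    """Return True if `cylc ping` exits with status 0 for the workflow.

    `run` is injectable so the check can be exercised without cylc installed.
    """
    try:
        result = run(
            ["cylc", "ping", workflow_id],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # cylc is not on PATH, or the ping hung: treat both as "not running".
        return False
    return result.returncode == 0


def main(argv):
    """Check one workflow; return 0 if it responds, 1 otherwise."""
    workflow_id = argv[1] if len(argv) > 1 else WORKFLOW_ID
    if workflow_is_running(workflow_id):
        return 0
    # Assumption: cron is configured (e.g. via MAILTO) to mail any output,
    # so writing to stderr is enough to raise an alert. Swap in whatever
    # notification channel you actually use.
    print(
        f"ALERT: cylc workflow {workflow_id!r} did not respond to ping",
        file=sys.stderr,
    )
    return 1
```

To use it as a script, end the file with `sys.exit(main(sys.argv))` and call it from a crontab entry such as `*/5 * * * * /path/to/check_cylc.py my-workflow` (path and interval are illustrative). Note that cron usually runs with a minimal environment, so the conda environment providing `cylc` may need to be activated or put on `PATH` explicitly.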
Does anyone here have an opinion on the “right” way to check if cylc is running and set off alarms if it’s not?