Hi all,
I thought it would be good to get some advice from around the world on this topic. How do other people do on-demand models in Cylc, where an external something (message? file?) triggers the suite to run?
First, I should say I (so far) think of two types of on-demand systems:
- Completely on demand: can run at any time on any day, and is not linked to any particular cycle point
- On demand within a window, and tied to a cycle point/basetime
Type 1
I can see several ways of handling this, depending on how the infrastructure and security are set up
- Cycle that runs every X minutes. First task checks for a file in a remote location. If file exists, cycle runs, otherwise it suicides the rest of the tasks. I presume this needs to be a datetime cycling suite, as I don’t think integer cycling suites can have time triggers?
pro: Good visibility that it is doing something. Uses a cyclepoint which is fairly standard for all other suites we use
con: more overhead, arbitrarily tied the suite to a cyclepoint which has no meaning
- Integer cycling suite. Xtrigger which polls for a file or event from a message broker. Same xtrigger will just continue indefinitely until one is found. This could take days, weeks, or months to be satisfied. I’m working on the assumption xtriggers don’t expire (it would be nice if you could make an xtrigger expire by the way, but the implementation of that is a bit awkward to do well - I certainly don’t have a vision of how to do that in a generic way).
pro: less overhead in the suite
con: limited visibility to people monitoring that the suite is doing anything
- Integer cycling suite. Launches a task somewhere which can check the status of something else, fail/retry approach.
pro: I can’t really think of any
con: I don’t think you can fail/retry indefinitely. I personally really dislike the fail/retry mechanism for polling as it just makes the suite always look like it’s in a state of failure
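To make the second option concrete, here is a minimal sketch of the kind of file-polling xtrigger I have in mind. It relies on the usual xtrigger contract of returning a (satisfied, results) tuple; the function name `file_ready` and its `path` argument are made up for illustration, and the sketch assumes the file is visible from wherever the suite server runs.

```python
import os


def file_ready(path):
    """Hypothetical xtrigger function: satisfied once `path` exists."""
    if os.path.exists(path):
        # The dict is handed on to the tasks that depend on this xtrigger.
        return True, {"path": path}
    # Not satisfied yet; the function is simply called again at the next
    # polling interval, for as long as it takes.
    return False, {}
```

Nothing here ever gives up, which is exactly the "could take days, weeks, or months" behaviour described above.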
Are there other approaches I haven’t considered? Have people tried different approaches like above and have thoughts on which worked best? I have used approach 1 before, but I do find it clunky.
Type 2
Slight variations on the above. All instances assume a datetime cyclepoint suite.
- One cyclepoint at each basetime that the suite is linked to. Long lived polling task looking for file/event.
pro: Really easy to implement
con: Sysadmins tend to hate long lived polling tasks
- One cyclepoint at each basetime that the suite is linked to. Fail/retry mechanic to poll for files. If you alert upstream about no triggering file being sent (in case they forgot), add in a task with a wallclock which gets removed from the suite if it isn’t needed.
pro: Simple to implement. Won’t annoy sysadmins too much
con: I don’t like fail/retry mechanics for polling
- One cyclepoint at each basetime that the suite is linked to. Create an xtrigger to poll for the event/file. Add in an expiry offset to the xtrigger, so that it succeeds anyway after a certain time and returns a dictionary that the next task checks to determine whether to run anything in the cycle or not. Add an alerting task as above.
pro: Polling gets masked behind the scenes. Shouldn’t annoy sysadmins too much.
con: I can’t think of too many off the top of my head. Assumes access from the Cylc server to the message or file. A bit more awkwardness in the xtrigger, but not much.
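A rough sketch of the third option, to show what I mean by an expiry offset. Everything named here is an assumption for illustration: the function name `file_or_expire`, its arguments, the `found` key, and the cycle point format (something like 20240101T0000Z, passed in as a string). The `now` argument exists only so the logic can be exercised without waiting for the clock.

```python
import os
from datetime import datetime, timedelta, timezone


def file_or_expire(path, point, expire_hours, now=None):
    """Hypothetical xtrigger: satisfied when `path` exists or the window closes.

    Returns a (satisfied, results) tuple, as xtrigger functions do.
    """
    # Assumption: cycle points look like 20240101T0000Z and are in UTC.
    base = datetime.strptime(point, "%Y%m%dT%H%MZ").replace(tzinfo=timezone.utc)
    deadline = base + timedelta(hours=float(expire_hours))
    now = now or datetime.now(timezone.utc)

    if os.path.exists(path):
        # Trigger file arrived inside the window.
        return True, {"found": "true", "path": path}
    if now >= deadline:
        # Window closed with no trigger: succeed anyway, but flag it so the
        # next task can decide to skip the cycle and the alert task can fire.
        return True, {"found": "false", "path": path}
    # Still inside the window and nothing has arrived; keep polling.
    return False, {}
```

The function is stateless, so the expiry is worked out from the cycle point plus an offset rather than from "when polling started". The next task would then look at `found` to decide whether anything in the cycle should run.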
So, from the ideas I’ve got above, the third option makes the most sense to me, and the other two are a bit of a toss-up.
Ok, that is long enough I think. Any advice that people have from their experience would be greatly appreciated.
- Tom