Xtriggers functionality


Some of the xtriggers I’ve written only start polling after a certain interval past the cycle point, and stop after a certain period (measured either from the first time they ran or from the cycle point). This works fine, but I feel some of it could be baked into the xtrigger syntax and supplied by default by Cylc, using syntax similar to cycles in graphs.

For example, something like MY_TRIGGER = something(...):PT2M/PT5H/PT6H could mean: poll every 2 minutes, starting 5 hours after the cycle point, until 6 hours after the cycle point. If only one interval is supplied, it’s the polling interval; if two are supplied, they are the polling interval and the start delay.

If it’s an integer cycling suite, then perhaps only two items would be allowed: PT2M/PT5H could then mean poll every two minutes until 5 hours after the first poll.
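To make the proposed syntax concrete, here is a minimal Python sketch of how such a spec could be split into its parts. Nothing here is Cylc API: parse_duration, PollWindow and parse_poll_window are hypothetical names, and the duration parser only handles a small subset of ISO 8601 (days, hours, minutes, seconds).

```python
import re
from datetime import timedelta
from typing import NamedTuple, Optional

# Small subset of ISO 8601 durations: PnDTnHnMnS (no weeks, months or years).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert e.g. 'PT2M' or 'P1DT6H' into a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**parts)


class PollWindow(NamedTuple):
    interval: timedelta           # time between polls
    start: Optional[timedelta]    # delay after the cycle point before polling
    stop: Optional[timedelta]     # timeout (see parse_poll_window for its origin)


def parse_poll_window(spec: str, integer_cycling: bool = False) -> PollWindow:
    """Split 'interval[/start[/stop]]' as suggested in the post.

    Datetime cycling: start and stop are offsets from the cycle point.
    Integer cycling: at most two items, and the second is a timeout
    measured from the first poll rather than a start delay.
    """
    parts = [parse_duration(p) for p in spec.split("/")]
    if integer_cycling:
        if len(parts) > 2:
            raise ValueError("integer cycling allows at most two items")
        stop = parts[1] if len(parts) == 2 else None
        return PollWindow(parts[0], None, stop)
    if len(parts) > 3:
        raise ValueError("at most three items are allowed")
    interval, start, stop = (parts + [None, None])[:3]
    return PollWindow(interval, start, stop)
```

With this sketch, parse_poll_window("PT2M/PT5H/PT6H") gives a 2 minute interval with a window from 5 to 6 hours after the cycle point, while parse_poll_window("PT2M/PT5H", integer_cycling=True) reads the second item as a timeout from the first poll.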

I’m not sure of the best way to handle the returned information in Cylc for downstream tasks. My xtrigger/decorator approach forces the xtrigger to report as satisfied once the timeout period has expired, and the result dictionary then contains a 'success': False key/value pair, so tasks can tell that although they were triggered, it was a timeout that caused it. Some other injected variable could be used instead (e.g. 'CYLC_XTRIGGER_TIMEOUT'=True, which tasks could check for).
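For anyone curious, the decorator idea looks roughly like the sketch below. This is a simplified illustration rather than my actual code: the name windowed, its parameters and the injectable now clock are made up for the example, and it assumes the cycle point arrives as a timezone-aware datetime (in a real suite it would be passed in as a string and parsed). The only Cylc convention relied on is that an xtrigger function returns a (satisfied, results) tuple.

```python
import functools
from datetime import datetime, timedelta, timezone
from typing import Callable, Tuple

XtriggerResult = Tuple[bool, dict]


def windowed(
    start: timedelta,
    stop: timedelta,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
):
    """Only poll between cycle_point + start and cycle_point + stop.

    Hypothetical helper: before the window opens the wrapped function is
    not called at all; once the window has closed the trigger is forced
    to be satisfied and flagged with 'success': False.
    """

    def decorator(func: Callable[..., XtriggerResult]):
        @functools.wraps(func)
        def wrapper(cycle_point: datetime, *args, **kwargs) -> XtriggerResult:
            current = now()
            if current < cycle_point + start:
                # Too early: report "not satisfied" without polling.
                return False, {}
            if current >= cycle_point + stop:
                # Timed out: release downstream tasks, but tell them why.
                return True, {"success": False}
            satisfied, results = func(cycle_point, *args, **kwargs)
            if satisfied:
                results = {**results, "success": True}
            return satisfied, results

        return wrapper

    return decorator
```

A downstream task script then only needs a small if block on the success value to decide whether it was triggered by real data or by the timeout.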

Why did we need this sort of functionality?

  1. Delaying start - if you’re polling for files on a remote system (delivered there by another system which cannot talk to Cylc or to any message broker you have access to), then you may not want to start polling until closer to the time you expect the data to arrive, if it’s on a routine schedule. Polling is a low network load, but it is an unnecessary one.
  2. Timeout after an interval from a cycle starting - we have cyclic on-demand systems. They can only be activated during certain time windows each day. Whilst you can have one task/trigger looking for the request and a wallclock trigger starting a second task which suicides the first, we felt that was unnecessary graph bloat when it could be handled simply in an xtrigger and a small if block in a script.
  3. Timeout period after first poll - this was just me being paranoid. If you’re polling a remote system and something is wrong with the polling (e.g. maybe SSH keys have expired), you won’t know there is a problem, as the xtrigger hides the information. So, expire the xtrigger periodically so that the following task can check that access to the remote host is still there and, if not, fail and alert support staff.

Whilst on this topic, I don’t know if/how xtriggers have changed in Cylc8. Obviously the | operator would be nice to have available, but being able to do task[-P1] => @my_trigger => task would be useful too. What I’m trying to suggest here is that task should only run once the preceding task has finished, and that the xtrigger which triggers task should also only start running once the previous task has finished. Why? On-demand systems, and making sure the xtrigger doesn’t accidentally pick up the same request as the previous cycle.

Anyway, just some thoughts, it would be good to see what others think, or if some of the issues we’ve worked through have already been resolved/improved in Cylc8.


I don’t know if/how xtriggers have changed in Cylc8

Xtriggers have not really changed at Cylc 8 (other than Python 2 → 3).

We do have some ideas for improving Xtriggers but do not currently have the resource to work on them. Here’s a quick summary of some of the things we currently have in mind:

Note: Some of the GitHub issues date from pre-Cylc 8 and are a little out of date.

  • Convert Xtriggers to “async” functions.
    • Rather than calling Xtriggers via subprocesses they would be called via Python’s “asyncio” from a thread of the Scheduler process.
    • This will make calling Xtriggers much more efficient.
    • This will also open the door to “push” Xtriggers rather than pull ones.
    • I.E. Rather than repeatedly polling the existence of a file we could have a long-lived xtrigger which uses OS level polling e.g. select or kqueue via async libraries such as aiofiles or pyuv.
  • Plugin Framework & Xtrigger plugins
    • This would make it easier to install Xtriggers and allow Xtriggers to specify Python dependencies.
  • Misc. Xtrigger enhancements
    • This includes extending the OR (|) syntax to Xtriggers.
    • This would provide a timeout mechanism e.g. @trigger | @wall_clock(PT1H) => task which would essentially add a one hour timeout (as an offset from the cycle point) to the @trigger trigger.
  • Allow Xtriggers to be reset
    • I.E. you should be able to reset or cancel an Xtrigger from the web GUI and CLI.

  1. Delaying start

This seems like a good idea to me. It could be that extending conditional syntax into the graph provides a mechanism for this by allowing a custom Xtrigger to be combined with a wall_clock trigger so that polling only begins after a configurable offset from the cycle point.

  2. Timeout after an interval from a cycle starting

I don’t quite understand this use case; I might need a little extra information.

Cylc 8 brings us “graph branching”, which largely replaces the need for suicide triggers and might be relevant here.

  3. Timeout period after first poll

Hopefully conditional syntax, e.g. @trigger | @wall_clock(<offset>), will provide a solution for this. I think we would need to make Xtriggers “resettable” before attempting this.

(e.g. maybe SSH keys have expired)

Interesting. I think for this case rather than returning True/False the Xtrigger could raise an exception so that the Scheduler knows it has failed.

This may require converting Xtriggers to “async” functions so that the Scheduler is able to intercept this exception.

This may also require making Xtriggers “resettable” for failure recovery.

Getting a little more futuristic, there may be the potential to make Xtriggers more task-like by extending task outputs to Xtriggers, something like this:

# note the question mark allows Cylc to follow one path OR the other
# without the use of suicide triggers
@trigger? => success_pathway
@trigger:fail? => fail_pathway

We are currently fully committed to getting Cylc 8.0.0 out of the door and then to developing the web GUI, so we won’t be able to turn our attention to Xtriggers in the immediate future, but we’re always open to collaboration.

Let me know what you think.

  1. Delay start

Another use case for this was just raised. There are some external files we receive which always have the same filename. You can’t simply poll for the file existing; you can poll for the file existing and having been modified within the last time period, but if that poll starts shortly after the previous cycle’s one ended, it will succeed immediately on the old file. So you need to delay polling until close to the expected arrival time to avoid succeeding incorrectly.
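As a rough illustration of the "exists and was modified recently" check, here is a local-file sketch in Python. Our real xtriggers poll a remote host, so treat this as an analogue only; file_modified_within and its parameters are names invented for the example, and the now argument exists purely so the check can be exercised with a fixed clock.

```python
import os
import time
from typing import Optional, Tuple


def file_modified_within(
    path: str, max_age_seconds: float, now: Optional[float] = None
) -> Tuple[bool, dict]:
    """Satisfied only if path exists and was modified recently enough.

    Returns the usual xtrigger-style (satisfied, results) tuple.
    """
    current = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # Missing or unreadable file: not satisfied yet.
        return False, {}
    if current - mtime <= max_age_seconds:
        return True, {"path": path, "mtime": mtime}
    return False, {}
```

The weakness is visible in the comparison: if the next cycle starts polling while the previous delivery is still younger than max_age_seconds, the check passes on the old file, which is why the polling needs to be held back until close to the expected arrival time.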

  2. Timeout after an interval from a cycle starting

The extension to add | can provide this functionality. I just want to avoid messy graphs like the one below.

@wallclock => timeout => !conditions_met
@trigger => conditions_met => !timeout
timeout | conditions_met => start

  3. Timeout after first poll

I think of this one as similar to the previous one, but for integer cycling suites. I don’t think you can use a wallclock trigger with integer cycling, can you? If you can, and it implies an offset from when the cycle started, that would work for this use case.

I think one member of my team has just added functionality to my remote file polling xtriggers to capture things like broken SSH keys or remote hosts being down, so the main remaining purpose I had for this is more of a testing/CI type thing. Imagine you have a suite which runs very rarely, but when it does, it’s of critical importance. In my mind, it would be useful to ensure the suite is exercised regularly, using the full operational set of systems and pathways. So, I would like the integer cycling suite to force itself to run, using canned trigger files and not delivering data externally, to make sure nothing weird has caused it to stop working. The only way I can think of to do this would be to have it force itself to cycle via a timeout mechanism.