External triggers (xtriggers) and performance

Cylc allows you to trigger tasks off arbitrary external conditions: just write a Python function to check your external condition, and the workflow server program will call it repeatedly, until satisfied, before triggering the dependent tasks - see the User Guide for details.

(Yes, this is a polling mechanism. Push-based event-driven triggering might be more elegant, but it would require the external system to know about Cylc and your specific workflow.)
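For example, a custom xtrigger is just an ordinary Python function that returns a (satisfied, results) tuple - here's a minimal sketch, where the `data_ready` name, path layout, and file-existence check are all hypothetical:

```python
# lib/python/data_ready.py - a minimal sketch of a custom xtrigger.
# The (satisfied, results) return tuple is Cylc's xtrigger convention;
# everything else here is made up for illustration.
import os


def data_ready(path, point):
    """Report whether the expected data file for this cycle point exists."""
    target = os.path.join(path, str(point))
    if os.path.exists(target):
        # the results dict can pass information through to dependent tasks
        return True, {"target": target}
    return False, {}
```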

From recent evidence, it is not hard to throttle your workflow with heavy use of xtriggers. Cylc executes job submissions, event handlers, xtriggers, and so on asynchronously, in a size-restricted pool of subprocesses (xtrigger functions are automatically wrapped in minimal Python programs so they can run in a subprocess). The maximum number of concurrent subprocesses is configurable but defaults to only 4, and the xtrigger call interval - also configurable - defaults to only 10 seconds.
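For reference, the call interval is set per xtrigger in the workflow configuration, as an ISO 8601 duration after the function call. A sketch in Cylc 8 flow.cylc syntax, reusing the hypothetical `data_ready` function above:

```
[scheduling]
    [[xtriggers]]
        # call this xtrigger every 30 seconds instead of the default PT10S
        ready = data_ready('/data/incoming', '%(point)s'):PT30S
    [[graph]]
        P1D = "@ready => process"
```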

So, if you have (say) 50 distinct xtriggers in a workflow, then by default Cylc will be trying to execute 50 small Python programs every 10 seconds, at most 4 at a time. Job submissions and event handlers are not normally called repeatedly like xtriggers, but they are queued to the same subprocess pool, so you may find your workflow throughput dropping significantly. The most obvious symptom is delayed job submission, with tasks appearing to get stuck in the “ready” state for a while (“ready” means ready to run and queued to the subprocess pool).

If this happens, you can restore sanity by:

  • increasing xtrigger intervals, for less frequent xtrigger calls (via the :PT… suffix shown above)
  • increasing the maximum subprocess pool size, e.g. to the number of available CPUs (see the sketch after this list)
  • (and if really desperate you could replace some use of built-in xtriggers with persistent polling tasks that execute on a remote host)
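For the pool size, something like this in the site/user global config should do it - a sketch assuming Cylc 8's global.cylc (in Cylc 7 the equivalent setting lives in global.rc):

```
# global.cylc (site or user configuration)
[scheduler]
    # allow more concurrent subprocesses (default: 4)
    process pool size = 8
```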

Informatively (I hope!),
Hilary
