Continuous integration available in cylc?

Hi All,

I’m looking into the prospects of using cylc for ecological forecasting (including what-if scenarios).

Ecological forecasting is a new, emerging field. Some of the workflows required are triggered not so much by dates as by updates to databases or changes in code, as in continuous integration (CI) tools. A paper describes an example framework for iterative ecological forecasting here: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13104. Travis is the CI tool used for that framework.

A search online for cylc and continuous integration came back with: “no continuous event watcher in cylc” (https://cylc.github.io/doc/built-sphinx-single/index.html).

Questions:

  • Is it feasible to set up cylc to have a workflow similar to that of the ecological “iterative” forecast framework above?
  • Including re-running splat to recompile code when it’s updated in the repository?
  • If not, do you know if there are plans to include this in the future?

Many thanks.
Celine

Hi Celine,

Interesting question! I will try to find time to read your reference sometime soon. At first glance the workflow is very simple by Cylc standards; as you note, the critical thing is its event-driven, non-date-time-cycling nature.

So, for arbitrary iterative workflows Cylc supports integer cycling - no need to have a date-time cycle.

And, with Cylc’s “xtrigger” (external trigger) mechanism you can trigger tasks off of anything in the external world (database updates, code changes, or whatever) so long as you can write a Python function to check for the condition or event. The function should return True or False (condition satisfied or not) along with an arbitrary dictionary of data to pass to dependent tasks. Cylc will then call it repeatedly on a configurable interval until satisfied, at which point dependent tasks will trigger.
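To make the return contract concrete, here is a minimal sketch of such a check function. The name `file_updated` and its parameters are made up for illustration, and using a file's modification time as a stand-in for "the database was updated" is an assumption; how the function is registered and given a call interval in the workflow configuration is not shown here.

```python
import os
import tempfile


def file_updated(path, last_seen_mtime=0.0):
    """Hypothetical xtrigger-style check: has `path` changed since last_seen_mtime?

    Returns (satisfied, results): a bool saying whether the condition is met,
    plus a dict of data to hand on to dependent tasks.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # File missing or unreadable: condition not satisfied yet.
        return (False, {})
    if mtime > last_seen_mtime:
        return (True, {"path": path, "mtime": mtime})
    return (False, {})


if __name__ == "__main__":
    # Small demonstration with a throwaway file standing in for a database.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"new observations")
    print(file_updated(handle.name))
    os.remove(handle.name)
```

The same shape would work for a code-change check (for example, comparing a repository's latest commit ID with the last one seen), as long as the function returns the boolean and the dictionary.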

Technically xtriggers are a “polling” solution (repeated checking on a regular interval), which is generally considered to be less elegant and less efficient than “event-driven” architectures whereby (in the Cylc context) the task would just block until the event occurs, without any explicit regular checking going on - that’s what the “continuous event watcher” note in the user guide refers to. However, polling has the advantage of being completely general and not requiring any modifications to the other end of the system (i.e. you can do it all within your workflow, without needing to modify the external system to send “events” of some kind that Cylc understands). Note also that the “old-style external triggers”, which the User Guide says (wrongly, as it happens) are deprecated by xtriggers, do not involve polling, but they do require the external system to call a Cylc command to notify the workflow server program of the event.

(As an aside, a NIWA colleague talked with me yesterday about using a non-polling xtrigger: the Python function should just never return until the condition is satisfied … this is untested as yet, but it should work so long as the Cylc subprocess pool timeout is configured to be long enough for the function to return).
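A rough sketch of that idea, equally untested inside Cylc: the function below does not return until its condition holds or a deadline passes. The name `wait_for_file` and the parameters `check_every` and `give_up_after` are hypothetical, and the deadline would need to be shorter than whatever subprocess pool timeout is configured. Note that this version still checks in an internal loop; a truly event-driven variant would block on a notification from the operating system or database instead.

```python
import os
import time


def wait_for_file(path, check_every=1.0, give_up_after=60.0):
    """Hypothetical blocking check: return only once `path` exists or time runs out.

    Returns (satisfied, results) in the same shape as a polling-style check.
    """
    deadline = time.monotonic() + give_up_after
    while True:
        if os.path.exists(path):
            return (True, {"path": path})
        if time.monotonic() >= deadline:
            # Give up so the caller is not blocked indefinitely.
            return (False, {})
        time.sleep(check_every)
```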

Long story short, I believe what you want could be done pretty easily with Cylc, but perhaps we’d need to look at the details. It would be easy enough to mock up a workflow to test the concepts involved.

Hilary

To add to what Hilary said, I’m aware of a couple of Cylc suites which are being used for CI purposes, reacting to code changes and running tests.

Travis is a pretty simple workflow engine: it supports integer cycling where cycles are triggered by external events (pull requests, code events, cron), and simple dependencies between build stages. This is all quite simple in Cylc.