Case for Cylc to switch away from SQLite-only

I’ve moved this comment to its own thread to avoid cluttering the UI Feedback thread with unrelated discussion.

My original comment:

The UI design [for Cylc 8] looks fantastic for how we monitor our run and I’m very excited for it. That said, I don’t like seeing “SQLite” in the architecture diagram… I would suggest switching to an ORM (such as SQLAlchemy) or at least allow configuration of a proper DB so we can easily tap into the Cylc databases without relying on a lot of disk I/O to read a bunch of SQLite files. That would really help us integrate Cylc into our end-to-end monitoring.

Edit: and yes, I’ve seen bug #2800 on github and I disagree with the resolution! :slight_smile: SQLite incurs an I/O penalty when you’re querying 25+ suites on say GPFS, or at least our variant of GPFS. LustreFS wasn’t much better… Storing the cylc-run dir on a local disk was helpful but we don’t like the risk of that box dying and taking our entire run state with it.

Reply from Hilary (Cylc users: what do you think about the current, & proposed new, Cylc (G)UI?):

Thanks, it’s encouraging to hear people excited for the new UI.

On the database side, perhaps bump this to another thread to avoid sidetracking the GUI discussion but quickly…

I’m curious as to why you are querying the DBs of multiple suites. Are you trying to gather stats for multiple workflows to create a “total system” Cylc GUI? If so it would be great to capture these requirements for Cylc UI development. This sort of usage may intercept with some of our plans.

Otherwise:

Cylc provides event “hooks” for suites (e.g. startup, shutdown) and tasks (e.g. started, failed). Some use these hooks to drive end-to-end monitoring, I would have thought this would be the best solution (push vs pull).
Alternatively the Cylc UI Server in the new architecture may provide a good DB proxy for your usage. The query language is GraphQL rather than SQL, through the UI Server you could write requests which gather data from multiple suites.

There are a few things I think using a central database would help with:

  1. It could improve suite_state xtrigger performance. This is the primary reason for my raising this concern. When we’re running 20+ suites in Cylc 7.8.3, the suite server (the actual machine Cylc is running on) becomes nearly unresponsive unless we increase the suite_state check interval to an unhelpfully large number (5 minutes or more). Our run is very time sensitive and 5 minutes can be the difference between on time and late.
  2. It would allow external reporting tools easy access to Cylc metrics. This would take the workload off the Cylc community to try and write something from scratch. Just give us access to the data and let us use our own tools (Splunk, in our case). Otherwise we’re probably still going to have to write wrappers, no matter now nice your solution might be. Getting software approved for our use is not fast or easy so using something standard like a PostgreSQL database lets us piggy back on things we already have.
  3. It would allow Cylc to be less centralized. We could run the UI and server components in a cloud web service, have the database on a cloud DB, and be able to run jobs literally anywhere with the run states being well protected and backed up. It would also keep us from having to maintain a separate run database based on event triggers defined in each and every suite for our run metrics.

Edit: I reread Hilary’s response and to speak directly to his two suggestions:

Use events to push state to monitoring tools: We would just use whatever UI Cylc provides for monitoring. My suggestion is based more on analytics which run a fixed schedule and may not need to be that up-to-date.

Use GraphQL interface of the Cylc UI server: This might be a possible solution. We’d prefer to just tap into PostgreSQL but we could write a scraper in GraphQL to ingest data from the Cylc UI server…

Hi @russbnavy,

(aside: you seem to be atttributing @oliver.sanders reply to me - not hard to do, as his first name is equal to my last name!)

If you’re referring to the Cylc 8 Architecture diagram the SQLite noted there is actually not the suite databases, it’s a small database used by the Hub component (JupyterHub in fact) to keep track of logged in users. The diagram doesn’t say much about the back-end suite server programs because they don’t change “much” from Cylc 7 (apart from migration to Python 3 and an entirely new network layer etc.!).

But in any case you’re still right that we have not been planning to move away from SQLite for suite databases. Why we use sqlite: zero maintenance cost (for cylc installers, admins, and users) - it (usually!) “just works”. Plus decentralization - every suite server program maintains its own small DB - is easy with sqlite.

Your post is timely though as there has been some discussion about this (in our monthly project video conference yesterday, for example). The context is occasional problems with sqlite that we do not understand, rather than performance for analytics. There has been one report of unexplained database locking corruption? (from @jinlee at BOM I think?) and we’re currently seeing intermittent “unable to open database” errors on a new platform at NIWA. These may be something to do with use of sqlite on shared filesystems, which the docs warn about, but it sounds like that should only be dangerous for concurrent access by multiple processes - and we don’t have any of that happening for private databases so it is still mysterious.

You’ve added some weight to the idea that we should consider alternative DBs. Unfortunately
Cylc 8 (Python 3, web tech, and web UI) has to remain the highest priority for us at the moment though (impending death of Python 2 and PyGTK etc.!) But perhaps you or your colleagues would like to pitch in on the DB front, since it is clearly very important to your site?

Hilary

Actually I think @oliver-sanders did mean analytics here, rather than live monitoring. The idea is you can use event handlers to push suite event data (from many suites) to a central DB of your choosing. Then you can use that DB for analytics. You could even use it for inter-suite triggering, with a custom xtrigger function that knows how to read the central DB.

This is quite like what Altair Engineering have done at BOM with Cylc. Suites push events to a central message broker, and thereby to a central DB for full-system analytics. And inter-suite triggering is done via custom xtrigger than looks at the message broker for “data availability” events. Cylc’s own suite databases are then only for internal use (by Cylc, e.g. for loading state at restart). The suites and the DB can be located anywhere, so long as they all see the message broker.

Can you give us any more information about how much inter-suite triggering is going on here? (how many xtriggers on how many tasks etc.?) and what exactly is causing the load? (is it definitely reading the target suite DBs during xtrigger function execution?).

For an easy fix, have you considered running your various suites on a small pool of VMs that see the same filesystem (instead of just one VM)? Cylc now has built-in support for that mode of operation (and even for self-migration of suite server programs to other VMs for scheduled maintenance etc.). Or do you think it is filesystem load that that is slowing down your server? (Seems unlikely for a bunch of sqlite reads, doesn’t it?)

Ah, sorry @oliver.sanders! I misquoted…

We were actually considering using xtriggers and a database to replace the suite_state slow-down we were seeing for our current transition (and for other, custom triggers for during the transition). The slow-down wasn’t just for analytics; it was literally everything on that suite host, including Cylc console commands.

I’ll look at using event handlers as well for job analytics, though I would still strongly prefer direct queries that don’t require extra code on our part. The GraphSL suggestion was a good one and I look forward to playing with it in Cylc 8 once it’s released!

We were running 26 suites, all pinging a single external suite for task completion via the suite_state xtrigger every 5 seconds. I assume it’s I/O related because when I increase the interval the general system slow-down is reduced (the console commands are responsive but the GUI is still very laggy). The same behavior is seen when replacing the suite_state xtrigger with a wall_clock xtrigger.

Changing the cylc-run location to a local disk also helped but again the GUI was performing poorly.

We got the same results running on our local HPC as well as a remote HPC that uses a different distributed filesystem, so it seems (at least initially) that using suite_state in many suites at once, on distributed filesystems, is not great.

I’ll have to talk to our systems folks to see if they’d be willing to support a distributed solution but I count on it so I likely won’t be able to test that solution.

OK I can imagine that a lot of suite-state xtriggers repeating at 5 second intervals might generate high load. You have to be aware that each xtrigger check involves running a small Python program in the suite subprocess pool (which in the case of suite-state triggers is going to read the target-suite sqlite DB).

If you have a lot of these running concurrently you might have to increase the maximum size of the subprocess pool for each suite (else you’ll see tasks delayed in the “ready” state for some time before job submission, while they wait for the subprocess pool to clear) … at a cost of higher load on the machine (which, if high enough, will of course cause everything to slow down).

It sounds to me that you have a large enough number of suites with enough action going on in them that you really should be using a small pool of Cylc VMs rather than just one. It is actually just as easy to use Cylc like this as it is on a single VM. There’s just a small amount of global configuration to tell cylc run where to put new server programs at start-up (based on simple load metrics on the pool of hosts) and where cylc scan and cylc gscan should look for running suites. Then it is pretty much transparent to users at run time.

By the way, the primary purpose of the Cylc suite DBs is to record workflow status snapshots to allow restart from previous state - i.e. internal use. I suppose we only have ourselves to blame for providing DB-reading suite-state xtriggers for inter-suite triggering, but nevertheless these should be used heavily with caution! (as it seems you’ve discovered). This is one of the reasons why the Altair/BOM solution goes through an intermediary message broker and a central DB (but that functionality is not built in to Cylc, at least not at this stage).

I think it’s important to establish what is causing the system to become unresponsive. I’d be a little surprised if the I/O was causing this - are you sure it’s not memory? Presumably you’re seeing very high sytem load during the slowdown?

1 Like

Sorry for the absence (been in training).

I’ve decided to go the BOM route and implement a central DB and utilize task events to do the triggers we want. This will support transition to a messaging mechanism which is planned in the near future. It’s also the path of least resistance as getting more VMs spun up is an uphill battle here for a variety of internal reasons.

Regarding the question of if it’s truely I/O, @dpmatthews, you’re right. I don’t have any direct evidence. I’ve only observed that using many and frequent suite_state triggers slows the machine that was hosting all our suites way down. It could very well be network or something else.

@russbnavy - note that each task event that you attach a handler to, also results in an executable (a script or program of your choice, to update your central DB in this case) being executed in the Cylc server program’s subprocess pool. So that will naturally create some load on the machine too, and you may need to consider increasing the default subprocess pool size - depending on how many of these processes there are, and how long they take to execute.