Behaviour of a long-running Cylc suite

Hello,

At our site we run Cylc 7.8.3. One of my colleagues has been running a suite which has thousands of cycles. He has been running it without stopping it for a long time: 2-3 years. We noticed recently that the suite was using an unusually high percentage of CPU time, so much so that the VM where Cylc runs came to a near standstill. Is there any reason why a long-running suite would need to use so much CPU?

Thank you in advance.

Regards,

Jin

without stopping it for a long time: 2-3 years

Impressive!

That’s the longest single suite run I’ve heard of.

Is there any reason why a long-running suite would need to use so much CPU?

There’s no good reason for increased CPU usage. However, for such a long-lived scheduler, it is possible that it has slowly accumulated some internal state. I would expect this to be associated with a gradual increase in memory use. Is it using more memory than you would expect?

Since it’s been running for such a long time, it will be using an old version of Cylc, so it’s hard to say. The simplest solution may be to stop and restart the suite, perhaps taking the opportunity to move to a more recent Cylc 7 bug-fix release at the same time.

Hi Oliver,

Thank you for your reply.

About the length of time the suite has been running: I misunderstood my colleague’s remark. He said the suite had been running for about 3 weeks, and in that time it had worked through 15 years of 6-hourly cycles. Sorry, I should’ve checked with my colleague before posting my message.
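For a sense of scale, here is a rough sketch of how many cycle points 15 years of 6-hourly cycling adds up to. The start and end dates are made up for illustration; only the 15-year span and the 6-hour interval come from the suite.

```python
from datetime import datetime, timedelta


def count_cycle_points(start, end, step):
    """Count cycle points from start (inclusive) up to end (exclusive)."""
    return (end - start) // step


# Hypothetical dates: all I know is "15 years of 6-hourly cycles".
start = datetime(2000, 1, 1)
end = datetime(2015, 1, 1)

n_cycles = count_cycle_points(start, end, timedelta(hours=6))
print(n_cycles)  # 21916 for these particular dates
```

So the suite processed on the order of 22,000 cycle points in those 3 weeks.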

About memory usage: yes, the suite seemed to use more memory than other suites.

We did what you suggested: stopped the suite and restarted it. The resource usage by the newly restarted suite is now pretty modest. We’ll keep an eye on whether the resource use goes up as the suite runs for a longer period.
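In case it is useful to anyone else, this is roughly how the resource use could be sampled over time. It is a minimal sketch that assumes a Linux host (it reads `/proc`) and that you already know the PID of the suite server process. The function name `sample_usage` is my own and not part of Cylc.

```python
import os


def sample_usage(pid):
    """Return (cpu_seconds, rss_kib) for a process, read from Linux /proc."""
    with open(f"/proc/{pid}/stat") as f:
        # The command name sits in parentheses and may contain spaces,
        # so only split what follows the last closing parenthesis.
        fields = f.read().rsplit(")", 1)[1].split()
    # After the command name, utime and stime are at indices 11 and 12.
    ticks = int(fields[11]) + int(fields[12])
    cpu_seconds = ticks / os.sysconf("SC_CLK_TCK")

    rss_kib = 0
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss_kib = int(line.split()[1])  # reported in kB
                break
    return cpu_seconds, rss_kib
```

Calling this for the suite's PID at intervals (from cron, say) and logging the results should show whether CPU time or resident memory climbs steadily as the run progresses.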

Thank you for your suggestion and possible explanation.

Cheers,

Jin

I’m not sure how Cylc interacts with the DB, but could it be related to the DB size? Perhaps it needed trimming down?
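One way to check would be to look at the file size and the row counts per table. This is only a sketch: it assumes the suite database is an ordinary SQLite file, and the path is something you would need to supply yourself (I believe Cylc 7 keeps a copy under the suite run directory, for example `~/cylc-run/<suite>/log/db`, but please treat that location as an assumption). `db_summary` is a hypothetical helper, not a Cylc function.

```python
import os
import sqlite3
from pathlib import Path


def db_summary(db_path):
    """Return (size_in_bytes, {table_name: row_count}) for an SQLite file."""
    size = os.path.getsize(db_path)
    # Open read-only so a running suite's database is not modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        counts = {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()
    return size, counts
```

If one table holds far more rows than the others, that might hint at where state is piling up, though it would not by itself prove the DB is the cause of the CPU load.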