Run host ranking algorithm extensions?

TomC · January 20, 2025, 2:48am

Hi, looking at the Global Configuration page of the Cylc 8.4.0 documentation, I was wondering if there is any way to extend the ranking algorithm to be a bit smarter. From how I read the documentation, the load limiting is based on current memory, load, etc. But if you launch a workflow during a temporary quiet period on that VM, the results could be a bit misleading.

I wondered if there was some way to give weights to workflows for example, or other things outside of the psutil attributes. For example,

VM1 = [a, b,c]  # all large workflows
VM2 = [z, y, x]  # all small workflows

Run cylc vip while it's quiet and the workflow launches on VM1. Then the workflows already there start running and the VM is flooded, causing slowness due to resource contention. But suppose there was a way to say that a and x have a weight of 9, b of 6 and c of 8, whereas z and y have weights of 1 and 3. Then the new workflow should ideally end up on VM2, as it will on average have lower load.
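To make the idea concrete, here is a rough sketch of the weighted placement described above. It is purely illustrative and not something Cylc provides: the `WEIGHTS` table, the `HOSTS` mapping and the `host_weight` and `pick_host` functions are all made-up names.

```python
# Hypothetical per-workflow weights, taken from the example above.
WEIGHTS = {"a": 9, "b": 6, "c": 8, "x": 9, "y": 3, "z": 1}

# Hypothetical mapping of hosts to the workflows already assigned to them.
HOSTS = {
    "VM1": ["a", "b", "c"],
    "VM2": ["z", "y", "x"],
}


def host_weight(workflows):
    """Sum the expected-load weights of the workflows on one host."""
    return sum(WEIGHTS[name] for name in workflows)


def pick_host(hosts):
    """Return the host whose assigned workflows have the lowest total weight."""
    return min(hosts, key=lambda host: host_weight(hosts[host]))


print({host: host_weight(wfs) for host, wfs in HOSTS.items()})
print(pick_host(HOSTS))
```

With these numbers VM1 totals 23 and VM2 totals 13, so the new workflow would go to VM2 even if VM1 happened to be idle at launch time.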

Smarter still would be expected/planned resource load profiles: Cylc could compare against expected peak loads and launch workflows in such a way as to smooth out the expected/planned load.

oliver.sanders · January 20, 2025, 10:52am

Hi,

We can use any of the functionalities of the psutil module, do basic maths with any numbers returned and hardcode thresholds, etc. The expressions you provide are parsed by Python in a restricted environment which only allows certain operations (e.g. you cannot import modules). This isn’t presently pluggable.

On our site, we use the following:

ranking = """
    getloadavg()[2] < 20
    virtual_memory().available > 2000000000
    -1 * virtual_memory().available
"""

This ranks hosts by available memory, but excludes hosts with high server load or with less than about 2 GB of memory available. We find ranking by available memory works pretty well; our servers are about as evenly loaded as could be hoped for. Here’s a screenshot of today’s memory loading (0-100%); the servers are all within 5% of each other.
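As a rough illustration of what that ranking block does, the sketch below mimics it in plain Python: the two comparisons act as filters, and the remaining expression is a sort key where the lowest value wins. The host names and metric values are invented, and `rank_hosts` is a hypothetical helper, not Cylc's actual implementation.

```python
# Hypothetical metrics for three hosts; in Cylc these values would come from
# psutil calls evaluated on each candidate host.
HOSTS = {
    "host-a": {"load15": 4.0, "available": 12_000_000_000},
    "host-b": {"load15": 25.0, "available": 30_000_000_000},  # too busy
    "host-c": {"load15": 2.5, "available": 20_000_000_000},
}


def rank_hosts(hosts):
    """Drop hosts that fail a threshold, then sort the rest by free memory."""
    eligible = {
        name: metrics
        for name, metrics in hosts.items()
        if metrics["load15"] < 20 and metrics["available"] > 2_000_000_000
    }
    # Sorting on -available puts the host with the most free memory first.
    return sorted(eligible, key=lambda name: -1 * eligible[name]["available"])


print(rank_hosts(HOSTS))
```

Here host-b is excluded despite having the most free memory, because its load average is over the threshold; host-c then ranks ahead of host-a.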

If you’re worried about erratic spikes in workflow activity hitting your servers hard, two things you can do:

Look at using time-averaged metrics from psutil, e.g. load average comes in 1, 5 and 15 min bins.
Scale more vertically: the more workflows you run on a server, the more stable the load on that server will be.
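For the first point, a small sketch of how the load average bins compare. It uses the standard library's `os.getloadavg()` (Unix only), which returns the same 1, 5 and 15 minute triple as psutil's `getloadavg()`; index `[2]` in the ranking expression above is the 15 minute figure. The `looks_temporarily_quiet` helper and its `ratio` parameter are made up for illustration.

```python
import os


def looks_temporarily_quiet(load1, load15, ratio=0.5):
    """Return True if the 1 min load is well below the 15 min load."""
    return load1 < ratio * load15


# os.getloadavg() returns the 1, 5 and 15 minute load averages.
load1, load5, load15 = os.getloadavg()
print(f"1 min: {load1:.2f}, 5 min: {load5:.2f}, 15 min: {load15:.2f}")

# A brief lull shows up in the 1 min figure long before the 15 min one,
# which is why thresholds on the 15 min average are less easily fooled.
print(looks_temporarily_quiet(load1, load15))
```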

TomC · January 20, 2025, 11:19am

Thanks. One last question on this: is there a default ranking approach used in Cylc? Is the default an empty string?

oliver.sanders · January 20, 2025, 11:24am

The default (ranking = ) is random selection (will clarify that in the docs).
