Use of 'run hosts'

What are the requirements for using run hosts? My understanding is that this launches the cylc scheduler process itself on a separate server to where you run cylc install and cylc play.

I get errors saying the flow.cylc file cannot be found on the remote server when I try using this option. Is it expected that the [run host] should share a disk with where you’re running cylc play?

Hello,

this launches the cylc scheduler process itself on a separate server to where you run cylc install and cylc play.

Yes, if run hosts are configured, the cylc play command will pick one of the hosts (according to any configured ranking) and re-invoke itself on that host via SSH.

The run hosts must:

  1. Share a common $HOME directory.
  2. Share a common Cylc global config (global.cylc).
  3. Be set up to allow passwordless SSH between them.
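The SSH and shared-$HOME requirements above can be checked by hand before configuring anything. What follows is only a rough sketch: check_run_host is a hypothetical helper, not part of Cylc, and the marker file name is made up. It does not check that global.cylc is shared.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cylc): check that a candidate run host
# accepts passwordless SSH and sees the same $HOME as the current host.
check_run_host() {
    local host="$1"
    local marker="${HOME}/.run-host-check.$$"

    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ! ssh -o BatchMode=yes "$host" true; then
        echo "FAIL: no passwordless SSH to ${host}"
        return 1
    fi

    # Create a marker file under the local $HOME, then look for it remotely.
    touch "$marker"
    if ssh -o BatchMode=yes "$host" test -e "$marker"; then
        rm -f "$marker"
        echo "OK: ${host}"
        return 0
    fi
    rm -f "$marker"
    echo "FAIL: ${host} does not see the local \$HOME"
    return 1
}

# Example: check_run_host cylc-scheduler-node
```

Running it once from each host towards the others should cover the "between them" part of the SSH requirement.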

The documentation for this was written recently, so it has not been published yet, but you can get a sneak peek via the “nightly” build of the documentation:

https://cylc.github.io/cylc-doc/nightly/html/user-guide/writing-workflows/scheduler.html#submitting-workflows-to-a-pool-of-hosts

Not sure it’s clear from Oliver’s response, or the documentation (we’ll fix that), but yes - you need to run cylc commands on a host that sees your workflow run directories.

Typical setup, as at NIWA: users log into particular HPC nodes for interactive work, including starting and interacting with Cylc workflows. The run hosts global config ensures that all the Cylc schedulers start on a small pool of dedicated “Cylc nodes”, with basic load balancing at startup. (All the nodes are on the shared filesystem.)

Hello @oliver.sanders, I am quite new to the Cylc ecosystem and am exploring it with a view to setting it up on our HPC platform. We would like to set up a dedicated node to act as the Cylc scheduler host, and we are interested in using run hosts. All our nodes share a common HOME and SCRATCH. However, cylc is not on PATH by default; we provide an environment module to make it available. When I test run hosts with a remote node (with passwordless SSH), cylc play <workflow> fails saying the cylc command was not found, which makes sense.

Is there a way to manipulate PATH before invoking the scheduler without having to touch the user’s .profile or .bashrc? Preferably via the global.cylc config file, something like the cylc path setting that is provided for platforms.

Cheers!!

Hi,

The way we handle this is to use a “wrapper script” called cylc, which activates the required environment and then runs the Cylc command. We put that wrapper script in the PATH.

A basic wrapper script might look like this:

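#!/usr/bin/env bash
# Activate the environment that provides the real cylc executable.
module load cylc
# Replace this process with the real cylc, passing all arguments through.
exec cylc "$@"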

The wrapper script we use at our site is designed to work with Conda and virtualenv environments and is built into the Cylc package; you can extract it with this command:

$ cylc get-resources cylc /somewhere/in/the/system/path

This wrapper script doesn’t need to be updated for newer versions of Cylc, so it only needs to be installed once. It supports parallel Cylc installations, using an environment variable to switch versions, e.g.:

$ cylc version
8.1.3
$ export CYLC_VERSION=8.0.4
$ cylc version
8.0.4
$ export CYLC_VERSION=7
$ cylc version
7.8.12

This makes it easier to upgrade environments because you don’t have to shut down any workflows running with that environment first.

The CYLC_VERSION variable (and a derived variable called CYLC_ENV_NAME) is automatically forwarded to all Cylc commands (including remote commands), so different versions run completely in parallel even on a distributed installation.
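To illustrate the idea of switching versions with an environment variable, here is a much simplified sketch. It is not the wrapper shipped with Cylc: the CYLC_ROOT variable, the /opt/cylc-envs location, the cylc-<version> directory naming and the select_cylc_bin function are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Simplified illustration only, not the wrapper shipped with Cylc.
# CYLC_ROOT and its default location are hypothetical.
CYLC_ROOT="${CYLC_ROOT:-/opt/cylc-envs}"

# Print the path of the cylc executable to run: the environment named
# "cylc-<version>" if CYLC_VERSION is set, otherwise a default
# environment named "cylc".
select_cylc_bin() {
    local env_name
    if [[ -n "${CYLC_VERSION:-}" ]]; then
        env_name="cylc-${CYLC_VERSION}"
    else
        env_name="cylc"
    fi
    printf '%s/%s/bin/cylc\n' "$CYLC_ROOT" "$env_name"
}

# A wrapper built on this would end with:
#   exec "$(select_cylc_bin)" "$@"
```

Because the choice is made afresh on every invocation, installing a new environment alongside the old ones does not disturb workflows that are still running under an older version.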

There are a couple of notes on the wrapper script here:

https://cylc.github.io/cylc-doc/stable/html/installation.html#managing-environments

Thanks a lot @oliver.sanders for such a quick response. I checked the wrapper script and what you said makes sense. Our current environment module will add the wrapper script to PATH, and we can create multiple modules for multiple versions by simply changing CYLC_VERSION in each module, which is pretty neat. I don’t know if we will put the wrapper script on PATH “by default”, as only a subset of users will use Cylc on our HPC platform.

But I figured out the issue and you made a PR recently to address this. I just need to make sure there is a localhost section in platforms and point cylc path to this wrapper script. Worked like a charm!!

This is my relevant test config:

[scheduler]
    [[run hosts]]
        available = cylc-scheduler-node
[install]
    max depth = 4
    [[symlink dirs]]
        [[[localhost]]]
            run = ${WORK}/cylc
            log = ${WORK}/cylc
            share = ${WORK}/cylc
            work = ${WORK}/cylc
[platforms]
    [[hpc_platform]]
        hosts = localhost
        job runner = slurm
        shell = /bin/bash
        cylc path = /path/to/wrapper_script
        install target = localhost
    [[localhost]]
        cylc path = /path/to/wrapper_script

So when I execute cylc play <> on a login node, it picks up the config from the localhost platform and uses the full path specified in cylc path when setting up the scheduler on cylc-scheduler-node via SSH. This way we do not have to add any Cylc-related binaries to the default PATH.

But I figured out the issue and you made a PR recently to address this.

You beat my memory to it :+1:!
