I am trying to rewrite some old Cylc 7 suite.rc files into v8.3.4 format and I am having trouble understanding how to define platforms and job runners. In Cylc 7, I had the following in an inc/platform.rc file that would then be included in the suite.rc and inherited by certain tasks to define whether that task should run in the background or on a compute node:
[runtime]
    [[BG_TASK]]
        [[[job]]]
            batch system = background
I went through the 8.x tutorial and see that [[[job]]] is gone now and has been replaced by (I think) the [platforms] section in $HOME/.cylc/flow/8.3.4/global.cylc. So I wrote the following in my global.cylc file:
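[platforms]
    [[BG_TASK]]
        job runner = background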
which can then be inherited by tasks that I want to run in the background. The jobs won't submit. The only output I get from running the workflow is a job-activity.log file that says:
My understanding was that you define custom named platforms in your global.cylc file which can then be referenced in your flow.cylc file using [runtime][<namespace>]platform. Where did I go wrong here?
That's right. At a site with many Cylc users you'd expect platforms to be defined centrally, but if not you can do it in your user global.cylc file.
You'll get more information from the scheduler log. Run your workflow again with --no-detach or use cylc cat-log <workflow-id> (or look in ~/cylc-run/<workflow-id>/log/scheduler/log):
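For example, something like:

cylc play --no-detach <workflow-id>
# or, after the run, view the scheduler log with:
cylc cat-log <workflow-id>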
So: Cylc could not connect to your platform because "Unable to find valid host for BG_TASK". You did not list any hosts in your platform definition, so Cylc tried the default (host-name = platform-name), which didn't work.
You need to define a list of hosts and an install target, as well as a job runner. The minimal platform definition for local background jobs (which, by the way, is the default if you don't specify a platform at all) is something like:
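# user global.cylc - a minimal sketch; the platform name here just needs to
# match whatever your flow.cylc tasks reference
[platforms]
    [[BG_TASK]]
        hosts = localhost
        install target = localhost
        job runner = background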
Thanks for the response, that makes sense, and thank you for linking the documentation on platform configuration.
I am trying to think how to adapt this methodology to one of our suites where we include a multitude of platform files (e.g. HPC1_include.rc, HPC2_include.rc, HPC3_include.rc) that the user has to choose from based on what machine they are running on. Each of these files has a different version of, say, the job runner (e.g. pbs vs slurm).
Now that the definition of these platforms is delegated to the global.cylc file, it seems like the user would have to know to correctly change the specifications of their platforms in their global.cylc file prior to using our version-controlled Cylc workflow.
A workaround might be to use $CYLC_SITE_CONF_PATH and hardwire that in flow.cylc[runtime][root] to be $CYLC_WORKFLOW_RUN_DIR/etc. In $CYLC_WORKFLOW_RUN_DIR/etc, I would have something like HPC1.cylc, HPC2.cylc, etc. and prior to installing the workflow, the user must symlink global.cylc to one of these files. Am I overthinking this?
You should not need to use CYLC_SITE_CONF_PATH for this.
You can define all the platforms in the same global.cylc, and just select the right ones as needed in workflow task definitions.
Platform definitions can overlap - e.g. you could define a platform to run background jobs on one particular host of several that also appear in a platform with PBS as a job runner (note that platform hosts are where Cylc interacts with the job runner, not the compute nodes managed by the job runner itself).
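For instance, a sketch (the host and platform names here are made up):

[platforms]
    # background jobs, run directly on one particular login node
    [[hpc1_background]]
        hosts = login1
        install target = hpc1
        job runner = background
    # PBS jobs, which Cylc can submit from any of the login nodes
    [[hpc1_pbs]]
        hosts = login1, login2, login3
        install target = hpc1
        job runner = pbs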
I think you could keep the same set of flow.cylc include files, but just change their content to set the right platform name for the task family.
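For example, HPC1_include.cylc might reduce to little more than this (family and platform names are illustrative):

[runtime]
    [[HPC1_JOBS]]
        platform = hpc1_pbs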
Is there any way to do this without a global.cylc? I'd prefer everything needed to run the workflow to be self-contained in one git clone checkout, without the need for the user to go into their $HOME/.cylc/flow and change/create a file.
No, we don't support platform definitions inside a workflow configuration, because platforms are inherently not workflow-specific.
Ideally, normal users shouldn't even need to understand how to define platforms; they should just choose from the centrally-defined ones.
If that has not been done (i.e., no central definitions), you can define your own platforms, but the principle is the same - all of your workflows should select from the same platforms, no need to redefine them in every workflow.
Is it not feasible to have a central global.cylc for all users on your workflow scheduler host(s)?
Note there's likely to be other global config needed too, not just platforms.
Okay, that makes sense. It seems like we will have to figure out a way to create a centrally-defined global.cylc for each of our HPC systems that all users can reference via the *_include.rc files included in the workflow checkout.
Our lab hasn't made the full transition to using Cylc 8 so there isn't much of an infrastructure yet. There is no central global.cylc yet, but this seems like the direction our development should move in as we progress in this transition.
Slight aside (handy if you are upgrading from Cylc 7) -
If you have the platforms set up correctly and you run a workflow in compatibility mode (i.e. use Cylc 8 on a workflow defined in a suite.rc) Cylc will select a platform from the old settings, if a matching one can be found.
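For example (task and host names here are illustrative), old settings like:

[runtime]
    [[MY_TASK]]
        [[[job]]]
            batch system = pbs
        [[[remote]]]
            host = hpc1-login

should get matched to a defined Cylc 8 platform whose job runner is pbs and whose hosts include hpc1-login.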
Can you clarify the difference between scheduler hosts and job platforms? Is the scheduler host for BG_TASK (as defined above in the previously discussed global.cylc) the localhost and is the job platform equivalent to the job runner (i.e. "background")?
Thanks for the steps. I'll probably use these when updating my big suite to a flow!
The scheduler host is the server or VM where your Cylc scheduler runs, to manage your workflow.
Job platforms are where the scheduler submits jobs to run.
The scheduler host - usually called localhost (because most things in Cylc are relative to the scheduler) - is often also a job platform (in fact it is the default job platform, if you don't specify one) if you need to run task jobs locally on the scheduler host.
The global.cylc file configures how Cylc schedulers behave, including telling them what job platforms are available to submit jobs to - so it has to be readable by the scheduler on the scheduler host. You don't need a global.cylc file on the job platforms (if you run local jobs on the scheduler host the global.cylc file will of course happen to be there - but it's not used by the jobs).
That doesn't quite make sense.
BG_TASK is a job platform. A job platform does not "have" a scheduler host - that's not a property of job platforms. A job platform is just where a scheduler, running on a scheduler host, can submit jobs.
The job platform is not equivalent to the job runner, but a job platform has a job runner. E.g. on platform "HPC1" (say) you might have Slurm as the job runner. The "background" job runner is just how we tell Cylc to run jobs "in the background" - i.e. as a direct subprocess - rather than submitting them to a proper resource manager like Slurm or PBS.
Okay, it is becoming a bit more clear. Let me see if I understand. The scheduler host is essentially the machine that the Cylc scheduler will run on and this is, by default, localhost. If the need arises, is this default changed in global.cylc[scheduler][run hosts]available? Is the scheduler host also where the global.cylc file needs to live?
Localhost is also a default defined job platform, but you can add custom job platforms in global.cylc with different job submission settings (e.g. a different job runner).
The global.cylc needs to be on localhost because the scheduler host will look for it there (unless otherwise specified, by what I am guessing is global.cylc[scheduler][run hosts]available). (I have always had the global.cylc - or the global.rc in Cylc 7 - on my login node without understanding why, but this clears that up.)
A couple more questions regarding global.cylc[platforms]:
When you said "the scheduler host ... is often also a job platform", are you referring to global.cylc[platforms]hosts?
As an example of how I run the suite: if I am running Cylc on the login node of an HPC and I want the suite's jobs submitted via pbs to run on the compute nodes, what would be my scheduler host and my job platform?
Is global.cylc[platforms]install target used by Cylc to define where the job.err/.out files will go? Are those files what the documentation refers to as "remote file installation"?
Is there a way I can check to see if there are central definitions of job platforms?
Yes. Also known as the "scheduler run host", as per the global config "run hosts" settings.
It is "by default, localhost" in the sense that if you run cylc play on the command line, it will by default start the scheduler locally. If a pool of "run hosts" are configured, it will start the scheduler on one of those instead.
The name "localhost" is also used in some configuration settings to refer to the host that the scheduler is running on (because those settings configure the scheduler program, and the host it is running on is "localhost" as far as it is concerned).
Yes.
Yes-ish. More specifically, the global.cylc file has to be on a filesystem that is visible from the scheduler run host. If you have a pool of run hosts, they all have to be on the same shared filesystem, so there's still only one central global.cylc file.
Well, you can put explicit platform settings for localhost in global.cylc if you like, but note those settings have to be valid for all scheduler run hosts (again, these settings are interpreted by schedulers, so "localhost" refers to the scheduler host).
However, my point there was really that we do often run jobs, as well as schedulers, on scheduler hosts. A single user installation on a laptop, for instance, likely runs everything on the same host (the laptop) and probably runs all jobs as simple background jobs (no PBS or Slurm).
Even on an HPC cluster, the default if you do not specify a platform in a task definition is to run the job as a local background job - and that does not require any global.cylc ālocalhostā platform settings.
The scheduler host (and therefore "localhost" in Cylc configs) is the login node. And you should define, in global.cylc, a job platform that specifies (a) pbs as the job runner; (b) localhost as the install target; and (c) hosts = localhost. The platform definition does not need to list compute nodes as hosts. It just lists the host(s) that Cylc can use to interact with the job runner - to submit, poll (query), or kill jobs.
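That is, something like this (the platform name is just illustrative):

[platforms]
    [[hpc_pbs]]
        hosts = localhost
        install target = localhost
        job runner = pbs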
If your system has multiple login nodes from which you can interact with PBS, and other āinteractive nodesā you can use for e.g. manual postprocessing work, it would probably be better to run the Cylc scheduler there (on an interactive node) and define a job platform that lists the multiple login nodes as hosts - thatās more robust as Cylc can continue to run and manage jobs even if one of the login nodes goes down.
No, an install target represents a filesystem, and is used to install workflow source files on job platforms. A platform may have multiple hosts, and multiple platforms might be on the same filesystem. Cylc only needs to install files once, for all the hosts that see the same filesystem.
Yes! Just type cylc config in your terminal. It parses and prints the global config by default (i.e., if you don't give it a specific workflow ID). It even has a special option to print just the platform definitions. See "Platform printing options" in cylc config --help.
Platform printing options:
  --platform-names    Print a list of platforms and platform group names
                      from the configuration.
  --platforms         Print platform and platform group configurations,
                      including metadata.
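For example:

cylc config --platform-names

will list the platform (and platform group) names defined in your site and user global config.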