On our Cray-based systems the compute nodes see the login nodes with a different IP address than the one the login node self-identifies with. This was dealt with in Cylc v7 by using the hardwired option in [suite host self-identification]. It worked, but it was limiting because we could only launch Cylc from the single login node that we hardwired into the global.rc file. I never investigated whether we could add more hardwired options.
In Cylc 8 (specifically 8.3.6), I was hopeful that [platforms] would be a way around this, but I am having trouble understanding the relationship between [platforms] and [[host self-identification]].
For example, if I have a platform setup like this:
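Something roughly along these lines (the platform and host names here are just placeholders):

```
# global.cylc -- platform and host names are illustrative only
[platforms]
    [[hpc_cluster]]
        hosts = login1, login2
        job runner = pbs
```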
[scheduler][host self-identification] determines how a scheduler self-reports its own location (i.e., the host it is running on) to its jobs, so that those jobs can successfully communicate back to it.
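The hardwired method from Cylc 7 still exists under that section in the Cylc 8 global.cylc; a minimal sketch, with a made-up hostname, would be:

```
# global.cylc -- the hostname is a placeholder
[scheduler]
    [[host self-identification]]
        method = hardwired
        host = login01-external.example.com
```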
Platforms (a.k.a. “Job Platforms”) represent job hosts, so there’s not really any direct relationship between the two concepts.
However, if you want to launch schedulers on several hosts that see the same global.cylc file (on a shared filesystem), I think you could use Jinja2 to select the right hardwired hostname at runtime.
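For example, a sketch only (assuming Jinja2 processing of the global.cylc file, and with placeholder hostnames):

```
#!Jinja2
# global.cylc -- choose the externally visible name of whichever
# login node the scheduler is started on (hostnames are placeholders)
[scheduler]
    [[host self-identification]]
        method = hardwired
{% if environ['HOSTNAME'] == 'login01' %}
        host = login01-external.example.com
{% else %}
        host = login02-external.example.com
{% endif %}
```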
Thank you … this makes sense. I’m still wrapping my head around [platforms], but at least for now, where everything we do is on a shared filesystem, I think I understand it.
Follow-up question about the global.cylc file. Is that only read at play time, or does the workflow continue to read it for the duration of the workflow? Also, is it copied to the install directory and read from there or is it read in place?
I hear that quite a lot, which makes me wonder if we haven’t explained it well. If you find the documentation confusing, let us know and we’ll try to fix it.
Basically a job platform represents a cluster on a shared filesystem, with a job runner such as PBS. The platform “hosts” are the hosts that Cylc schedulers can use to interact with the job runner to submit, poll, and kill its jobs. Typically that might be e.g. the interactive or login nodes of the cluster. There can be more than one such host, which makes Cylc 8 job platforms more robust than the old singular job hosts.
The above seems pretty straightforward to me. Perhaps the potential for confusion comes in when your scheduler host also belongs to a job platform, and with the “install target” setting, which allows Cylc to avoid redundant or clashing installs to multiple hosts on the same filesystem? If so, feel free to ask more questions, and we can consider documentation tweaks.
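To illustrate the install target point: two platforms on the same shared filesystem can declare the same install target, so Cylc installs workflow files there only once (names below are made up):

```
# global.cylc -- platform, host, and install target names are placeholders
[platforms]
    [[cluster_pbs]]
        hosts = login1, login2
        job runner = pbs
        install target = cluster
    [[cluster_background]]
        hosts = login1, login2
        job runner = background
        install target = cluster
```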