How to specify the listening IP address for the Cylc execution host

Dear Users,

I recently deployed Cylc on a virtual machine and on the login node of an HPC cluster.
Users can create their suites on the Cylc VM and submit/execute them from the cluster login node.

The Cylc VM has two IP addresses (172.16.1.37, 192.168.118.37).
The cluster login node has two IP addresses (172.16.1.3, 192.168.118.3).

Unfortunately, I noticed that, on the login node side, Cylc is listening on the wrong IP address (172.16.1.3 instead of 192.168.118.3).

In the attached picture, you can see that communication starts the right way, but the login node's answer is routed along the wrong network path.

Is there a way to specify the listening address for Cylc on the login node side?

Thank you in advance,
Mauro

Hi Mauro,

The recommended way to do this is to have one or more Cylc VMs for running suites, and those suites just submit their jobs to the HPC. For this to work, users need to be able to configure non-interactive ssh to the HPC login node, and the Cylc ports need to be open to route job status messages back to the Cylc VMs (or, if you have to, Cylc can poll job status from the VM side).
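
For the polling fallback mentioned above, a sketch of the relevant settings in the Cylc 7 global.rc (the hostname is an example; check the global.rc reference for your version for the exact item names):

```
[hosts]
    [[hpc-login.example.com]]
        # Poll job status from the suite host instead of relying on
        # messages routed back from the job host.
        task communication method = poll
        submission polling intervals = PT1M
        execution polling intervals = PT1M, PT5M
```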

Your screenshot shows Cylc hasn’t been installed quite right. [/opt/cylc] should be showing the Cylc version, e.g. [7.9.1].

However, if you really have to run Cylc schedulers on the login node, let’s see if we can figure out the problem:

*** listening on https://login3.cluster.net:43063 ***

I think this comes from Python’s socket.getfqdn() on the suite host. If that is producing the “wrong” address, does it indicate something is wrong with your network config?

The main use for this is for task jobs, which may be running on remote hosts, to send status messages back to the server. So the server has to self-identify as an address that is visible to remote jobs (the address will be written by Cylc into job scripts).
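
A quick way to check what the scheduler will self-identify as on a given host is to ask Python directly (a minimal diagnostic sketch, not Cylc code):

```python
# Print what Python reports as this host's fully qualified domain name.
# Cylc's default self-identification relies on socket.getfqdn(), so if
# this prints the "wrong" name, that name will end up in job scripts too.
import socket

print(socket.getfqdn())
```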

If all users are running Cylc on the same login node, you can hardwire the value via Cylc global config:

[suite host self-identification]
    method = hardwired
    host = 192.168.118.13

Worst case scenario, you could modify the fairly straightforward Python code in lib/cylc/hostuserutil.py. (But let us know if you have to do that; it would be good to know what, if anything, is unusual about your system compared to others.)
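
To show why "hardwired" helps here, a toy illustration of the three self-identification methods ("name", "address", "hardwired") from the Cylc 7 global config — this is not Cylc's actual code, just a sketch of the idea:

```python
# Toy sketch of Cylc 7's suite host self-identification methods.
# Not Cylc's actual implementation -- an illustration only.
import socket

def self_identify(method="name", hardwired_host=None):
    """Return the address a scheduler would write into its job scripts."""
    if method == "hardwired":
        if not hardwired_host:
            raise ValueError('method "hardwired" requires a host value')
        return hardwired_host          # trust the admin-supplied value
    if method == "address":
        return socket.gethostbyname(socket.getfqdn())  # numeric IP
    return socket.getfqdn()            # "name": whatever the OS reports

print(self_identify("hardwired", "192.168.118.13"))
```

With "hardwired" the answer never depends on what the OS resolver reports, which is exactly what you want when getfqdn() picks the wrong interface.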

I hope that is somewhat helpful (but feel free to ask more questions if it isn’t).

Hilary

Hi Hilary,

thank you very much for the detailed answer.

Unfortunately, I'm a newbie, and it may be that I made some mistakes by misunderstanding the Cylc installation and configuration guide.

Anyway, to be clearer (I hope…), I would like to describe how I deployed Cylc:

  • I created a virtual machine and installed the Cylc engine (the Cylc VM should be used to create, manage and execute Cylc suite jobs/tasks on the HPC cluster login nodes). I don't know if it is the correct terminology, but we can say that the "Cylc server" is running on the virtual machine;

  • I installed a VNC server on the same virtual machine mentioned above, to allow users to start the "cylc gui" (graphical mode) more quickly via a VNC client;

  • I installed Cylc (let's say the Cylc client) on the HPC cluster login nodes in order to be able to communicate with the Cylc VM;

  • the Cylc VM mounts, via NFS, the HPC cluster shared filesystem containing users' home directories (so cylc-run, the suites directory, etc. can be seen from both systems).

Moreover, the Cylc VM (let's say the "server") and the HPC cluster login node (which, in my mind, should be the client :slight_smile: ) each have two IP addresses: one on the management network (172.16.0.0/22) and one on the "Cylc network" (192.168.118.0/24).

In particular:

1- Cylc users can log into the VM (via VPN) by pointing to the 192.168.118.37 IP address.

2- Cylc users can "run" the Cylc suites from the Cylc VM by pointing to the 192.168.118.13 IP address (the HPC cluster login node).

3- Cylc tasks/jobs will be executed on the HPC cluster login node (which is in charge of submitting the jobs to the compute nodes by means of the LSF scheduler).

Points 1, 2 and 3 are OK (from my point of view), but I noticed the "*** listening on https://login3.cluster.net:43063 ***" message.

Both on Cylc VM and login node, the /etc/hosts files contain the following lines:

172.16.0.3 login3.cluster.net login3

192.168.118.13 zeus03

192.168.118.37 cylc

So, my question was: "Why, on the HPC login node, is Cylc 'using' the 172.16.0.3 IP address instead of 192.168.118.13?! How can I set the right IP?!"

In other words, I would prefer that all Cylc-related network traffic be routed over the "Cylc network".

So, I followed your suggestion and it did the trick:

[suite host self-identification]
    method = hardwired
    host = 192.168.118.13

Now, the "*** listening on https://login3.cluster.net:43063 ***" message has changed to "*** listening on https://zeus03:43063 ***".

[/opt/cylc] is a symbolic link to the real Cylc directory /opt/cylc-7.8.4

I followed the official installation guide instructions:

"To install Cylc, unpack the release tarball in the right location, e.g. /opt/cylc-7.8.2, type make inside the release directory, and set site defaults - if necessary - in a site global config file (below).

Make a symbolic link from cylc to the latest installed version: ln -s /opt/cylc-7.8.2 /opt/cylc"
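
As a sanity check of the symlink convention described in the guide, here is a scratch-directory sketch (paths and version number are examples; the real VERSION file is written by make):

```shell
# Recreate the layout in a throwaway directory to see how the symlink works.
tmp=$(mktemp -d)
mkdir "$tmp/cylc-7.8.2"
echo "7.8.2" > "$tmp/cylc-7.8.2/VERSION"   # normally created by "make"
ln -s "$tmp/cylc-7.8.2" "$tmp/cylc"        # version-agnostic path
cat "$tmp/cylc/VERSION"                    # reads 7.8.2 through the symlink
```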

But if I made some mistake or misunderstood something, please correct me!

Thank you very much for the time you spent helping me.

Best Regards,

Mauro

@hilary.j.oliver - This isn’t a target run host thing? i.e. at NIWA we set the following locally (and a similar one site wide):

[ecox_test@w-ec-admin01 ~]$ more .cylc/global.rc
process pool size = 16
[suite servers]
    run hosts = w-ec-cylc01.maui.niwa.co.nz, w-ec-cylc02.maui.niwa.co.nz, w-ec-cylc03.maui.niwa.co.nz
    run ports = 43001 .. 43100
    scan hosts = w-ec-cylc01.maui.niwa.co.nz, w-ec-cylc02.maui.niwa.co.nz, w-ec-cylc03.maui.niwa.co.nz, w-ec-admin01.maui.niwa.co.nz
    scan ports = 43001 .. 43100
    #condemned hosts = w-ec-cylc02.maui.niwa.co.nz
    [[run host select]]
        rank = random
        thresholds = memory 1048576; load:15 7.75

So cylc run from our login/admin machine starts the suites, and they listen on the run/Cylc hosts/VMs…

Too simple?

My impression from the description above is that @Mauro_Tridici isn't configuring Cylc to use a pool of VMs, just a single one that users log in to, and the Cylc schedulers there submit jobs to LSF on the HPC login node.

Rather this is about the scheduler “self-identifying” (to its jobs) with a hostname that is not visible (or is wrong) from the job host. (Which I think means Python socket.getfqdn() on the suite host is not returning an FQDN that is visible from the job host).


I missed your question about this, @Mauro_Tridici.

Your directory location and cylc symlink look fine. Did you do this bit: “type make inside the installation directory”? That creates a VERSION file containing the version number extracted from the unpacked directory name. (Cylc 8 will have a more sensible installation procedure, BTW). Then if you put /opt/cylc/bin in $PATH, the command cylc --version should return 7.8.2 (e.g.) and not /opt/cylc (which it seems to be returning, based on your suite start-up transcript above).

Regards,
Hilary


Yes, Hilary, you are right; this is my case.
My English is not very good, but you understood my environment.

Thank you,
Mauro


Hi Hilary,

thank you very much for your answer :slight_smile:
Yes, I did run make inside the installation directory.
And this is the output of cylc --version

[root@zeus-login3][/zeus/opt/cylc-flow-7.8.4]> cylc --version
7.8.4

Just one last question:
Taking a look at my Cylc environment, what is the correct terminology for the Cylc virtual machine (which only dispatches the jobs to the HPC login node)? Job host? Suite host?
And what is the correct terminology for the HPC login node? Job host?

Sorry, but I'm a little bit confused about this and, for this reason, I'm not able to correctly set some parameters defined in the Cylc configuration file.

Thank you,
Mauro

Taking a look at my Cylc environment, what is the correct terminology for the Cylc virtual machine (which only dispatches the jobs to the HPC login node)? Job host? Suite host?

Yep, this is the suite server; these can be configured using the [suite servers] section of the global.rc file.

And what is the correct terminology for HPC login node? Job host?

Any host you run jobs on is referred to as a job host; these are configured using the [hosts] section.
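
Putting the terminology together for the setup in this thread, a global.rc sketch (hostnames taken from the /etc/hosts excerpt above; assuming Cylc 7 syntax — check the global.rc reference for exact item names):

```
# global.rc on the suite host (the Cylc VM, "cylc" in /etc/hosts)
[suite servers]
    run hosts = cylc                       # where suite servers run
[hosts]
    [[zeus03]]                             # HPC login node: a job host
        task communication method = poll   # if messaging back is blocked
```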


Hi Oliver,

many thanks for your help and kind explanation.

Best Regards,
Mauro