I have a system which primarily uses PBS, but has a subset of nodes that use Slurm. These nodes are not binary compatible with the Cylc host nodes. The job submits and runs correctly, but Cylc tries to run on the compute node and crashes. Is there a way to set up polling for just that one job? Using [[hosts]] in the global config file doesn't seem to do what I want, since that seems to want to ssh to the remote to submit the job.
[meta]
    title = "The cylc Hello World! suite"
[scheduling]
    [[dependencies]]
        graph = "hello => hello_casper"
[runtime]
    [[hello]]
        script = "sleep 10; echo Hello World!"
        [[[job]]]
            batch system = pbs
            batch submit command template = qsub -q regular -l walltime=01:00:00 -A NCGD0042 '%(job)s'
        [[[directives]]]
            -r = n
            -j = oe
            -V =
            -S = /bin/bash
            -l = select=1:ncpus=36:ompthreads=36
    [[hello_casper]]
        script = "sleep 10; echo Hello World from casper!"
        [[[job]]]
            batch system = slurm
        [[[directives]]]
            --ntasks = 1
            --cpus-per-task = 8
            --partition = dav
The job submits and runs correctly, but Cylc tries to run on the compute node and crashes.
I presume you mean the cylc message command crashes on the job host (when it tries to send job status messages back)?
Cylc is written in Python, so it should not matter that your Slurm host is not binary compatible with the Cylc host.
So: is cylc message failing on the job host just because it can't send messages back (in which case, check your network configuration), or is Cylc generally non-functional there because some Python library is missing, or something like that? Your job.err file should reveal which.
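A quick way to narrow this down (the host name below is just a placeholder, not from your setup) is to check whether Cylc and the Python modules it needs are usable on the job host at all:

    # run from the Cylc host; replace casper-login1 with your Slurm login node
    ssh casper-login1 'cylc version'                      # is Cylc functional there at all?
    ssh casper-login1 'python -c "import ssl, hashlib"'   # can it import the libraries cylc message relies on?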
Using [[hosts]] in the global config file doesn't seem to do what I want, since that seems to want to ssh to the remote to submit the job.
What are you setting under the [[hosts]] heading, exactly? If you configure a task with a remote host and batch system = slurm, Cylc will ssh to that host and submit the job to Slurm there - but that is done via the task (or family) [[[remote]]] section, not the global config [hosts] section.
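For example, a remote Slurm submission would be configured in the suite definition roughly like this (a sketch only; the host name is made up):

    [runtime]
        [[hello_casper]]
            [[[remote]]]
                host = casper-login1    # hypothetical Slurm login node
            [[[job]]]
                batch system = slurm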
If you have a local Slurm client (i.e. on the Cylc host), then don't specify a remote in the suite configuration - then it is a local job as far as Cylc is concerned, regardless of where Slurm executes the job. (Note, however, that in this case you must have a shared filesystem between the Cylc host and the job hosts.)
Is there a way to set up polling for just that one job?
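For reference, here is a sketch of how polling could be selected for tasks on one particular job host in the cylc 7 global config (again, the host name is hypothetical):

    # global.rc
    [hosts]
        [[casper-login1]]
            task communication method = poll
            submission polling intervals = PT1M
            execution polling intervals = PT1M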
ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
File "/glade/u/apps/ch/opt/python/2.7.16/gnu/8.3.0/lib/python2.7/hashlib.py", line 147, in <module>
globals()[__func_name] = __get_hash(__func_name)
File "/glade/u/apps/ch/opt/python/2.7.16/gnu/8.3.0/lib/python2.7/hashlib.py", line 97, in __get_builtin_constructor
raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type md5
Also this:
Traceback (most recent call last):
File "/glade/u/apps/ch/opt/cylc/7.8.3/gnu/8.3.0/cylc-7.8.3/bin/cylc-message", line 140, in <module>
main()
File "/glade/u/apps/ch/opt/cylc/7.8.3/gnu/8.3.0/cylc-7.8.3/bin/cylc-message", line 136, in main
return record_messages(suite, task_job, messages)
File "/glade/u/apps/ch/opt/cylc/7.8.3/gnu/8.3.0/cylc-7.8.3/lib/cylc/task_message.py", line 82, in record_messages
'messages': messages})
File "/glade/u/apps/ch/opt/cylc/7.8.3/gnu/8.3.0/cylc-7.8.3/lib/cylc/network/httpclient.py", line 275, in put_messages
results = self._call_server(func_name, payload=payload)
File "/glade/u/apps/ch/opt/cylc/7.8.3/gnu/8.3.0/cylc-7.8.3/lib/cylc/network/httpclient.py", line 338, in _call_server
return self.call_server_impl(url, method, payload)
File "/glade/u/apps/ch/opt/cylc/7.8.3/gnu/8.3.0/cylc-7.8.3/lib/cylc/network/httpclient.py", line 369, in call_server_impl
return impl(url, method, payload)
File "/glade/u/apps/ch/opt/cylc/7.8.3/gnu/8.3.0/cylc-7.8.3/lib/cylc/network/httpclient.py", line 477, in _call_server_impl_urllib2
import ssl
File "/glade/u/apps/ch/opt/python/2.7.16/gnu/8.3.0/lib/python2.7/ssl.py", line 98, in <module>
import _ssl # if we can't import it, let the error propagate
ImportError: libssl.so.1.0.0: cannot open shared object file: No such file or directory
Okay, I now have a proper Cylc build on those nodes. How do I let Cylc know the correct path to that build? I am submitting from localhost, so using some [[remote]] option doesn't seem to be the correct solution.
Thanks, that did it. But it doesn't seem that Cylc is sourcing my .bashrc; I had to source it explicitly in my job-init-env.sh. The documentation your link points to seems to suggest that should happen automatically. Any idea why it doesn't?
That's right - task job scripts invoke bash login shells, so you need to use .bash_profile. There are several other ways you can configure the environment for Cylc on job hosts too, but bash login scripts are the best way these days.
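For example, something like this in ~/.bash_profile on the job host (the path is a placeholder, not your actual install location) puts the right Cylc on the job script's PATH:

    # ~/.bash_profile on the job host -- sourced by the login shell that runs task job scripts
    export PATH=/path/to/your/cylc/bin:$PATH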