Cylc and Singularity container

Has anyone tried to integrate Singularity containers into a Cylc environment? I guess the Cylc task job script has to do the following steps:

  1. Submit a task job script to an HPC queue
  2. The task job script starts a Singularity container
  3. Within the container the task job script executes the commands under pre-script, script, etc.
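Steps 2 and 3 can be sketched as a small Bash function that a task's script could call (step 1 is what Cylc already does when it submits the job). This is a minimal sketch only, not a Cylc feature: run_in_container and CONTAINER_IMAGE are hypothetical names, and passing the text through bash -c assumes the image ships bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cylc): run a command string inside a
# Singularity container. CONTAINER_IMAGE is an assumed variable holding a
# .sif path or a docker:// URI.
run_in_container() {
    local image="${CONTAINER_IMAGE:?CONTAINER_IMAGE must be set}"
    # 'bash -c' lets multi-command script text (pipes, loops, &&) run
    # entirely inside the container; assumes the image provides bash.
    singularity exec --no-home --no-privs "$image" bash -c "$1"
}
```

For example, a task could use run_in_container 'python --version && python -c "print(42)"' so that both commands run inside the same container invocation.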

Step 3 is a tricky part. If anyone has managed to find a solution I would love to hear about it. Thank you.

Cheers,

Jin

Hi Jin,

I have tried using Docker and Singularity to run a complete workflow within a container (e.g. kinow/cylc-singularity on GitHub: Singularity containers for Cylc). I haven't run that on our HPC yet, but it should work (it may require some experimentation and some changes).

I think I talked with Hilary some weeks ago about how other workflow engines support running parts of a workflow (or a sub-workflow, a CWL workflow, etc.) within a container. That is much harder for us right now, as it would require changes in how Cylc executes commands.

An alternative would be to write shell commands in script, pre-script, etc. that run the Singularity container.

Within the container the task job script executes the commands under pre-script, script, etc.

Right now I think you would have to take control of the contents of script and pre-script, and use a container to run your workflow steps.

Kinda like this I think?

# ~/cylc-run/singularity1/suite.rc
[scheduling]
    cycling mode = integer
    initial cycle point = 1
    [[dependencies]]
        graph = "foo"

[runtime]
    [[foo]]
        pre-script = "singularity exec --no-home --no-privs docker://python:2.7.18-stretch python --version"
        script = "singularity exec --no-home --no-privs docker://python:2.7.18-stretch python -c 'for i in range(10):print i'"

And running it

$ cylc run singularity1
...
$ cylc cat-log singularity1 foo.1
Suite    : singularity1
Task Job : 1/foo/01 (try 1)
User@Host: kinow@ranma

2020-09-28T10:59:07+13:00 INFO - started
0
1
2
3
4
5
6
7
8
9
2020-09-28T10:59:13+13:00 INFO - succeeded

I used my development version of Cylc 8 to run these examples, but it should work with Cylc 7. This workflow was executed with Python 3.8, but the pre-script and script were executed with Singularity+Python 2.7.

So in theory you could use this to run any container image with Docker, Singularity, etc., taking care to use any options your environment requires (e.g. singularity exec --no-home --nonet --workdir /u01/scratch/some-user ...), bind paths, and variables like $CYLC_SUITE_WORK_DIR, $CYLC_TASK_WORK_DIR, etc.
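As one illustration of combining those options with the Cylc variables, here is a sketch that runs a command from the task's work directory inside the container. CYLC_TASK_WORK_DIR is set by Cylc in task jobs; the function name and CONTAINER_IMAGE are hypothetical, and the --bind part only works where the site allows user binds.

```shell
#!/usr/bin/env bash
# Sketch: run a command inside the container from the task's work directory.
# CYLC_TASK_WORK_DIR comes from Cylc; CONTAINER_IMAGE and the function name
# are hypothetical.
run_in_task_workdir() {
    local image="${CONTAINER_IMAGE:?CONTAINER_IMAGE must be set}"
    local workdir="${CYLC_TASK_WORK_DIR:?not running inside a Cylc task job?}"
    # --bind src:dest mounts the host work dir at the same path inside the
    # container; --pwd makes it the working directory of the command.
    singularity exec --no-home --no-privs \
        --bind "${workdir}:${workdir}" --pwd "$workdir" \
        "$image" "$@"
}
```

Binding the directory to the same path inside and outside keeps any paths the task writes into files or logs meaningful on both sides.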

Hope this helps.

Bruno

P.S.: my pre-script output ended up in my job.err file, which was a surprise to me, so I'm debugging it, but that shouldn't be a problem here as I'm using Cylc 8.


Hi Bruno,

Thanks a lot! I'm new to containers, so your explanation gives me a very good starting point.

I just tried your example using a Singularity container image I have (it is called JEDI and comes from the Joint Center for Satellite Data Assimilation, JCSDA):

pre-script = "singularity exec --no-home /g/data/dp9/jtl548/source/jedi/jedi_singularity_image/jedi-gnu-openmpi-dev_latest.sif python --version"

and Cylc correctly executes the Python command.

However, to execute my task, what I have is this (I'm using Rose instead of calling Cylc directly):

script = singularity exec --no-home --no-privs /g/data/dp9/jtl548/source/jedi/jedi_singularity_image/jedi-gnu-openmpi-dev_latest.sif rose task-run

This command fails with the following error message,

/.singularity.d/actions/exec: 21: exec: rose: not found

This is because the JEDI Singularity container does not include Rose. So what I think should happen is that "rose task-run" is called first, and then, within the Rose application configuration file, there is a call that starts the container, followed by a call that executes the main body of the Rose app config. I'm not sure if this is possible. Do you have any thoughts or advice?
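An error like "rose: not found" can be caught before wiring a command into a task by checking whether the command is visible inside the image. A minimal sketch; the function name is hypothetical and it assumes the image has a POSIX sh.

```shell
#!/usr/bin/env bash
# Sketch: check whether a command is available inside a Singularity image.
# The function name is hypothetical.
container_has_command() {
    local image="$1" cmd="$2"
    # Run POSIX 'command -v' through sh inside the container, so the lookup
    # uses the PATH seen inside the container. Only the exit status matters.
    singularity exec "$image" sh -c 'command -v "$1"' sh "$cmd" >/dev/null 2>&1
}
```

For example, container_has_command jedi-gnu-openmpi-dev_latest.sif rose || echo "rose is not in the image" would report the same problem as the error above without submitting a job.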

Glad it was a bit helpful Jin!


Unfortunately I am not very familiar with Rose. From what I understood of your description above, it looks like you are on the right path: calling the container from Rose, and not the other way around.

Maybe you could build a container with Rose, or add Rose to the JEDI image if possible?

You could also try bind-mounting the directory containing Rose on your computer into the container. But as Rose is a Python utility, I suspect it would complain about either missing configuration files or missing Python modules. Here's how you'd do it:

$ singularity exec --no-home --no-privs --bind /path/to/rose/bin:/opt docker://python:2.7.18-stretch /opt/rose

The --bind option takes a src and a dest location: src is on your computer, dest is inside the container (very similar to Docker volumes). So the contents of /path/to/rose/bin become visible under /opt inside the container. It is a mount, not a copy, so changes made through /opt inside the container affect the files on the host.
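The src:dest mapping can be illustrated with a tiny path-translation function: a host path under src is seen inside the container under dest. This is only a sketch to show the rule; the function name is hypothetical and it does not call Singularity.

```shell
#!/usr/bin/env bash
# Sketch of how a --bind src:dest pair maps paths: a host path under src is
# seen inside the container under dest. The function name is hypothetical.
to_container_path() {
    local src="${1%/}" dest="${2%/}" path="$3"
    case "$path" in
        "$src"|"$src"/*)
            # Replace the leading src prefix with dest.
            printf '%s%s\n' "$dest" "${path#"$src"}" ;;
        *)
            # Not under the bound directory, so not visible via this bind.
            return 1 ;;
    esac
}
```

For the command above, to_container_path /path/to/rose/bin /opt /path/to/rose/bin/rose prints /opt/rose, which is why the container is told to run /opt/rose.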

This may lead to confusing and hard-to-debug problems, and it makes your container less portable (and with Docker, if you don't take care, it may end up modifying file permissions; not sure about Singularity). But it's a good trick to know :)

If you manage to run your workflow with Singularity, Iā€™d be interested in learning how you achieved it. Either a simple reply here, a GitHub gist, or even a short post (I think we could copy it somewhere in cylc.github.io, or in our docs).

Bruno

Hi Bruno,

Thanks for suggesting '--bind'. I tried it, but it looks like user bind control is disabled in the Singularity configuration on our HPC. I sent a query to our HPC support and am waiting for a reply.

I will definitely let this discourse group know when I make any progress.


I didn't know Singularity allowed you to customize what is allowed or not. Good feature, and good to know (NIWA is using Singularity on our HPC too, so I will probably be learning more about it soon).

Thanks!

I think that should work. Just modify the command you have specified within your Rose app config.
https://metomi.github.io/rose/doc/html/api/configuration/application.html#rose:conf:rose-app.conf[command]default


Hi David,

Here's my suite definition:

[[hello_venus]]
    script = singularity exec --no-home --no-privs /g/data/dp9/jtl548/source/jedi/jedi_singularity_image/jedi-gnu-openmpi-dev_latest.sif /scratch/access/apps/rose/2019.01.2/bin/rose task-run
    [[[directives]]]
        -q = express
        -l = 'walltime=00:01:00,ncpus=1,mem=1G'

Our sysadmin tells me that all filesystems are bind-mounted into the Singularity container automatically, so I simply call rose using an absolute path. However, the task hello_venus fails because of an incompatibility between the Python used by the Rose installation outside the container and the Python interpreter inside it. I don't think the call sequence above is really satisfactory, as it assumes the software stacks inside and outside the container are compatible, which in many cases they will not be.

I don't see any way around the problem other than including Rose and Cylc in the container image. But the image is supplied by JCSDA, and I doubt they will be keen to have Rose and Cylc added to their image.

Do you have any suggestion?

Hi Jin

You are presumably running a rose app named hello_venus.
In the rose-app.conf file for that app there will be a setting like this:

[command]
default=my-command

Try changing this to

[command]
default=singularity exec --no-home --no-privs /g/data/dp9/jtl548/source/jedi/jedi_singularity_image/jedi-gnu-openmpi-dev_latest.sif /path/to/my-command

Does that work?


Hi David,

Yes, that works!

The fact that 'singularity exec' passes all environment variables through means that outside variables, such as the shell environment variables, the Rose and Cylc environment variables, and those defined under the [environment] section of the suite definition, are all available inside the container. This is terrific!
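That pass-through is Singularity's default. If a site wrapper adds --cleanenv, which starts the container with a clean environment, selected variables can still be passed by exporting them with the SINGULARITYENV_ prefix. A minimal sketch; the function name is hypothetical and ${!name} is Bash-specific.

```shell
#!/usr/bin/env bash
# Sketch: re-export chosen host variables as SINGULARITYENV_<NAME> so they
# still reach the container when --cleanenv is used. Function name is
# hypothetical.
export_to_container() {
    local name
    for name in "$@"; do
        # ${!name} is Bash indirect expansion: the value of the variable named $name.
        export "SINGULARITYENV_${name}=${!name}"
    done
}
```

For example, export_to_container CYLC_TASK_ID CYLC_TASK_WORK_DIR before singularity exec --cleanenv ... would make just those two Cylc variables visible inside the container.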

I now have the basic Rose/Cylc call pattern to enable tasks to run within a Singularity container.

Thank you very much to Bruno and to David. Your help is very much appreciated.

Cheers,

Jin

Using the Singularity definition file by kinow (above) with some modifications, a simple suite was started and stopped with the container.

Key changes:

  1. add python python-requests
  2. python -m pip install pip==19.3.1
  3. pyopenssl==18.0.0
  4. git clone --branch 7.9.8
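The key changes listed above might look roughly like the following shell commands in the %post section of a definition file. This is a sketch under assumptions not stated in the list: a Debian/Ubuntu base image (apt-get), python-pip and git installed from apt so that python -m pip and git clone work, the cylc-flow GitHub repository as the source of the 7.9.8 tag, and a hypothetical function name and /opt/cylc target.

```shell
#!/usr/bin/env bash
# Sketch of the listed changes as %post-style commands. Assumptions: Debian/
# Ubuntu base image, apt package names python, python-requests, python-pip
# and git, the cylc-flow GitHub repo, and the hypothetical /opt/cylc path.
install_cylc7_deps() (
    # Subshell body so 'set -e' stops at the first failure without leaking out.
    set -e
    apt-get update
    apt-get install -y python python-requests python-pip git
    python -m pip install pip==19.3.1
    python -m pip install pyopenssl==18.0.0
    git clone --branch 7.9.8 https://github.com/cylc/cylc-flow.git /opt/cylc
)
```

Pinning pip and pyopenssl this way keeps the Python 2 environment that Cylc 7 expects from drifting when the image is rebuilt.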