Platform setup in global.cylc

Hi group,

My job and job.status files are written under /scratch/masabas/bandwidth/cylc-run/… while the job.err and job.out files are written under /scratch/masabas/cylc-run/… . My Cylc version is 8.5.1.

ls /scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/*
/scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/job
/scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/job.status

ls /scratch/masabas/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/*
/scratch/masabas/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/job.err
/scratch/masabas/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/job.out

And the work dir is /scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/work/20250813T1200Z/unGrib/

My global.cylc is like this.

[platforms]
    [[shaheennew]]
        hosts = shaheennew
        job runner = slurm
        retrieve job logs = True
        cylc path = /scratch/masabas/iops/sw/miniconda3-amd64/envs/envPy311Sat/bin
        global init-script = """
            export HOME=/scratch/masabas/bandwidth
            export CYLC_RUN_DIR=/scratch/masabas/bandwidth/cylc-run
        """
        communication method = poll
        submission polling intervals = PT1M
        execution polling intervals = PT1M
[install]
    [[symlink dirs]]
        [[[shaheennew]]]
            run = /scratch/masabas/bandwidth
            log = /scratch/masabas/bandwidth
            work = /scratch/masabas/bandwidth
            share = /scratch/masabas/bandwidth
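As a rough illustration of what `[symlink dirs]` does (a sketch with placeholder paths, not Cylc's actual implementation): the real run directory is created under the configured target, and `$HOME/cylc-run/<workflow>` becomes a symlink to it.

```shell
#!/bin/sh
# Sketch of the symlinking behaviour (placeholder /tmp paths, not Cylc itself):
# with "run = /scratch/masabas/bandwidth", the real run directory lives under
# that target and $HOME/cylc-run/<workflow> is a symlink pointing at it.
TARGET=/tmp/demo-scratch/bandwidth/cylc-run/myflow/run1
LINK=/tmp/demo-home/cylc-run/myflow/run1

mkdir -p "$TARGET" "$(dirname "$LINK")"   # create target and link parent
ln -sfn "$TARGET" "$LINK"                 # symlink from the "HOME" side
readlink "$LINK"                          # resolves to the scratch target
```

This is why a platform with no usable $HOME breaks tools that follow the symlink from the $HOME side.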

Can anyone guide me on the $HOME setup for my platform?

$HOME is a standard environment variable on Linux. You should never need to change it.

Can you explain what you want to be different?
Is this your installation of Cylc?

The $HOME variable is set by default on Linux platforms. However, a few unusual HPC setups have decided not to do this, causing issues with some tools, including Cylc.

I think this might be a case of a platform with no $HOME directory?
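A quick way to check is to test, on the compute node itself, whether $HOME is set and actually exists (a generic shell diagnostic, not a Cylc command):

```shell
#!/bin/sh
# Generic diagnostic: report whether $HOME is usable on this host.
# Run it on the remote compute node (e.g. inside a trivial Slurm job).
if [ -z "${HOME:-}" ] || [ ! -d "${HOME:-}" ]; then
    echo "homeless: HOME is unset or does not exist"
else
    echo "HOME ok: $HOME"
fi
```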

If that is the case, the notes here should help:

(you just need to configure CYLC_RUN_DIR)
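For a homeless platform, that boils down to exporting CYLC_RUN_DIR in the platform's global init-script so jobs can locate the run directory without $HOME (a minimal sketch, using the paths from the config above):

```ini
[platforms]
    [[shaheennew]]
        global init-script = """
            export CYLC_RUN_DIR=/scratch/masabas/bandwidth/cylc-run
        """
```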

The job.err and job.out files are being written at /scratch/masabas/cylc-run/…, while the job and job.status files are being written at /scratch/masabas/bandwidth/cylc-run/….

masabas(login4): /home/masabas>ll /scratch/masabas/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/17/
total 12
-rw-r--r-- 1 masabas g-masabas 4322 Aug 14 12:35 job.err
-rw-r--r-- 1 masabas g-masabas 1303 Aug 14 12:35 job.out
masabas(login4): /home/masabas>ll /scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/17/
total 12
-rwxr-xr-x 1 masabas g-masabas 6009 Aug 14 12:35 job
-rw-r--r-- 1 masabas g-masabas  167 Aug 14 12:35 job.status

The recent change in my global.cylc is

[install]
    [[symlink dirs]]
        [[[shaheennew]]]
            run = /scratch/masabas/bandwidth
            log = /scratch/masabas/bandwidth
            work = /scratch/masabas/bandwidth
            share = /scratch/masabas/bandwidth

before is

[install]
    [[symlink dirs]]
        [[[shaheennew]]]
            run = /scratch/masabas/
            log = /scratch/masabas/
            work = /scratch/masabas/
            share = /scratch/masabas/

Why is it using two paths, /scratch/masabas/ and /scratch/masabas/bandwidth/, even though my symlink dirs are defined?

Yes. It’s homeless. And I am launching from my workstation.

I am not able to view the logs from my workstation, because the files are not available at the expected paths.

Hi,

Aah, so it sounds like the workflow is running fine, but you’re having issues viewing the log files?

Yes. Workflow is fine.

The Slurm job also shows my workstation HOME + CYLC_JOBPATH.

I added CYLC_TASK_SHARE_CYCLE_DIR in global.cylc.

[platforms]
    [[shaheennew]]
        hosts = shaheennew
        job runner = slurm
        retrieve job logs = True
        cylc path = /scratch/masabas/iops/sw/miniconda3-amd64/envs/envPy311Sat/bin
        global init-script = """
            export CYLC_RUN_DIR=/scratch/masabas/bandwidth/cylc-run
            export CYLC_TASK_SHARE_CYCLE_DIR=/scratch/masabas/bandwidth/cylc-run
        """
        communication method = poll
        submission polling intervals = PT1M
        execution polling intervals = PT1M

The job runs in both cases, but the log viewing issue still exists.

Ok,

The setup outlined here is correct, if your workflow is running fine, there’s no need to change the platform configuration.

The cylc cat-log command (which is used by the GUI and Tui to display log files) will need to be changed in order to work for platforms with no $HOME directory (it is currently looking for the symlink to the files in $HOME).

I have opened an issue to track this:

Until we have a fix, you can work around the problem by configuring retrieve job logs for the platform. This will tell Cylc to copy the job logs from the HPC onto the local host once the job has finished.

You won’t be able to view the logs via Cylc whilst the job is running, but you will be able to view them once the job finishes and the logs have been copied.

retrieve job logs = True did not bring the job.err and job.out to my workstation after the job finished.

[jobs-poll ret_code] 0
[jobs-poll out] 2025-08-14T14:54:17+03:00|20250813T1200Z/unGrib/23|{"job_runner_name": "slurm", "job_id": "7466609", "run_status": 0, "time_submit_exit": "2025-08-14T14:44:09+03:00", "time_run": "2025-08-14T14:44:24+03:00", "time_run_exit": "2025-08-14T14:53:26+03:00"}
[(('job-logs-retrieve', 'succeeded'), 23) ret_code] 1
[(('job-logs-retrieve', 'succeeded'), 23) err] File(s) not retrieved: job.out

Hmm, that suggests it retrieved some of the files (e.g. job.err) but not the job.out?

Try taking a look at the job’s job-activity.log file, it might contain some clues about what went wrong.

On some PBS HPCs the job.out and job.err files are written to a temporary location whilst the job is running, then PBS moves these files to the configured location once the job has succeeded. This might be the cause of the error? If so, Cylc can support this, configure retries for the job log retrieval using retrieve job logs retry delays and Cylc will keep retrying until successful.

We configure retrieve job logs retry delays = PT10S, PT30S, PT3M for our PBS HPC.
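Combining the two settings in the platform definition might look like this (a sketch; the retry delays are the values quoted above, tune them for your site):

```ini
[platforms]
    [[shaheennew]]
        retrieve job logs = True
        retrieve job logs retry delays = PT10S, PT30S, PT3M
```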

If I understand correctly, this issue also exists in previous versions for Slurm jobs.

This is a Slurm job submission script on a homeless compute node. Here, the --output and --error files are written homelessly (I suspect HOME is still missing). If you submit a small job on a login node, --output is directed under HOME/….

As you said, the logs are retrieved via scp/rsync with retrieve job logs = True or retrieve job logs retry delays.

Previously, [symlink dirs] pointed to /scratch/masabas, which is equivalent to the homeless HOME, so no issue arose and it worked fine: the homeless HOME and the symlink dirs pointed to the same area.

This time I am trying [symlink dirs] set to /scratch/masabas/bandwidth. This does not match the homeless HOME, so the logs cannot be synced from the remote to the host.

Writing the job.out and job.err files to a temporary location defined by CYLC_RUN_DIR could be a good solution.

Hi Oliver,

The job.err file on the remote looks like this.

cat /scratch/masabas/cylc-run/cylctest/run2/log/job/20250421T1200Z/testwork/01/job.err

/var/spool/slurmd/job7522798/slurm_script: line 61: /scratch/masabas/bandwidth/cylc-run/cylctest/run2/.service/etc/job.sh: No such file or directory
/var/spool/slurmd/job7522798/slurm_script: line 62: cylc__job__main: command not found

job-activity.log

cat ~/cylc-run/cylctest/run2/log/job/20250421T1200Z/testwork/01/job-activity.log
[jobs-submit ret_code] 0
[jobs-submit out] 2025-08-17T14:09:27+03:00|20250421T1200Z/testwork/01|0|7522798
2025-08-17T14:09:27+03:00 [STDOUT] Submitted batch job 7522798
[jobs-poll ret_code] 0
[jobs-poll out] 2025-08-17T14:10:28+03:00|20250421T1200Z/testwork/01|{"job_runner_name": "slurm", "job_id": "7522798", "job_runner_exit_polled": 1, "time_submit_exit": "2025-08-17T14:09:27+03:00"}

I hope this will be helpful to trace the bug.

There may be a bug in cylc cat-log (hard for me to test without access to a platform with no $HOME directory).

However, the traceback in the job.err file you reported is probably a setup issue.

Make sure you’ve configured Cylc to symlink the entire workflow run directory onto /scratch (if that is where you want to store your workflows).

Instructions:

The issue is resolved with --chdir=/scratch/masabas/bandwidth in the directives for Slurm jobs.
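For reference, that directive can go in the task definition in flow.cylc; a hedged sketch (the task name unGrib is taken from the logs earlier in the thread; any task running on this platform would need it, or it could be set on a family or root):

```ini
[runtime]
    [[unGrib]]
        platform = shaheennew
        [[[directives]]]
            --chdir = /scratch/masabas/bandwidth
```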

The rest of the global.cylc settings are the same.

Thank you very much.