Platform setup in global.cylc

Hi group,

My job and job.status files are written under /scratch/masabas/bandwidth/cylc-run/… while the job.err and job.out files are written under /scratch/masabas/cylc-run/… . My Cylc version is 8.5.1.

ls /scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/*
/scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/job
/scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/job.status

ls /scratch/masabas/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/*
/scratch/masabas/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/job.err
/scratch/masabas/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/11/job.out

And the work dir is /scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/work/20250813T1200Z/unGrib/

My global.cylc is like this.

[platforms]
    [[shaheennew]]
        hosts = shaheennew
        job runner = slurm
        retrieve job logs = True
        cylc path = /scratch/masabas/iops/sw/miniconda3-amd64/envs/envPy311Sat/bin
        global init-script = """
            export HOME=/scratch/masabas/bandwidth
            export CYLC_RUN_DIR=/scratch/masabas/bandwidth/cylc-run
        """
        communication method = poll
        submission polling intervals = PT1M
        execution polling intervals = PT1M
[install]
    [[symlink dirs]]
        [[[shaheennew]]]
            run = /scratch/masabas/bandwidth
            log = /scratch/masabas/bandwidth
            work = /scratch/masabas/bandwidth
            share = /scratch/masabas/bandwidth
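As a rough illustration of what `[symlink dirs]` does (a sketch with placeholder paths, not Cylc's actual implementation): the real run directory is created under the configured target, and `$HOME/cylc-run/<workflow>` becomes a symlink to it.

```shell
#!/bin/sh
# Sketch of the symlinking behaviour (placeholder /tmp paths, not Cylc itself):
# with "run = /scratch/masabas/bandwidth", the real run directory lives under
# that target and $HOME/cylc-run/<workflow> is a symlink pointing at it.
TARGET=/tmp/demo-scratch/bandwidth/cylc-run/myflow/run1
LINK=/tmp/demo-home/cylc-run/myflow/run1

mkdir -p "$TARGET" "$(dirname "$LINK")"   # create target and link parent
ln -sfn "$TARGET" "$LINK"                 # symlink from the "HOME" side
readlink "$LINK"                          # resolves to the scratch target
```

This is why a platform with no usable $HOME breaks tools that follow the symlink from the $HOME side.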

Can anyone guide me on the $HOME setup for my platform?

$HOME is a standard environment variable on Linux. You should never need to change it.

Can you explain what you want to be different?
Is this your installation of Cylc?

The $HOME variable is set by default on Linux platforms. However, a few unusual HPC setups have decided not to do this, causing issues with some tools, including Cylc.

I think this might be a case of a platform with no $HOME directory?
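A quick way to check is to test, on the compute node itself, whether $HOME is set and actually exists (a generic shell diagnostic, not a Cylc command):

```shell
#!/bin/sh
# Generic diagnostic: report whether $HOME is usable on this host.
# Run it on the remote compute node (e.g. inside a trivial Slurm job).
if [ -z "${HOME:-}" ] || [ ! -d "${HOME:-}" ]; then
    echo "homeless: HOME is unset or does not exist"
else
    echo "HOME ok: $HOME"
fi
```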

If that is the case, the notes here should help:

(you just need to configure CYLC_RUN_DIR)
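For a homeless platform, that boils down to exporting CYLC_RUN_DIR in the platform's global init-script so jobs can locate the run directory without $HOME (a minimal sketch, using the paths from the config above):

```ini
[platforms]
    [[shaheennew]]
        global init-script = """
            export CYLC_RUN_DIR=/scratch/masabas/bandwidth/cylc-run
        """
```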

The job.err and job.out files are being written at /scratch/masabas/cylc-run/…, while the job and job.status files are being written at /scratch/masabas/bandwidth/cylc-run/….

masabas(login4): /home/masabas>ll /scratch/masabas/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/17/
total 12
-rw-r--r-- 1 masabas g-masabas 4322 Aug 14 12:35 job.err
-rw-r--r-- 1 masabas g-masabas 1303 Aug 14 12:35 job.out
masabas(login4): /home/masabas>ll /scratch/masabas/bandwidth/cylc-run/workforecastmain6/run1/log/job/20250813T1200Z/unGrib/17/
total 12
-rwxr-xr-x 1 masabas g-masabas 6009 Aug 14 12:35 job
-rw-r--r-- 1 masabas g-masabas  167 Aug 14 12:35 job.status

The recent change in my global.cylc is

[install]
    [[symlink dirs]]
        [[[shaheennew]]]
            run = /scratch/masabas/bandwidth
            log = /scratch/masabas/bandwidth
            work = /scratch/masabas/bandwidth
            share = /scratch/masabas/bandwidth

before is

[install]
    [[symlink dirs]]
        [[[shaheennew]]]
            run = /scratch/masabas/
            log = /scratch/masabas/
            work = /scratch/masabas/
            share = /scratch/masabas/

Why is it using two paths, /scratch/masabas/ and /scratch/masabas/bandwidth/, even though my symlink dirs are defined?

Yes. It’s homeless. And I am launching from my workstation.

I am not able to view the logs from my workstation, because the files are not available at the expected paths.

Hi,

Aah, so it sounds like the workflow is running fine, but you’re having issues viewing the log files?

Yes. Workflow is fine.

The Slurm job also shows my workstation HOME + CYLC_JOBPATH.

I added CYLC_TASK_SHARE_CYCLE_DIR in global.cylc.

[platforms]
    [[shaheennew]]
        hosts = shaheennew
        job runner = slurm
        retrieve job logs = True
        cylc path = /scratch/masabas/iops/sw/miniconda3-amd64/envs/envPy311Sat/bin
        global init-script = """
            export CYLC_RUN_DIR=/scratch/masabas/bandwidth/cylc-run
            export CYLC_TASK_SHARE_CYCLE_DIR=/scratch/masabas/bandwidth/cylc-run
        """
        communication method = poll
        submission polling intervals = PT1M
        execution polling intervals = PT1M

The job runs in both cases, but the log viewing issue still exists.

Ok,

The setup outlined here is correct, if your workflow is running fine, there’s no need to change the platform configuration.

The cylc cat-log command (which is used by the GUI and Tui to display log files) will need to be changed in order to work for platforms with no $HOME directory (it is currently looking for the symlink to the files in $HOME).

I have opened an issue to track this:

Until we have a fix, you can work around the problem by configuring retrieve job logs for the platform. This will tell Cylc to copy the job logs from the HPC onto the local host once the job has finished.

You won’t be able to view the logs via Cylc whilst the job is running, but you will be able to view them once the job finishes and the logs have been copied.

retrieve job logs = True did not bring the job.err and job.out to my workstation after the job finished.

[jobs-poll ret_code] 0
[jobs-poll out] 2025-08-14T14:54:17+03:00|20250813T1200Z/unGrib/23|{"job_runner_name": "slurm", "job_id": "7466609", "run_status": 0, "time_submit_exit": "2025-08-14T14:44:09+03:00", "time_run": "2025-08-14T14:44:24+03:00", "time_run_exit": "2025-08-14T14:53:26+03:00"}
[(('job-logs-retrieve', 'succeeded'), 23) ret_code] 1
[(('job-logs-retrieve', 'succeeded'), 23) err] File(s) not retrieved: job.out

Hmm, that suggests it retrieved some of the files (e.g. job.err) but not the job.out?

Try taking a look at the job’s job-activity.log file, it might contain some clues about what went wrong.

On some PBS HPCs the job.out and job.err files are written to a temporary location whilst the job is running, then PBS moves these files to the configured location once the job has succeeded. This might be the cause of the error? If so, Cylc can support this, configure retries for the job log retrieval using retrieve job logs retry delays and Cylc will keep retrying until successful.

We configure retrieve job logs retry delays = PT10S, PT30S, PT3M for our PBS HPC.
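Combining the two settings in the platform definition might look like this (a sketch; the retry delays are the values quoted above, tune them for your site):

```ini
[platforms]
    [[shaheennew]]
        retrieve job logs = True
        retrieve job logs retry delays = PT10S, PT30S, PT3M
```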

If I understand correctly, this issue also exists in previous versions for Slurm jobs.

This is a Slurm job submission script on a homeless compute node. Here, the --output and --error files are written homelessly (I suspect HOME is still missing). If you submit a small job on a login node, --output is directed under HOME/….

As you said, the logs are retrieved via scp/rsync with retrieve job logs = True or retrieve job logs retry delays.

Previously, [symlink dirs] pointed to /scratch/masabas, which is equivalent to the homeless HOME, so no issue arose and it worked fine: the homeless HOME and the symlink dirs pointed to the same area.

This time I am trying [symlink dirs] set to /scratch/masabas/bandwidth. This does not match the homeless HOME, so the logs cannot be synced from the remote to the host.

Writing the job.out and job.err files to a temporary location defined by CYLC_RUN_DIR could be a good solution.

Hi Oliver,

The job.err file on the remote looks like this.

cat /scratch/masabas/cylc-run/cylctest/run2/log/job/20250421T1200Z/testwork/01/job.err

/var/spool/slurmd/job7522798/slurm_script: line 61: /scratch/masabas/bandwidth/cylc-run/cylctest/run2/.service/etc/job.sh: No such file or directory
/var/spool/slurmd/job7522798/slurm_script: line 62: cylc__job__main: command not found

job-activity.log

cat ~/cylc-run/cylctest/run2/log/job/20250421T1200Z/testwork/01/job-activity.log
[jobs-submit ret_code] 0
[jobs-submit out] 2025-08-17T14:09:27+03:00|20250421T1200Z/testwork/01|0|7522798
2025-08-17T14:09:27+03:00 [STDOUT] Submitted batch job 7522798
[jobs-poll ret_code] 0
[jobs-poll out] 2025-08-17T14:10:28+03:00|20250421T1200Z/testwork/01|{"job_runner_name": "slurm", "job_id": "7522798", "job_runner_exit_polled": 1, "time_submit_exit": "2025-08-17T14:09:27+03:00"}

I hope this will be helpful to trace the bug.

There may be a bug in cylc cat-log (hard for me to test without access to a platform with no $HOME directory).

However, the traceback in the job.err file you reported is probably a setup issue.

Make sure you’ve configured Cylc to symlink the entire workflow run directory onto /scratch (if that is where you want to store your workflows).

Instructions:

The issue is resolved with --chdir=/scratch/masabas/bandwidth in the directives for Slurm jobs.
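For reference, that directive can go in the task definition in flow.cylc; a hedged sketch (the task name unGrib is taken from the logs earlier in the thread; any task running on this platform would need it, or it could be set on a family or root):

```ini
[runtime]
    [[unGrib]]
        platform = shaheennew
        [[[directives]]]
            --chdir = /scratch/masabas/bandwidth
```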

The rest of the global.cylc settings are the same.

Thank you very much.