Setting directives at runtime: --nodes from env-var

Hello all,

Apologies, I'm still using Cylc 7.9.1 here, but my issue likely applies to Cylc 8 as well.

I have a suite with dynamic parameters that depend on the model domain, which is defined by values in rose-suite.conf. Upon startup the suite creates a grid, extracts the horizontal grid dimensions, calculates how many processors are required, and finally calculates the required number of nodes to run the model.
The startup tasks that compute these runtime parameters simply write those values into little files in ${CFG_MPI_PARAMS_DIR} (under the suite’s share directory). In the root’s environment section I can import the values of these dynamic parameters, because shell substitution works well there:

    [[root]]
        [[[environment]]]
            MPI_PARAM_NPROCS = $(cat ${CFG_MPI_PARAMS_DIR}/MPI_PARAM_NPROCS.cfg || echo 0)
            MPI_PARAM_NUM_NODES = $(cat ${CFG_MPI_PARAMS_DIR}/MPI_PARAM_NUM_NODES.cfg || echo 0)

so any subsequent task can grab the values from the environment variables, e.g. MPI_PARAM_NPROCS.
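
For illustration, the final startup step looks roughly like this (a heavily simplified sketch: in the real suite the processor count is derived from the generated grid, and the 28 cores per node is a made-up value):

#!/bin/bash
# Simplified sketch of the startup step that writes the parameter files.
set -eu

mkdir -p "${CFG_MPI_PARAMS_DIR}"

# In the real suite nprocs is derived from the horizontal grid dimensions;
# it is hard-wired here only to show the mechanism.
nprocs=103
cores_per_node=28   # made-up value
num_nodes=$(( (nprocs + cores_per_node - 1) / cores_per_node ))

echo "${nprocs}"    > "${CFG_MPI_PARAMS_DIR}/MPI_PARAM_NPROCS.cfg"
echo "${num_nodes}" > "${CFG_MPI_PARAMS_DIR}/MPI_PARAM_NUM_NODES.cfg"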

In the runtime section I define the task that runs the model like this:

[runtime]
    [[run_model]]
        [[[environment]]]
            ROSE_TASK_APP = run_model
            MPI_OPTS = -np ${MPI_PARAM_NPROCS}
        [[[directives]]]
            --partition = defq
            --nodes = ${MPI_PARAM_NUM_NODES}

While this looks logical to me, this mechanism of using variables only works in the [[[environment]]] section of the task definition. The variable MPI_OPTS does indeed take a value like “-np 103” if 103 is the content of the little file ${CFG_MPI_PARAMS_DIR}/MPI_PARAM_NPROCS.cfg (written by the suite startup tasks).

But this doesn’t work for the [[[directives]]] section: the job fails to submit at all, with the following error reported in job-activity.log:

[STDERR] sbatch: error: "${MPI_PARAM_NUM_NODES}" is not a valid node count

It appears that no shell substitution is done; the text is passed on verbatim as --nodes ${MPI_PARAM_NUM_NODES} rather than --nodes 4.

So my question is:
How can I dynamically set a task’s directives, like --nodes?

Thanks,
Fred

For the moment, the only way to change the directives for a task at runtime is to use cylc broadcast. For instance, in your startup tasks, rather than writing the values into files, broadcast the required settings to the relevant tasks / families.

Here is a little more detail on how you can make directives dynamic using broadcasts, as suggested above.

This example sets the --nodes directive for all tasks in the MPI family on a per-cycle-point basis, allowing MPI_PARAM_NUM_NODES to be set dynamically whilst the workflow is running:

[scheduling]
    [[graph]]
        P1D = """
            start_cycle => MPI
        """

[runtime]
    [[start_cycle]]
        script = """
            cylc broadcast \
                "${CYLC_WORKFLOW_ID}" \
                -p "${CYLC_TASK_CYCLE_POINT}"
                -n MPI \
                -s '[directives]--nodes=${MPI_PARAM_NUM_NODES}' \
                -s '[environment]MPI_PARAM_NUM_NODES=${MPI_PARAM_NUM_NODES}'
        """
        [[[environment]]]
            MPI_PARAM_NPROCS = $(cat ${CFG_MPI_PARAMS_DIR}/MPI_PARAM_NPROCS.cfg || echo 0)
            MPI_PARAM_NUM_NODES = $(cat ${CFG_MPI_PARAMS_DIR}/MPI_PARAM_NUM_NODES.cfg || echo 0)

    [[MPI]]

    [[run_model]]
        inherit = MPI
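
If the MPI family also needs static directives (the partition, say), they can stay in the suite definition as usual; a broadcast always overrides whatever is defined there, so you can even keep a fallback value. A sketch (the fallback of 1 node is purely illustrative):

    [[MPI]]
        [[[directives]]]
            --partition = defq
            --nodes = 1    # fallback only; the broadcast from start_cycle overrides it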

Hi @fredw - I’ll explain exactly why this doesn’t work, even though it might look sensible at first glance.

The workflow config file (suite.rc in Cylc 7, or flow.cylc in Cylc 8) is not a shell script, of course, so Cylc itself does not parse bash syntax.

Particular task config items, namely the script and [environment] ones, get written verbatim to the task job scripts that get submitted to execute your tasks. The same goes for Slurm directives, which are written to the job script (as they are supposed to be) as shell comments. The written values are *as defined in the suite.rc at start-up*, and nothing gets evaluated by the shell until the shell actually executes the job script, on the job host. So all of your job scripts are going to look like this (simplified, with NODES standing in for your variable):

...
#SBATCH --nodes=${NODES}
...
# Cylc user environment definitions:
NODES=$(cat nodes.txt)
...
# Cylc user script items
echo Hello World
...

The shell never sees the directives, because they’re protected inside a shell comment.

Note that each of your tasks will actually define NODES by cat-ing the file that the first task writes. They do not just inherit the resulting value from root; what they inherit from root is the literal environment variable definition, including the cat command.

Long story short: the values of batch system directives (just like environment variables and script items) are defined at workflow start-up and written verbatim to the job script in the expected format, which for directives is a shell comment.

To support what you want we would have to make Cylc artificially interpret bash syntax in certain Cylc config item values before writing job scripts and handing them off to the shell (via Slurm or whatever) for execution. That is feasible but there are potential pitfalls and complications.

The correct way to do it, as suggested, is to use cylc broadcast to “broadcast” new/updated config item values to the scheduler at run time, to replace the values it parsed at start-up. The new values will then be used instead of the originals when new job scripts are written, for any tasks targeted by the broadcast command.
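
For example, once a broadcast has supplied the node count, the next job script written for that task contains the evaluated value directly in the directive (the 4 is just an illustrative number):

...
#SBATCH --nodes=4
...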

I hope that all makes sense!

Thank you very much. This is the method I used in the end and it works fine.
I inserted a new task that broadcasts the directives and always runs before the “run_model” task (see the sketch below).
It’s a little bit ugly, because it’s not entirely obvious at first glance why this is needed, but I added a comment pointing to this thread, so future-me might remember.
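
In graph terms it boils down to something like this (task name illustrative, shown here for a non-cycling suite):

[scheduling]
    [[dependencies]]
        graph = """
            broadcast_mpi_params => run_model
        """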

As always, thanks for the prompt reply
