Controlling which R1 tasks run upon bootstrapping

Hello all.

I have a suite that runs a few tasks in R1 to prepare the suite directory. These tasks fetch the source code and compile it so that subsequent apps run properly. Running them in R1 makes sense because I don’t need to repeat them afterwards.

The tasks that require the compiled code only ever run in T00, but the suite has other things that run at T12 as well.

When I start the suite on a T00 cycle point all is fine. But when I start the suite on a T12 cycle point some of those tasks in R1 also run, but shouldn’t. In fact the R1 tasks that prepare the suite for the next T00 run should run in this case, but not go any further because I haven’t started on a T00.

[[dependencies]]
    [[[R1]]]    # Run once at initial cycle
        graph = """
            install_srcs      => make_engine
            make_engine  => run_engine
        """
    [[[T00]]]
        graph = """
            poll_data_00 => run_engine => housekeep
        """

    [[[T12]]]
        graph = """
             ...
        """

I hope I have simplified my actual suite correctly to show what’s going on.

The poll_data_00 task is only designed to run in T00, that’s fine.
But the run_engine task should have a dependency on make_engine, as it cannot run beforehand.
How do I make sure that install_srcs and make_engine run in R1 regardless of the cycle point I start the suite on, while still guaranteeing that make_engine has finished before run_engine ever runs?

Thank you,
Fred

Hi Fred,

“But when I start the suite on a T12 cycle point some of those tasks in R1 also run, but shouldn’t.”

As you’ve written the graph above, all of your R1 tasks should run at the initial cycle point, whatever it is, because R1 is short for R1/^/ which means “run once at the initial cycle point”.

If you need to be able to start on T00 or T12 cycle points, but your start-up tasks only matter for T00 tasks, then the easiest fix would be (I think) to change R1 to R1/T00, which means “run once at the first T00 point at or beyond the initial cycle point”.
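
A minimal sketch of that change, reusing the task names from your graph. The initial cycle point value is an assumed example only, picked so that the suite starts on a T12:

```
[scheduling]
    # Assumed example start on a T12 (not taken from the original suite)
    initial cycle point = 20200101T12
    [[dependencies]]
        [[[R1/T00]]]    # Run once at the first T00 at or after the initial cycle point
            graph = """
                install_srcs => make_engine
                make_engine => run_engine
            """
        [[[T00]]]
            graph = """
                poll_data_00 => run_engine => housekeep
            """
```

With that assumed start, R1/T00 resolves to 20200102T00, so the start-up tasks would wait for that cycle instead of running at 20200101T12.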

Hilary

By the way, there is also the related min() syntax, which might be relevant:

min(T00, T12)  # the first incidence of T00 or T12 after the initial cycle point

https://cylc.github.io/doc/built-sphinx-single/index.html#advanced-examples
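
As a rough worked example of how that expression would resolve (the initial cycle points here are assumed values, chosen only for illustration):

```
# Assumed example: initial cycle point = 20200101T03
#   first T00 -> 20200102T00
#   first T12 -> 20200101T12
#   min(T00, T12) -> 20200101T12
#
# Assumed example: initial cycle point = 20200101T15
#   first T00 -> 20200102T00
#   first T12 -> 20200102T12
#   min(T00, T12) -> 20200102T00
```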

Something else to note about your workflow: the graph says that run_engine only has to wait on make_engine in the first cycle, and not in subsequent cycles. With Cylc 7 it should work as you intend despite that, with start-up tasks in R1/T00, but only:

  • If the next run_engine is delayed waiting for real-time data (guessing that’s what poll_data_00 does)
    • which won’t have any effect if you run behind the clock
  • And because Cylc 7 (and earlier) tasks implicitly depend on their own previous-instance submitting (which allows successive instances of the same task to run concurrently but not out-of-order)
    • this implicit constraint will be removed in Cylc 8

So in Cylc 8 you’ll find (if you start this suite far enough behind the clock that poll_data_00 does not provide a constraint on upcoming cycles) that upcoming run_engines try to execute before the initial make_engine is finished.

Best to make the true dependencies of the system explicit like this:

[[[T00]]]
      graph = "poll_data_00 & make_engine[^] => run_engine => housekeep"

or else explicitly constrain successive instances of run_engine to execute in order (so the second one can’t run before the first, which can’t run before make_engine is done, …), e.g.:

[[[T00]]]
      graph = """poll_data_00 => run_engine => housekeep
                  run_engine[-P1D]:submit => run_engine"""

Thank you Hilary, and sorry for not replying sooner. I got side-tracked onto another project and only today managed to return to this problem.

First I looked at your suggestion using the R1/T00 syntax (which I didn’t know about). I split the R1 block into 2 sections, one [[[R1]]] and another [[[R1/T00]]].
This prevents run_engine from running on an initial T12 cycle point. In that case the suite simply runs install_srcs and make_engine, but not run_engine (which doesn’t need to run until the next T00 cycle point).

[[dependencies]]
    [[[R1]]]    # Run once at initial cycle (at any time point T00 and T12)
        graph = """
            install_srcs      => make_engine
        """
    [[[R1/T00]]]    # Run once at initial cycle (only at time point T00)
        graph = """
            make_engine  => run_engine
        """
    [[[T00]]]
        graph = """
            poll_data_00 => run_engine => housekeep
        """

    [[[T12]]]
        graph = """
             ...
        """

Looking at the resulting bubble graph, I also noticed that there could be a timing issue, as you pointed out in your later reply. The run_engine task in the subsequent T00 cycle point could run before install_srcs => make_engine has finished in the initial T12 cycle point.

So I experimented with your other suggestions (using the [^] syntax I wasn’t familiar with):

[[dependencies]]
    [[[R1]]]    # Run once at initial cycle (at any time point T00 and T12)
        graph = """
            install_srcs      => make_engine
        """
    [[[R1/T00]]]    # Run once at initial cycle (only at time point T00)
        graph = """
            make_engine[^]  => run_engine
        """

The extra [^] worked. There was now a dependency from run_engine in the T00 cycle point back to make_engine in the initial T12 cycle point.
But I don’t understand why this worked at all - the suite was started on a T12 cycle point, so why is R1/T00 interpreted at all?

I also used the other suggestion of using inter-cycle dependencies:

[[[T00]]]
    graph = """
        run_engine[-P1D] => run_engine
    """

But, as you say, in Cylc 7 it doesn’t change the behaviour. I left it in anyway, as it makes the intended behaviour more explicit.
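
For reference, here is a sketch of how the fragments above might sit together in one [[dependencies]] section. It only recombines graph lines already shown in this thread, has not been run as a whole, and leaves the T12 graph elided as before:

```
[[dependencies]]
    [[[R1]]]    # Run once at the initial cycle point (T00 or T12)
        graph = """
            install_srcs => make_engine
        """
    [[[R1/T00]]]    # Run once at the first T00 cycle point
        graph = """
            make_engine[^] => run_engine
        """
    [[[T00]]]
        graph = """
            poll_data_00 => run_engine => housekeep
            run_engine[-P1D] => run_engine
        """
    [[[T12]]]
        graph = """
             ...
        """
```

Here make_engine[^] ties the first run_engine back to make_engine at the initial cycle point, and the [-P1D] line keeps later run_engine instances in order behind it.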

As always many thanks for your help!! Very much appreciated
Fred

Your code comments suggest you’ve misunderstood the meaning of R1 and R1/T00 slightly. This comment is correct:

 [[[R1]]]    # Run once at initial cycle (at any time point T00 and T12) [OK]

But this one:

[[[R1/T00]]]    # Run once at initial cycle (only at time point T00) [WRONG]

should read:

[[[R1/T00]]]    # Run once at the first T00 cycle (which may or may not be the initial cycle point!)

Does that explain why “R1/T00 is interpreted at all”?

Hilary

p.s. R1 by itself means “run once at the initial cycle point” because R1/^ is inferred if a specific point is not given
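
A small worked example of the difference, using assumed initial cycle points that are only for illustration:

```
# Assumed example: initial cycle point = 20200101T12
#   [[[R1]]]      -> 20200101T12  (the initial cycle point itself)
#   [[[R1/T00]]]  -> 20200102T00  (the first T00 at or after it)
#
# Assumed example: initial cycle point = 20200101T00
#   [[[R1]]]      -> 20200101T00
#   [[[R1/T00]]]  -> 20200101T00  (the two coincide)
```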

Now I get it. Thanks for the explanation. I’ll change the comments in my suite
