Make task run before any other task can run

Hiya

What’s the best way to force a workflow to run a particular task (in this case a dummy one, remote_setup) before any others?

Currently I’m achieving this by manually graphing the remote_setup before each task which starts the workflow. This works but isn’t trivially portable. I’m sure there’s a really simple answer to this but I can’t figure it out from the docs.

Thanks in advance.

Jonny

Hi @jonnyhtwilliams

Generally yes, dependencies are what make tasks wait for other tasks, so you need your start-up task to be first in the graph, and all other tasks must be downstream of it (but not necessarily directly downstream, of course).
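
For the simple non-cycling case, a minimal sketch of that principle might look like this (remote_setup is the start-up task from your question; foo, bar and baz are made-up task names):

[scheduling]
    [[graph]]
        # foo and bar are directly downstream of remote_setup; baz only indirectly
        R1 = """
            remote_setup => foo & bar
            foo => baz
        """

Nothing here can start until remote_setup succeeds, because every other task traces back to it through the graph.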

I suspect you might be asking how to make parentless tasks in every cycle of a cycling workflow wait for start-up tasks though? The same principle holds, but there are a couple of ways to achieve the result. (Either way, I’ll explain it here as most users run into this question.)

[scheduling]
    cycling mode = integer
    [[graph]]
        R1 = "prep => get-data"
        # ERROR: get-data is only forced to wait on prep in the first cycle!
        P1 = """
              get-data => model => post
              model[-P1] => model
        """

The problem here is that 1/prep, 2/get-data, 3/get-data, ... can all run immediately at startup, because Cylc runs multiple cycles at once as far as dependencies allow (out to the runahead limit).
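
As an aside, the runahead limit is itself a [scheduling] setting. A sketch, with P1 chosen purely for illustration:

[scheduling]
    cycling mode = integer
    # illustrative value only: restrict how many cycles may be active at once
    runahead limit = P1

Setting it lower narrows how far ahead get-data can run, but it does not add the missing dependency on prep.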

So you need to make all the parentless get-data tasks wait on prep in every cycle:

[scheduling]
    cycling mode = integer
    [[graph]]
        R1 = "prep => get-data"
        P1 = """
              prep[^] => get-data => model => post
              model[-P1] => model
        """

This does reflect the true dependencies of the system, and it works fine, but in more complex workflows with a bunch of startup tasks it can make graph visualization very messy.

A workaround that achieves the result less messily is to put a dummy task (prepped, say) in every cycle, and connect that back to the initial prep task like this:

[scheduling]
    cycling mode = integer
    [[graph]]
        R1 = "prep => prepped"
        P1 = """
              prepped[-P1] => prepped => get-data => model => post
              model[-P1] => model
        """
[runtime]
    [[prepped]]
        run mode = skip  # dummy task
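
With several start-up tasks, only the R1 line should need to change. A sketch, assuming two extra hypothetical start-up tasks prep_a and prep_b alongside prep:

[scheduling]
    cycling mode = integer
    [[graph]]
        # all start-up tasks feed the first prepped; later cycles only see prepped
        R1 = "prep & prep_a & prep_b => prepped"
        P1 = """
              prepped[-P1] => prepped => get-data => model => post
              model[-P1] => model
        """

The cycling graph stays the same however many start-up tasks there are, which is what keeps the visualization tidy.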

(We are also considering adding an isolated start-up graph to Cylc, which would complete before the main cycling graph begins, so users won’t have to manage true start-up processes with dependencies like this, but the team hasn’t finished that discussion yet.)

Thanks so much @hilary.j.oliver for the in-depth explanation, very useful indeed.

If I only want to run this prep task in the very first cycle, I guess I can just use your combination of R1 and P1 in the first example.

I must admit I had overlooked the subtleties involved and the differences between R1 and P1! :grin: