Suggestions sought for handling complex workflow

Hi all! We are in the process of verifying our first full workflow in order to put it in production (yey!), and I am already trying to figure out how to handle a variation of it.
This workflow would be relatively easy to achieve except for the pesky nowcast, which runs a varying number of model cycles.
What would be the best way to keep track of where I am at? I can’t see how to make it work without a file that captures how many model cycles I am currently running with. I am thinking in particular of having to restart the model because of some problem or other.
Each “cast” runs in its own workflow, with triggers managing the dependencies between the casts, and clock triggers for scheduling the runs.
Cheers!
Gaby

Hi,

I’m not sure I’ve understood the setup, but here’s a Cylc translation of the diagram:

[scheduling]
    [[graph]]
        P3D = b3[-P3D] => b3 => h3 => n3 => f10
        +P1D/P3D = h3[-P1D] => n4 => f10
        +P2D/P3D = n4[-P1D] => n5 => f10

It would be possible to use Jinja2 to adapt this for an arbitrary number of n<x> cycles.
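
As a rough illustration of that idea, here is a minimal sketch of the loop such a template would need, written in Python so it can be run on its own. The function name `nowcast_graph` and its parameters are hypothetical, and it assumes the repeat period in days equals the number of nowcast tasks, as in the three-line graph above. The same loop should translate fairly directly into a Jinja2 `{% for %}` block in flow.cylc.

```python
def nowcast_graph(first=3, last=5, forecast="f10"):
    """Return Cylc graph lines for nowcast tasks n<first>..n<last>.

    Hypothetical helper: assumes one nowcast task per day and a repeat
    period equal to the number of nowcast tasks.
    """
    period = last - first + 1  # days before the pattern repeats
    lines = [
        f"P{period}D = b{first}[-P{period}D] => b{first} => h{first} "
        f"=> n{first} => {forecast}"
    ]
    for n in range(first + 1, last + 1):
        offset = n - first
        # The first extra nowcast follows the hindcast; later ones
        # follow the previous day's nowcast.
        upstream = f"h{first}" if offset == 1 else f"n{n - 1}"
        lines.append(
            f"+P{offset}D/P{period}D = {upstream}[-P1D] => n{n} => {forecast}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(nowcast_graph()))
```

With the defaults this reproduces the three graph lines above; `nowcast_graph(3, 6)` would add an `n6` line on a four-day period.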

Hi Oliver, it’s food for thought, thanx.

Hi @gturek

We might be able to help more if you can tell us exactly how your diagram (with labels (task names?) like “B3”, “N3”, “F10” etc.) relates to the "model"s, "cast"s, and “workflows” in your text.

Also, by “each cast runs in its own workflow” do you literally mean distinct Cylc workflows - so you are running sub-workflows (workflows inside tasks)?

In which case, by “triggers managing dependencies between the casts” do you mean workflow_state xtriggers, for inter-workflow triggering?

Hilary

Hi Hilary, no actually I don’t run sub workflows, I’ve seen that mentioned several times here but I have not given it much thought. We basically launch the same workflow, but with different configurations depending on the cast and use workflow_state xtriggers to orchestrate between them.

In response to your question: the b<ncycle>, h<ncycle>, n<ncycle> and f<ncycle> above are complete workflows. They are all the same graph-wise, but the configuration (environment variables and other quantities such as <ncycle> in namelists) changes depending on the cast. Some tasks are conveniently skipped when necessary (great feature that!)

The nowcast (‘n’) is the one that has the model run for <ncycle>=3,4,5. So nominally this variation only affects the task running the model and its config files (to be verified…)

Here is a graph of the workflow:

PS. There is a plan to have someone on your end review our workflow and make suggestions, but I am still waiting to hear from management how it’s going to work exactly

And for comparison, here’s how the whole thing runs now:

OK, good. So you launch each instance (of the same source workflow, configured differently) either manually or via a script, and there is some dependence between them managed with workflow_state xtriggers. (They would only be sub-workflows if another workflow had tasks to launch them).

From here on, I think it’s still not enough for me to understand the workflow and make suggestions, sorry.

A couple of questions that come to mind, that may or may not be useful: Your Cylc graph appears to show a single cycle. Along with your statement quoted just above, does that suggest you are not using Cylc’s cycling capabilities at all, but rather manually (or script) launching several non-cycling graphs as distinct workflows, for each forecast cycle? Or does each launched graph run for several cycles (the tick-marks in your schematic)? If so, I’m not necessarily saying that’s the wrong way to do it (I don’t understand the scenario very well, so I may be way off base here), but I do wonder if it would be easier to do the whole thing at once with cycling graphs (something like @oliver.sanders showed above).

Hilary

p.s. your “current scenario” graph seems pretty clear, but I’m still struggling to make the connection to the other stuff. Is it just that the nowcast (N) bit is new, and doesn’t appear in the current system?

Hi Hilary, I’ve only shown the one cycle to show the full workflow, but each workflow at launch time has the same init and final cycle dates.

In the current scenario the best analysis is ‘h’ (hindcast), the analysis is ‘n’ (nowcast), the 14-day model forecast is ‘f’ and the 10-day model forecast is the ‘daily_forecast’ (which “goes away” in the new scheme).

xtriggers and recurrence rules live in a separate .cylc file and are included in flow.cylc.
Here are the xtriggers: (XT_CAST is the workflow ID, D2M is days to next Monday, which is when the hindcast actually runs, but with Wednesday’s date)

  {% if CAST == 'hcst' %}
    [[ xtriggers ]]
        clock_trigger = wall_clock()
  {% elif CAST == 'ncst' %}
    [[ xtriggers ]]
        clock_trigger = wall_clock(PT4H)
        # Special case for hindcasts. We assume that we may want to
        # start a hindcast up to wed before the nowcast run
    {% if D2M == 5 %}
        hindcast_r1 = workflow_state("{{XT_CAST}}//%(point)s/block_NEMO_cycle1", is_trigger=True):PT2S
    {% elif D2M == 6 %}
        hindcast_r1 = workflow_state("{{XT_CAST}}//%(point)s/block_NEMO_cycle1", offset="-P1D", is_trigger=True):PT2S
    {% endif %}
        hindcast = workflow_state("{{XT_CAST}}//%(point)s/block_NEMO_cycle1", offset="-P2D", is_trigger=True):PT2S
  {% elif CAST == 'fcst' %}
    [[ xtriggers ]]
        clock_trigger = wall_clock(PT10H)
        nowcast = workflow_state("{{XT_CAST}}//%(point)s/block_NEMO_cycle1", is_trigger=True):PT2S
  {% elif CAST == 'daily' %}
    [[ xtriggers ]]
        clock_trigger = wall_clock(offset=PT23H)
        nowcast = workflow_state("{{XT_CAST}}//%(point)s/block_NEMO_cycle1", is_trigger=True):PT2S
  {% endif %}

And here are the recurrence rules:

{% if CAST == 'hcst' %}
  {% if D2M >= 5 %}
        # Special case for hindcasts. We assume that we may want to
        # start a hindcast up to wed before the nowcast run
        R1/+P{{D2M}}D = block_NEMO<cycle=1>[-P{{D2M}}D] => FAMPREP
        R/+P{{D2M}}D/P7D = block_NEMO<cycle=1>[-P7D] => FAMPREP
  {% else %}
        {{ RECUR_MON }} = block_NEMO<cycle=1>[-P7D] => FAMPREP
  {% endif %}
{% elif CAST == 'ncst' %}
    {% if D2M >= 5 %}
        R1 =  @hindcast_r1 => FAMPREP
        {{ RECUR_WED }} ! R1 =  @hindcast => FAMPREP
    {% else %}
        {{ RECUR_WED }} =  @hindcast => FAMPREP
    {% endif %}
{% elif CAST == 'fcst' %}
        {{ RECUR_WED }} = @nowcast  => FAMPREP
{% elif CAST == 'daily' %}
        {{ RECUR_WED }} = @nowcast => FAMPREP
        P1D ! {{ RECUR_WED }} = block_NEMO<cycle=0>[-P1D] => FAMPREP
{% endif %}

My conundrum is how to configure the nowcast workflow with the correct model <ncycle>, which varies, unlike for the other casts, for which it remains set to 3.

I mean, I could just have 3 different nowcast workflows, n1, n2, n3 preconfigured with the correct <ncycle> but it seems overkill. I’d rather have some logic that says: if I am in nowcast mode and condition??? then <ncycle>=3 (or 4 or 5)

Hi Gaby,

That is clearer, but there’s still too much about your specific workflows that I don’t understand. I’m sure it would be easier to sit down and go through it in person, if only that were possible.

For instance, it’s not exactly clear to me what ncycle is:

other quantities such as <ncycle> in namelists) … The nowcast (‘n’) is the one that has the model run for <ncycle>=3,4,5.

Does that mean the workflow runs for three cycles (3,4,5), in the sense of Cylc graph recurrences, or is ncycle a static (known at start-up) configuration parameter that has (presumably) something to do with cycling?

My conundrum is how to configure the nowcast workflow with the correct model <ncycle>, which varies, unlike for the other casts, for which it remains set to 3.

I mean, I could just have 3 different nowcast workflows, n1, n2, n3 preconfigured with the correct <ncycle> but it seems overkill. I’d rather have some logic that says: if I am in nowcast mode and condition??? then <ncycle>=3 (or 4 or 5)

Sorry if I’m being dense, but I wonder if you could abstract this whole thing to a simpler workflow question focused on just this problem (e.g. I have a task foo that needs to run in these cycles in scenario A, and these cycles in scenario B, how do I do that?), without all the terminology and surrounding bits of graph that are quite specific to your use case but maybe not relevant to the core of the problem.

Hi Hilary, thank you for your patience. <ncycle> is a parameter for the NEMO model: the number of days the simulation needs to run for. Its value also affects other calculations, such as the start/end dates the workflow needs to stage the correct input files (such as restarts), because the dates are encoded in the input file names (I long for the totally sensical way files were ordered at NIWA by year/month/day :cry:)
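
To make the date dependence concrete, here is a small Python sketch of how a run window could follow from <ncycle>. Everything specific in it is an assumption for illustration: that the end date is simply the start date plus <ncycle> days, and the `restart_YYYYMMDD.nc` name pattern, which is made up and not the real file naming.

```python
from datetime import date, timedelta


def run_window(start, ncycle):
    """Return (start, end) dates for a model run of ncycle days.

    Assumption: the end date is start + ncycle days.
    """
    return start, start + timedelta(days=ncycle)


def restart_name(day, prefix="restart"):
    """Build a hypothetical restart file name with the date encoded in it."""
    return f"{prefix}_{day:%Y%m%d}.nc"


if __name__ == "__main__":
    for ncycle in (3, 4, 5):
        start, end = run_window(date(2024, 1, 3), ncycle)
        print(ncycle, restart_name(start), restart_name(end))
```

The point is only that once <ncycle> is known for a given cycle, the file names to stage can be derived from it rather than stored separately.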

Just to finalize this discussion, the solution we have come up with is to parametrize <ncycle> depending on the day of the week and the cast the model is running. It turns out that there will be a lot more variability than was envisaged in the graph posted at the beginning of this discussion.
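
For anyone landing here later, a minimal Python sketch of that kind of lookup follows. The cast names (`ncst`, `hcst`) and the values 3, 4, 5 come from this thread, but the weekdays they are attached to are placeholders rather than our real schedule, and `ncycle_for` is a hypothetical helper name.

```python
from datetime import date

DEFAULT_NCYCLE = 3  # the other casts stay at 3, as described in this thread

# Hypothetical table keyed by (cast, weekday), with weekday as returned by
# date.weekday(): Monday=0 ... Sunday=6. The weekdays are placeholders.
NCYCLE_TABLE = {
    ("ncst", 2): 3,  # Wednesday
    ("ncst", 3): 4,  # Thursday
    ("ncst", 4): 5,  # Friday
}


def ncycle_for(cast, day):
    """Look up <ncycle> for a cast on a given date, falling back to the default."""
    return NCYCLE_TABLE.get((cast, day.weekday()), DEFAULT_NCYCLE)


if __name__ == "__main__":
    print(ncycle_for("ncst", date(2024, 1, 4)))
```

Because the value is derived from the cycle date and the cast alone, a restarted run gets the same <ncycle> without needing a separate state file.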