Splitting a large workflow into two?

Hi there! So… I have been handed a flow.cylc file for the modelling part of our workflow (we use NEMO, BTW).

The graph (and flow.cylc below):

I am starting to think that it might make sense to have a separate workflow that only executes the model part and triggers off the “main” workflow, which takes care of the prep and post tasks.
Is this a viable approach? It would enable the R&D people to fiddle with the modelling bit and leave everything else alone.
How would I share key environment variables between the two workflows?
Cheers
Gaby

#!Jinja2

[scheduler]
   UTC mode = True

[meta]

{% set ig_modesystem = 0 %}
{% set ig_ntimes = 1 %}
{% set cycIAU = -1 %}
{% set ig_write_DUP = 0 %}
{% set ig_reload_DUP = 0 %}

{% set cycledeb = 0 %}
{% set cyclefirstassim = 1 %}
{% if ig_modesystem >= 2 %}
    {% set cyclefirstassim = ig_ntimes + 1 %}
{% elif ig_modesystem == 0 %}
    {% set cyclefirstassim = 0 - cycIAU %}
{% elif ig_modesystem <= -1 %}
    {% set cyclefirstassim = 1 - cycIAU %}
{% endif %}
{% if cyclefirstassim <= 0 %}
    {% set cyclefirstassim = 1 %}
{% endif %}

{% set cyclefirstmodel = 0 %}
{% if ig_modesystem == 0 %}
    {% set cyclefirstmodel = -1 - cycIAU %}
{% elif ig_modesystem <= -1 %}
    {% set cyclefirstmodel = 0 - cycIAU %}
{% endif %}
{% if cyclefirstmodel <= -1 %}
    {% set cyclefirstmodel = 0 %}
{% endif %}

{% set cyclefin = ig_ntimes %}
{% if ig_modesystem >= 2 %}
    {% set cyclefin = ig_ntimes - 1 %}
{% endif %}

{% set cyclestopassim = cyclefin %}
{% if ig_modesystem == 1 %}
    {% set cyclestopassim = cyclefin - 1 %}
{% elif ig_modesystem == 0 and cycIAU == 0 %}
    {% set cyclestopassim = cyclefin - 1 %}
{% elif ig_modesystem == -1 and ig_ntimes != 1 %}
    {% set cyclestopassim = cyclefin - 1 %}
{% elif ig_modesystem == -2 %}
    {% set cyclestopassim = cyclefin - 1 %}
{% endif %}

{% if ig_modesystem == 0 or ig_modesystem == 1 %}
    {% if ig_write_DUP == 1 %}
        {% set cyclefin = cyclefirstmodel %}
        {% set cyclefirstassim = ig_ntimes + 1 %}
    {% elif ig_write_DUP == 2 %}
        {% set cyclefin = cyclefirstassim %}
    {% endif %}
    {% if ig_reload_DUP == 1 or ig_reload_DUP == 2 %}
        {% set cyclefin = cyclefirstassim %}
    {% endif %}
    {% if ig_reload_DUP == 1 or ig_write_DUP == 2 %}
        {% set cyclestopassim = cyclefirstassim - 1 %}
    {% endif %}
{% endif %}

[scheduling]
    cycling mode = integer
    initial cycle point = {{ cycledeb }}
    final cycle point  = {{ cyclefin }}

    [[graph]]
        R1 = """
             block_INIT => block_DATA
             launch_blockDATA:launch_blockDATA_yes? => block_DATA
             block_DATA => launch_ANALYSIS
             launch_ANALYSIS:launch_ANALYSIS_yes? => launch_BIAS_HTS:launch_BIAS_HTS_yes? => block_BIASTS 
             launch_ANALYSIS:launch_ANALYSIS_yes? => launchblock_SAM:launchblock_SAM_yes? => block_SAM
             block_SAM => cpmxpackcmz2cpmx => cpmx2cdf => cdf2cmx => cmx2splitcmx => canredcmx2cmx_forsplit => cpmx2cdf_forsplit
             (block_BIASTS | launch_BIAS_HTS:launch_BIAS_HTS_no?) & (block_SAM | launchblock_SAM:launchblock_SAM_no? )| launch_ANALYSIS:launch_ANALYSIS_no? => block_NEMO => com2newcom
             launchblock_NEMO:launch_blockNEMO_yes? => block_NEMO
             launchblock_mergeola:launchblock_mergeola_yes? => ola2wola
             launchblock_concat_bias_hts:launchblock_concat_bias_hts_yes? => concat_bias_hts
             block_NEMO => launchblock_mergeola & launchblock_concat_bias_hts
             """
        P1 = """
             com2newcom[-P1] => block_SAM
             launch_ANALYSIS:launch_ANALYSIS_yes? => launch_BIAS_HTS:launch_BIAS_HTS_yes? => block_BIASTS 
             launch_ANALYSIS:launch_ANALYSIS_yes? => launchblock_SAM:launchblock_SAM_yes? => block_SAM
             ola2wola[-P1] & concat_bias_hts[-P1] => block_BIASTS
             (block_BIASTS | launch_BIAS_HTS:launch_BIAS_HTS_no?) & (block_SAM | launchblock_SAM:launchblock_SAM_no? )| launch_ANALYSIS:launch_ANALYSIS_no? => block_NEMO => com2newcom
             block_SAM => cpmxpackcmz2cpmx => cpmx2cdf => cdf2cmx => cmx2splitcmx => canredcmx2cmx_forsplit => cpmx2cdf_forsplit
             launchblock_NEMO:launch_blockNEMO_yes? => block_NEMO
             launchblock_mergeola:launchblock_mergeola_yes? => ola2wola
             launchblock_concat_bias_hts:launchblock_concat_bias_hts_yes? => concat_bias_hts
             block_NEMO => launchblock_mergeola & launchblock_concat_bias_hts
             """

[runtime]
  [[ root ]]
     [[[ environment ]]]
       ig_typeIAU=14
       ig_anasam=1
       ig_ntimes0={{ig_ntimes}}
       flag_stopsystem="no"
       ig_mergeola=1
       ig_bias_hts=1
       FLAG_BIAS_HTS="yes"
       julmin_compute_HTS_bias_correction=27317
       julstop=27359
       ig_tint=7

  [[initvar]]
       pre-script = """
       set -x

       [ {{ig_modesystem}} -eq 1 ] && ig_typeIAU=0
       [[ ${ig_typeIAU} -ge 1 && ${ig_typeIAU} -le 9 ]] && cycIAU=0
       [[ ${ig_typeIAU} -ge 10 && ${ig_typeIAU} -le 19 ]] && cycIAU=-1
       [[ ${ig_typeIAU} -ge 20 && ${ig_typeIAU} -le 29 ]] && cycIAU=-2
       export flag_launchblock_DATA="yes"
       touch /utmp/cmer/legalloudeco/initvar
       """

    [[block_INIT]] 
      script = touch /utmp/cmer/legalloudeco/block_INIT ; sleep 15
    [[launch_blockDATA]]
        inherit = initvar
        script = """
          flag_launchblock_DATA="yes"
          [ {{ig_modesystem}} -eq 3 ] && flag_launchblock_DATA="no"
          [[ {{ig_modesystem}} -eq 0 && {{ig_reload_DUP}} -eq 1 ]] && flag_launchblock_DATA="no"
          [[ {{ig_modesystem}} -eq 4 || {{ig_modesystem}} -eq 5 ]] && flag_stopsystem="yes"
          if [[ $flag_launchblock_DATA == 'yes' ]]; then
                cylc message 'yes_launchblock_DATA'
          else
                cylc message 'no_launchblock_DATA'
          fi
          if [[ $flag_stopsystem == 'yes' ]]; then
                cylc message 'yes_stopsystem'
          else
                cylc message 'no_stopsystem'
          fi
          touch /utmp/cmer/legalloudeco/launch_blockDATA ; sleep 15
        """
        completion = succeeded and ((launch_blockDATA_yes or launch_blockDATA_no) and (flag_stopsystem_yes or flag_stopsystem_no))
        # Register the four custom outputs:
        [[[outputs]]]
            launch_blockDATA_yes = 'yes_launchblock_DATA'
            launch_blockDATA_no = 'no_launchblock_DATA'
            flag_stopsystem_yes = 'yes_stopsystem'
            flag_stopsystem_no = 'no_stopsystem'

    [[block_DATA]] 
      script = touch /utmp/cmer/legalloudeco/block_DATA ; sleep 15

    [[launchblock_NEMO]]
        inherit = initvar
        script = """
          flag_launchblock_NEMO="yes"
          cycle=${CYLC_TASK_CYCLE_POINT}
          [ ${cycle} -lt {{cyclefirstmodel}} ] && flag_launchblock_NEMO="no"
          [ "${flag_stopsystem}" = "yes" ] && flag_launchblock_NEMO="no"

          if [[ $flag_launchblock_NEMO == 'yes' ]]; then
                cylc message 'yes_launchblock_NEMO'
          else
                cylc message 'no_launchblock_NEMO'
          fi
          touch /utmp/cmer/legalloudeco/launchblock_NEMO_${cycle} ; sleep 15
        """
        completion = succeeded and (launch_blockNEMO_yes or launch_blockNEMO_no)
        # Register the two custom outputs:
        [[[outputs]]]
            launch_blockNEMO_yes = 'yes_launchblock_NEMO'
            launch_blockNEMO_no = 'no_launchblock_NEMO'


    [[block_NEMO]]
      script = touch /utmp/cmer/legalloudeco/block_NEMO_${CYLC_TASK_CYCLE_POINT} ; sleep 15
    [[launch_ANALYSIS]]
        inherit = initvar
        script = """
          flag_launch_ANALYSIS="yes"
          cycle=${CYLC_TASK_CYCLE_POINT}
          [ "${flag_stopsystem}" = "yes" ] && flag_launch_ANALYSIS="no"
          [ ${cycle} -lt {{cyclefirstassim}} ] && flag_launch_ANALYSIS="no"
          [ ${cycle} -gt {{cyclestopassim}} ] && flag_stopsystem="yes"
          if [[ $flag_launch_ANALYSIS == 'yes' ]]; then
                cylc message 'yes_launch_ANALYSIS'
          else
                cylc message 'no_launch_ANALYSIS'
          fi
          if [[ $flag_stopsystem == 'yes' ]]; then
                cylc message 'yes_stopsystem'
          else
                cylc message 'no_stopsystem'
          fi
          touch /utmp/cmer/legalloudeco/launch_ANALYSIS_${CYLC_TASK_CYCLE_POINT} ; sleep 15
        """
        completion = succeeded and ((launch_ANALYSIS_yes or launch_ANALYSIS_no) and (flag_stopsystem_yes or flag_stopsystem_no))
        # Register the four custom outputs:
        [[[outputs]]]
            launch_ANALYSIS_yes = 'yes_launch_ANALYSIS'
            launch_ANALYSIS_no = 'no_launch_ANALYSIS'
            flag_stopsystem_yes = 'yes_stopsystem'
            flag_stopsystem_no = 'no_stopsystem'

    [[launch_BIAS_HTS]]
        inherit = initvar
        script = """
          flag_launchblock_BIAS_HTS="yes"
          cycle=${CYLC_TASK_CYCLE_POINT}
          [ {{ig_modesystem}} -ge 2 ] && flag_launchblock_BIAS_HTS="no"
          [ ${ig_bias_hts} -eq 0 ] && flag_launchblock_BIAS_HTS="no"
          if [ "${FLAG_BIAS_HTS}" = "yes" ] ; then
              [ ${julmin_compute_HTS_bias_correction} -gt $((${julstop}-(${ig_ntimes0}-${cycle})*${ig_tint})) ] && flag_launchblock_BIAS_HTS="no"
          fi
          if [[ $flag_launchblock_BIAS_HTS == 'yes' ]]; then
                cylc message 'yes_launchblock_BIAS_HTS'
          else
                cylc message 'no_launchblock_BIAS_HTS'
          fi
          touch /utmp/cmer/legalloudeco/launch_BIAS_HTS_${CYLC_TASK_CYCLE_POINT} ; sleep 15
        """
        completion = succeeded and (launch_BIAS_HTS_yes or launch_BIAS_HTS_no)
        # Register the two custom outputs:
        [[[outputs]]]
            launch_BIAS_HTS_yes = 'yes_launchblock_BIAS_HTS'
            launch_BIAS_HTS_no = 'no_launchblock_BIAS_HTS'

    [[block_BIASTS]]
      script = touch /utmp/cmer/legalloudeco/block_BIASTS_${CYLC_TASK_CYCLE_POINT} ; sleep 15

    [[launchblock_SAM]]
        inherit = initvar
        script = """
          flag_launchblock_SAM="yes"
          [ {{ig_modesystem}} -ge 1 ] && flag_launchblock_SAM="no"
          [ ${ig_anasam} -eq 0 ] && flag_launchblock_SAM="no"
          if [[ $flag_launchblock_SAM == 'yes' ]]; then
                cylc message 'yes_launchblock_SAM'
          else
                cylc message 'no_launchblock_SAM'
          fi
          touch /utmp/cmer/legalloudeco/launchblock_SAM_${CYLC_TASK_CYCLE_POINT} ; sleep 15 
        """
        completion = succeeded and (launchblock_SAM_yes or launchblock_SAM_no)
        # Register the two custom outputs:
        [[[outputs]]]
            launchblock_SAM_yes = 'yes_launchblock_SAM'
            launchblock_SAM_no = 'no_launchblock_SAM'

    [[block_SAM]]
      script = touch /utmp/cmer/legalloudeco/block_SAM_${CYLC_TASK_CYCLE_POINT} ; sleep 15

    [[launchblock_mergeola]]
        inherit = initvar
        script = """
          flag_launchblock_mergeola="yes"
          [ ${ig_mergeola} -eq 0 ] && flag_launchblock_mergeola="no"
          [ {{ig_modesystem}} -eq 3 ] && flag_launchblock_mergeola="no"
          [[ {{ig_modesystem}} -eq -1 && {{ig_ntimes}} -eq 1 ]] && flag_launchblock_mergeola="no"
          if [[ $flag_launchblock_mergeola == 'yes' ]]; then
                cylc message 'yes_launchblock_mergeola'
          else
                cylc message 'no_launchblock_mergeola'
          fi
          touch /utmp/cmer/legalloudeco/launchblock_mergeola_${CYLC_TASK_CYCLE_POINT} ; sleep 15
        """
        completion = succeeded and (launchblock_mergeola_yes or launchblock_mergeola_no)
        # Register the two custom outputs:
        [[[outputs]]]
            launchblock_mergeola_yes = 'yes_launchblock_mergeola'
            launchblock_mergeola_no = 'no_launchblock_mergeola'

    [[ola2wola]]
      script = touch /utmp/cmer/legalloudeco/ola2wola_${CYLC_TASK_CYCLE_POINT} ; sleep 15

    [[launchblock_concat_bias_hts]]
        inherit = initvar
        script = """
          flag_launchblock_concat_bias_hts="yes"
          cycle=${CYLC_TASK_CYCLE_POINT}
          [ {{ig_modesystem}} -ge 2 ] && flag_launchblock_concat_bias_hts="no"
          [ ${ig_bias_hts} -eq 0 ] && flag_launchblock_concat_bias_hts="no"
          if [ ${FLAG_BIAS_HTS} = "yes" ] ; then
              [ ${julmin_compute_HTS_bias_correction} -gt $((${julstop}-(${ig_ntimes0}-1-${cycle})*${ig_tint})) ] && flag_launchblock_concat_bias_hts="no"
          fi
          [ {{ig_reload_DUP}} -eq 1 ] && flag_launchblock_concat_bias_hts="no"
          [[ {{ig_reload_DUP}} -eq 2 && ${cycle} -eq {{cycledeb}} ]] && flag_launchblock_concat_bias_hts="no"
          [ ${cycle} -eq {{ig_ntimes}} ] && flag_launchblock_concat_bias_hts="no"
          [[ {{ig_modesystem}} -eq -1 && {{ig_ntimes}} -eq 1 ]] && flag_launchblock_concat_bias_hts="no"
          if [[ $flag_launchblock_concat_bias_hts == 'yes' ]]; then
                cylc message 'yes_launchblock_concat_bias_hts'
          else
                cylc message 'no_launchblock_concat_bias_hts'
          fi
          touch /utmp/cmer/legalloudeco/launchblock_concat_bias_hts_${CYLC_TASK_CYCLE_POINT} ; sleep 15
        """
        completion = succeeded and (launchblock_concat_bias_hts_yes or launchblock_concat_bias_hts_no)
        # Register the two custom outputs:
        [[[outputs]]]
            launchblock_concat_bias_hts_yes = 'yes_launchblock_concat_bias_hts'
            launchblock_concat_bias_hts_no = 'no_launchblock_concat_bias_hts'


    [[concat_bias_hts]]
      script = touch /utmp/cmer/legalloudeco/concat_bias_hts_${CYLC_TASK_CYCLE_POINT} ; sleep 15
    [[com2newcom]]
      script = touch /utmp/cmer/legalloudeco/com2newcom_${CYLC_TASK_CYCLE_POINT} ; sleep 15
    [[cpmxpackcmz2cpmx]] 
      script = touch /utmp/cmer/legalloudeco/cpmxpackcmz2cpmx_${CYLC_TASK_CYCLE_POINT} ; sleep 15
    [[cpmx2cdf]] 
      script = touch /utmp/cmer/legalloudeco/cpmx2cdf_${CYLC_TASK_CYCLE_POINT} ; sleep 15
    [[cdf2cmx]] 
      script = touch /utmp/cmer/legalloudeco/cdf2cmx_${CYLC_TASK_CYCLE_POINT} ; sleep 15
    [[cmx2splitcmx]] 
      script = touch /utmp/cmer/legalloudeco/cmx2splitcmx_${CYLC_TASK_CYCLE_POINT} ; sleep 15
    [[canredcmx2cmx_forsplit]] 
      script = touch /utmp/cmer/legalloudeco/canredcmx2cmx_forsplit_${CYLC_TASK_CYCLE_POINT} ; sleep 15
    [[cpmx2cdf_forsplit]]
      script = touch /utmp/cmer/legalloudeco/cpmx2cdf_forsplit_${CYLC_TASK_CYCLE_POINT} ; sleep 15

Hi,

Having one Cylc workflow run another is absolutely possible; the main use case is solving multi-dimensional cycling problems. However, sub-workflows are currently tricky and come with many caveats. Sharing configuration between them is difficult, and they introduce a management problem (you now have many workflows rather than one).

If the intention is to separate the model component from the surrounding infrastructure, then it’s not necessary to adopt a sub-workflow pattern. There is another approach, which should be a lot easier: rather than splitting it into two workflows, split it into two Cylc files.

Cylc supports include files, e.g.:

#!Jinja2

{% include "model.cylc" %}
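
For the workflow in the question, this could look roughly like the sketch below. It assumes the NEMO tasks have been moved into a file called nemo.cylc next to flow.cylc (the file name is hypothetical). Because it is still a single workflow, anything set under [[root]] is inherited by the tasks defined in both files, so nothing extra is needed to share environment variables.

```
#!Jinja2

# flow.cylc: prep/post infrastructure plus the shared settings
[runtime]
    [[root]]
        [[[environment]]]
            # inherited by every task, including those defined in nemo.cylc
            ig_tint = 7
            julstop = 27359

# hypothetical include file holding the model tasks (e.g. block_NEMO)
{% include "nemo.cylc" %}
```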

These allow you to split up a single flow.cylc file into multiple *.cylc files. A common use for this pattern is to make workflows portable by putting the site-specific bits (typically platforms and directives) into site-specific include files which are only activated at the relevant sites. E.g.:

(flow.cylc)

#!Jinja2

# configure the site we are running the workflow at
# (this could be done in a Rose optional configuration)
{% set SITE = "niwa" %}

# configure the generic parts of the workflow
[runtime]
  [[my_task]]
    script = my_script

# include the site-specific stuff
{% include "site/" + SITE + ".cylc" %}

(site/niwa.cylc)

[runtime]
  [[my_task]]
    platform = ...
    [[[directives]]]
       ... = ...

However, this approach can also be used to write more modular workflows. Here’s an overview of the approach used in a simple modular workflow I worked on a while back:

(flow.cylc)

#!Jinja2

# configure the models to run
{% set models = ["model1", "model2"] %}

# include the required models into the workflow
{% for model in models %}
  {% include "models/" + model + ".cylc" %}
{% endfor %}

# the overarching infrastructure
[task parameters]
    models = {{ models | join(", ") }}

[scheduling]
    [[graph]]
        R1 = """
            prep => <models - 1>_end => <models>_start => post
        """

[runtime]
    [[prep]]
    [[post]]

(models/model1.cylc)

[scheduling]
    [[graph]]
        R1 = """
            model1_start => a & b & c => model1_end
        """

[runtime]
    [[model1_start]]
    [[model1_end]]
    [[a, b, c]]

In this workflow:

  • An arbitrary number of models were run in series.
  • It needed to be easy to add or remove models.
  • Each model could have a different graph.

So we established a simple module pattern where each model had a start and end task (which they were doing anyway).

Your scenario is probably quite different, but I hope this shows how a module pattern can be established to separate models from the surrounding infrastructure.
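
For comparison, a second module with a different internal graph might look like the sketch below. The task names x and y are made up for illustration; only the model2_start and model2_end tasks are fixed by the pattern.

(models/model2.cylc)

```
[scheduling]
    [[graph]]
        R1 = """
            # same start/end convention as model1, different tasks in between
            model2_start => x => y => model2_end
        """

[runtime]
    [[model2_start]]
    [[model2_end]]
    [[x, y]]
```

The surrounding flow.cylc only ever refers to the start and end tasks, so what happens between them is private to each module.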


Hi Oliver, thanx for that… As usual, the devil is in the detail. I have already separated out the nemo model tasks into a separate file, so I guess I will keep it that way and go from there.

Sub-workflows are also useful where the structure of a sub-graph needs to be determined dynamically at every cycle point.

I agree with @oliver.sanders that a sub-workflow approach is not great for this.

I would use Jinja2 to switch the non-model bits on/off (i.e., include or exclude them when the file is parsed) as needed.

As a minimal example, if the “non-model bits” are a => model (pre) and model => b (post):

{% if extra_stuff %}
[scheduling]
    [[graph]]
        R1 = "a => model => b"
[runtime]
    [[a]]
        ...
    [[b]]
        ...
{% endif %}

The boolean extra_stuff switch could be determined in various ways - e.g. as a manual setting, or by code that examines the local environment to see what context it is running in.
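
A rough sketch of both options, to go at the top of flow.cylc. RUN_EXTRAS is a hypothetical environment variable name, and this assumes the environ dictionary that Cylc makes available to Jinja2 templates; treat it as a starting point rather than a tested recipe.

```
#!Jinja2

# Option 1 (manual): pass the switch on the command line, e.g.
#   cylc validate . -s 'extra_stuff=True'
# Option 2 (environment): if nothing was passed, fall back to RUN_EXTRAS
# (hypothetical name) in the environment where the workflow is parsed.
{% if extra_stuff is not defined %}
    {% set extra_stuff = environ.get('RUN_EXTRAS', 'no') == 'yes' %}
{% endif %}
```

With this, the R&D people could leave RUN_EXTRAS unset and get only the model tasks, while the full system sets it to "yes".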

The “extra bits” could be held in an include file, if you like.

(Note that Cylc config sections and items can be repeated, in which case the items add to or override previous definitions - so you don’t have to have all the [scheduling] content (say) in the same location in the file).
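
Putting the last two points together, here is a sketch in which the model part lives in flow.cylc and the extras live in an include file (the file name extras.cylc and the run_model script are hypothetical). The include file repeats [scheduling] and [runtime]; as far as I know, repeated graph strings for the same recurrence are appended rather than overridden, so model picks up its pre and post dependencies only when the file is included.

(flow.cylc)

```
#!Jinja2

# set manually here; could equally come from elsewhere
{% set extra_stuff = true %}

[scheduling]
    [[graph]]
        R1 = "model"

[runtime]
    [[model]]
        script = run_model  # hypothetical model script

{% if extra_stuff %}
{% include "extras.cylc" %}
{% endif %}
```

(extras.cylc)

```
# adds to the [scheduling] and [runtime] sections already defined in flow.cylc
[scheduling]
    [[graph]]
        R1 = "a => model => b"

[runtime]
    [[a]]
    [[b]]
```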