Dynamical task creation

I am interested to understand if dynamical task creation is possible.

Let’s say task a results in the output (files) x, y and z, which are unknown in name and number. For each of these outputs I would like to add tasks a=>f(x), a=>g(y) and a=>h(z) to the graph of the running suite. Further tasks b and c, depending on the dynamically created tasks f, g, h, would be of interest as well, e.g. dependencies like a=>f(x)=>b or a=>g(y)&h(z)=>c.

I had a problem at work today where I considered such dynamical task creation as a possible solution, but I could not find it immediately in the documentation. I do not need this feature (I can find another way to do what I intended), but I thought it would be good to understand whether the dynamical tasks described above are available / desirable in Cylc.

Caveat - I’m a relatively new and inexperienced member of the support team.

I’ve had a skim through the documentation on task-triggering at https://cylc.github.io/doc/built-sphinx/suite-config.html#task-triggering and have come to the conclusion that nothing does exactly what you want although you may be able to get your script (a) to provide message triggers which will help.

Where I’m becoming a bit speculative is my second suggestion: you could run a second suite from one of your tasks, using your unknown number of files to populate a Jinja2 template, or something similar.


Hi @BHFock,

@wxtim is dead right (thanks @wxtim!), the best way to get this kind of dynamic behaviour (where parts of the workflow graph are not known until runtime) is sub-suites. Your task a should run another suite (with --no-detach, so that task a does not finish until the sub-suite does) that is configured by Jinja2 logic with your dynamically determined parameters. Note that you can now tell sub-suites to use an alternate run directory (see cylc register --help). This can make housekeeping easier if you have one or more sub-suites being generated on every main-suite cycle.
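
To make the sub-suite idea a bit more concrete, here is a rough sketch of what task a's script could do. It is only an illustration under assumptions: join_outputs, run_sub_suite, the FILES template variable and the CYLC_CMD override (useful for a dry run with echo) are hypothetical names I made up; only cylc run with --no-detach and --set is taken from the Cylc 7 command line.

```shell
# Join the names of all files in a directory into one comma-separated string.
# (Hypothetical helper; assumes file names contain no commas.)
join_outputs() (
    dir="$1"
    list=""
    for f in "$dir"/*; do
        [ -e "$f" ] || continue          # skip the unmatched glob in an empty dir
        list="${list:+$list,}$(basename "$f")"
    done
    printf '%s\n' "$list"
)

# Run the sub-suite and wait for it, passing the file list as a Jinja2 variable.
# FILES is an assumed variable name that the sub-suite's suite.rc would read.
run_sub_suite() (
    suite="$1"
    dir="$2"
    files="$(join_outputs "$dir")"
    "${CYLC_CMD:-cylc}" run --no-detach --set="FILES=$files" "$suite"
)

# Example call from task a (names are placeholders):
#   run_sub_suite my-sub-suite "$OUTPUT_DIR"
```

In the sub-suite, the suite.rc could then split FILES on commas in Jinja2 to generate one task per file, assuming the value arrives as a plain string.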

You can do it without sub-suites if you know everything that could be needed downstream of a: just define all those paths in the graph (with each triggered by a message trigger from a) and use suicide triggers to remove those that aren’t needed at run time. However, that is quite painful; I would definitely use sub-suites. (You will be able to define alternate paths through the graph without any need for suicide triggers soon … that’s a planned Cylc 9 feature that is probably now going to be in Cylc 8).

Hilary


Sounds like you have a branching workflow problem where you want to change the graph at runtime. Dynamic graphs are a bit tricky in Cylc 7.

It is possible though. Sub-suites are a nice solution for some use cases.

To make you aware of another option which may be worth considering, you can “turn-on” all possible branches in the graph, then selectively switch them off on a cycle-by-cycle basis depending on the output of the relevant task.

First add all of the possible branches to the graph (in this example using message triggers):

# branch (a)
foo:a => bar & baz
# branch (b)
foo:b => bar => pub
# branch (c)
foo:c => qux

Then dynamically “turn-off” the undesired branches at runtime by:

  1. Using script:

    script="""
        if [[ <some-condition> ]]; then
           the-script-to-run
        else
           echo 'skipping task'
        fi
    """
    

    A simple solution suitable for small background jobs. It is not so good for jobs which require submission, as the job will still queue (wasting resources in the process).

  2. Using the expired state:

    [runtime]
       [[_BRANCH_A]]
       [[_BRANCH_B]]
       [[_BRANCH_C]]
       [[bar]]
           inherit = _BRANCH_A, _BRANCH_B
       [[baz]]
           inherit = _BRANCH_A
       [[pub]]
           inherit = _BRANCH_B
       [[qux]]
           inherit = _BRANCH_C
       [[foo]]
           # ...
           post-script="""
               all_branches=(_BRANCH_A _BRANCH_B _BRANCH_C)
               branch_to_run=$(<somelogichere>)
               # Blank out the chosen branch; the unquoted expansion in the
               # loop below then drops the resulting empty element.
               branches=( "${all_branches[@]/$branch_to_run}" )
               for branch in ${branches[@]}; do
                   cylc reset "$CYLC_SUITE_NAME" "$branch.$CYLC_TASK_CYCLE_POINT" -s expired
               done
           """
    

    These tasks are still visible in Rose Bush / Cylc Review providing you with some kind of logging which can be helpful in some situations.

  3. By removing them from the suite:

    As for (2) but using cylc remove rather than cylc reset.

    These tasks get completely removed, so they disappear from Rose Bush / Cylc Review.

  4. Using suicide triggers:

    foo:c => !bar
    foo:b | foo:c => !baz
    foo:a | foo:c => !pub
    foo:a | foo:b => !qux
    

    Warning: suicide triggers are tricky; write them out one per task to be removed, as above.
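
The branch-selection logic from option (2) can also be written without the array substitution trick, which may be easier to read. This is just a sketch: branches_to_disable, expire_branches and the CYLC_CMD override (set it to echo for a dry run) are hypothetical names, and the cylc reset call simply mirrors the one used in option (2).

```shell
# Print every branch family except the chosen one, one per line.
branches_to_disable() (
    chosen="$1"
    shift
    for branch in "$@"; do
        [ "$branch" = "$chosen" ] || printf '%s\n' "$branch"
    done
)

# Reset each unwanted branch family to "expired" for a single cycle point.
# Relies on family names containing no whitespace.
expire_branches() (
    suite="$1"
    point="$2"
    chosen="$3"
    shift 3
    for branch in $(branches_to_disable "$chosen" "$@"); do
        "${CYLC_CMD:-cylc}" reset -s expired "$suite" "$branch.$point"
    done
)

# Example call from the post-script of foo (branch choice is a placeholder):
#   expire_branches "$CYLC_SUITE_NAME" "$CYLC_TASK_CYCLE_POINT" \
#       _BRANCH_B _BRANCH_A _BRANCH_B _BRANCH_C
```

The first argument after the cycle point is the branch to keep; everything after it is the full list of branch families.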

Note: The task foo can communicate its outcome to all tasks in the current cycle using cylc broadcast.
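
As a rough illustration of the broadcast idea, foo could publish its chosen branch as an environment variable for the current cycle point, and downstream tasks could check it in their script. Treat this as a sketch under assumptions: broadcast_branch, should_run, the BRANCH variable and the CYLC_CMD override (for dry runs) are hypothetical names; the -p and -s options are the Cylc 7 cylc broadcast options as I understand them, so check cylc broadcast --help before relying on this.

```shell
# Broadcast the chosen branch to tasks at one cycle point.
# BRANCH is an assumed environment variable name.
broadcast_branch() (
    suite="$1"
    point="$2"
    branch="$3"
    "${CYLC_CMD:-cylc}" broadcast -p "$point" -s "[environment]BRANCH=$branch" "$suite"
)

# In a downstream task: succeed only if this task's branch was the one chosen.
should_run() {
    [ "${BRANCH:-}" = "$1" ]
}

# Example use in foo:
#   broadcast_branch "$CYLC_SUITE_NAME" "$CYLC_TASK_CYCLE_POINT" a
# Example use in bar:
#   if should_run a; then the-script-to-run; else echo 'skipping task'; fi
```

This pairs naturally with option (1), since the condition in the script can simply test the broadcast variable.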
