Useful workflow pattern for "case studies", with start, end and recurrence?

Hi all - as just discussed with @oliver.sanders, I'm wanting to have a workflow which can run a given graph in one of two modes:

  • near-real-time: vanilla 6H recurrence cycling - am fine writing the graph recurrence syntax for this
  • historic “case study” mode: same graph, but running through a set of past events, each event defined by a start (initial cycle point) & end (final cycle point), again at 6H recurrence.

It’s the graph recurrence syntax for the latter which I’m struggling with.

In case it’s of use, below I’ve put a Jinja snippet from my proto-workflow illustrating the characteristics of the overall event set:

  • as you can see, the duration varies from event to event (i.e. start & end dates are separated by different, per-event intervals)
  • note there is no linking between events - these are all well-separated in time, and independent
  • if performance considerations come in (I see this has come up in the discussions linked above): overall there will be OoM(10) events (~20-30), each of this sort of few-day (<10 day) duration - so under OoM(10) recurrences per event
    • performance is a secondary consideration for these historic runs - fine with slightly sluggish!
    • note we're likely to do some parametric exploration - "workflow_Run_A through events with model_config_A; workflow_Run_B seeing what happens with model_config_B; …". But we're fine to run these sorts of experiments serially, rather than in parallel
# RISER event dates
# * TODO: replace hardcoded values here with dynamic pandas-based reading from csv of event list
{%
    set riser_events = (
        {"Event": "R01", "Start": "2024-03-23T21Z", "End": "2024-03-24T18Z", "Kp_max": "8"},
        {"Event": "R02", "Start": "2023-09-19T00Z", "End": "2023-09-25T00Z", "Kp_max": "6"},
        {"Event": "R03", "Start": "2019-08-31T09Z", "End": "2019-09-01T12Z", "Kp_max": "5"},
    )
%}
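
For reference, roughly what I have in mind for that TODO - a minimal pandas sketch (the CSV file name and column layout are just placeholders matching the hardcoded values above):

import pandas as pd


def load_riser_events(path="riser_events.csv"):
    """Return the event list in the same list-of-dicts shape as above.

    Assumes a CSV with columns: Event, Start, End, Kp_max.
    """
    # Read everything as strings so the cycle points stay in ISO8601 form.
    df = pd.read_csv(path, dtype=str)
    return df.to_dict(orient="records")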

Sadly, ISO8601 (:2005) does not provide us with an R/start/stop/interval syntax.

However, the cycling interval is regular for your examples (PT6H), so we do have the option of RN/start/interval, e.g.:

# run once every year, starting in 2000, stopping after 5 repetitions
R5/2000/P1Y

So, if we can compute N, there’s a nicer alternative to configuring every cycle point individually. Nicer as in easier to read, but also, much easier for Cylc to run when the number of events gets large.
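
For example, for the R02 event, a quick back-of-the-envelope check with plain Python datetimes (just to illustrate the arithmetic - the proof of concept below uses isodatetime instead):

from datetime import datetime, timedelta

# R02 runs from 2023-09-19T00Z to 2023-09-25T00Z at a 6-hourly interval.
start = datetime(2023, 9, 19, 0)
end = datetime(2023, 9, 25, 0)
step = timedelta(hours=6)

n = (end - start) // step + 1  # +1 to include the start cycle itself
print(f"R{n}/2023-09-19T00Z/PT6H")  # -> R25/2023-09-19T00Z/PT6H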

Here’s my proof of concept:

  • Define your list of events (you might want to do this in a separate file).
  • Compute the number of cycles between the start and stop of each event.
  • I've created a parameter for each event and used it for every task within that event. This way we can support overlapping events (because each task has a unique name).
  • You can also add extra parameters to this matrix, e.g. model.
#!Jinja2

{% from "generate" import count %}

{%
    set riser_events = [
        {"Event": "R01", "Start": "2024-03-23T21Z", "End": "2024-03-24T18Z", "Kp_max": "8"},
        {"Event": "R02", "Start": "2023-09-19T00Z", "End": "2023-09-25T00Z", "Kp_max": "6"},
        {"Event": "R03", "Start": "2019-08-31T09Z", "End": "2019-09-01T12Z", "Kp_max": "5"},
    ]
%}

[task parameters]
    event = {{ riser_events | map(attribute='Event') | join(', ') }}

[scheduler]
    allow implicit tasks = True

[scheduling]
    initial cycle point = 2000
    [[graph]]

{% for event in riser_events -%}
    {% set _count = count(event['Start'], event['End'], 'PT6H', True) %}
    {% set param = "<event=" + event["Event"] + ">" %}

        # Event: {{ event['Event'] }}
        R{{ _count }}/{{ event["Start"] }}/PT6H = """
            run{{ param }}[-PT6H] => run{{ param }}
        """
{%- endfor %}

[runtime]

{% for event in riser_events %}
    [[<event={{ event['Event'] }}>]]
        [[[environment]]]
            EVENT = {{ event['Event'] }}
            KP_MAX = {{ event['Kp_max'] }}
{% endfor %}

    [[run<event>]]
        inherit = <event>

To compute the count, I’ve just written a thin wrapper around the generate function. There will be a nicer, more mathsey way to do this I’m sure, but it works:

from metomi.isodatetime.parsers import TimePointParser, DurationParser


TPP = TimePointParser(assumed_time_zone=(0,0))
DP = DurationParser()


def generate(start, stop, duration, stop_inclusive=False):
    """Generate cycles between start and stop.

    This implements the recurrence format: R/start/stop/interval

    Note, this assumes UTC.

    This approach can be used to pre-generate cycles in Cylc workflows.
    Be aware that explicitly generating cycle points in this way is less
    efficient than allowing Cylc to generate them on the fly so use this
    approach sparingly as it may cause Cylc to use more CPU than you might like
    it to.

    Args:
        start: The start cycle as an ISO8601 date (e.g. 2000 or 20000101T00)
        stop: The stop cycle as an ISO8601 date (e.g. 2000 or 20000101T00)
        duration: The cycling interval as an ISO8601 duration (e.g. P1Y)
        stop_inclusive: If True, then "stop" will be included in the results.

    Yields:
        ISO8601 datetimes.

    """
    start = TPP.parse(start)
    stop = TPP.parse(stop)
    duration = DP.parse(duration)

    pointer = start
    if stop_inclusive:
        while pointer <= stop:
            yield pointer
            pointer = pointer + duration
    else:
        while pointer < stop:
            yield pointer
            pointer = pointer + duration


def count(start, stop, duration, stop_inclusive=False):
    """Count the number of cycles between start and stop.

    See "generate" above for arguments.

    Returns:
        Integer count.

    """
    # One cycle per point yielded by generate (the start point is included).
    return sum(1 for _ in generate(start, stop, duration, stop_inclusive))
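
As a quick sanity check (run by hand alongside the module above, not part of the workflow), here's what the helpers give for the R01 event:

# Cycles fall at 2024-03-23T21Z, then 03Z, 09Z and 15Z on the 24th (the next
# 6-hourly point would be 21Z, past the 18Z end), so the count is 4 and the
# graph line becomes R4/2024-03-23T21Z/PT6H.
for point in generate("2024-03-23T21Z", "2024-03-24T18Z", "PT6H", stop_inclusive=True):
    print(point)
print(count("2024-03-23T21Z", "2024-03-24T18Z", "PT6H", stop_inclusive=True))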

From an efficiency perspective, this will mostly impact workflow validate / startup time (Jinja2 is only run when the workflow configuration is read in). You’ll know when you’re pushing the limits because cylc validate will get slow. With 2000 “R02” events, the config took ~20 seconds to load, so really not too bad. After that, this should be reasonably efficient.
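
If validate time ever did become a problem, the loop could presumably be replaced with straight duration arithmetic - the "more mathsey" version alluded to above. An untested sketch along those lines, reusing TPP and DP from the module and assuming isodatetime's TimePoint subtraction and Duration.get_seconds() behave as I remember:

def count_arithmetic(start, stop, duration, stop_inclusive=False):
    """Like count(), but without generating every cycle point."""
    span = TPP.parse(stop) - TPP.parse(start)  # Duration between the two points
    step = int(DP.parse(duration).get_seconds())  # interval length in seconds
    whole, remainder = divmod(int(span.get_seconds()), step)
    if stop_inclusive:
        return whole + 1
    # Exclusive stop: drop the stop point itself if it lands exactly on a cycle.
    return whole if remainder == 0 else whole + 1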


Hi @edmundh

@oliver.sanders's solution looks good, but just for completeness (and assuming I've understood your use case from a quick read) - wouldn't it be more natural to run each of your "events" as a separate workflow? Perhaps as sub-workflows (i.e., launched by tasks in a top-level controlling workflow)?

NOTE: I’m not really suggesting you try this, given Oliver’s solution, because sub-workflows do bring some complications - but it would still be interesting to know if that would be the most natural way to do it in principle, because we have future plans to make sub-workflows more “built in” to Cylc.


Thanks very much @oliver.sanders and @hilary.j.oliver - greatly appreciated.

Yes it's a good point Hilary - I'd half thought about supporting the "run in historic mode" use-case via a basic bash wrapper looping through the events and running the central workflow. But using your subworkflow pattern (very readable write-up, thanks!) would definitely be easier for running / monitoring - all the usual cylc benefits!

I’m currently mocking the workflow up with dummy implicit tasks, so it’s pretty easy for me to try both approaches, and see if I can get insights as to what will end up being a better match to our needs. Will give it a whirl, and let you know!

Given your “interesting to know” point Hilary, I’ve put some more context below on these use-cases/needs. Some of them are probably quite specific to this project, but I think the general “be able to easily pivot a workflow between forecast and multi-event hindcast mode” requirement is probably a common need for modelling of events rare enough that we can call them “events”!
Here it’s space weather events, but could equally apply to terrestrial weather/climate events, e.g. tropical cyclones.

More explicit context on use-cases/needs:

This is for a space weather “research-to-operations” project (RISER), investigating the viability of using an alternative observation source to drive an obs-post-processing => multi-model chain.

Proof-of-principle has been well demonstrated by researchers, but with research-grade software coupling the various links in the chain - i.e. too brittle for operations. Similarly, most of the research work has been done with a research mindset, using all the data available at the time of the research to constrain things as well as possible - leading to anachronistic effects like using observations "from the future" of the event being simulated. Fine for best science, but it means results probably aren't representative of what you'd get operationally, running in near-real time.

We're planning to use a cylc+rose workflow here to improve matters from an R2O perspective:

  • workflow to make the chaining / composition / architecture of the various elements more operationally viable
    • workflow should also allow easier running of experiments exploring sensitivity to various post-processing / modelling parameter choices
  • various workflow modes to ensure this is operationally viable, and explore performance

We're keen to characterise the performance of this approach on impactful events - which are relatively rare. The "near-real-time" mode lets us demonstrate that the approach is operationally robust/viable in day-to-day use, while running in the historic "hindcast" mode over several events lets us characterise how it performs during these rare events in a representative manner (and check that it's robust then too!). We'll also support a historic "campaign" mode allowing anachronistic obs, so we can compare that against the hindcasts.

Given this, I think my main requirements are as follows:

  • scientific: be able to easily pivot between running in "near-real-time" and both historic modes - and be sure that the "near-real-time" and "hindcast" modes are equivalent, i.e. that we don't cheat in hindcast mode by using observations that weren't available at the time.
  • software implementation: to enable the above, I much prefer having a common core graph & apps which all modes run (rather than distinct per-mode workflow codebases which I manually try to keep equivalent), and using the cycle-point datetimes plus mode-dependent logic to bracket which observations the underlying apps can access in a given mode.
  • human: this is a joint project with various research partners who've not come across cylc+rose before. Those most likely to interact with & extend the workflow are training up (thanks to Oliver and team!) - but this is still new to them. As such I'm keen to make whatever implementation I choose as simple as I can, so there's not too much conceptual "impedance mismatch" between the tutorials they'll be working through and the workflow we'll be using in the project. Enlightened self-interest too, as I'll likely need to help maintain this for the duration of the project at minimum!

IMO, this use case is not a natural fit for sub-workflows; we've just come up against a limitation of the cycling syntax. If ISO8601-1 supported a start/step/stop recurrence, we wouldn't think twice about how to implement this workflow.


ISO8601-1 doesn't actually support the concept of recurrences as Cylc uses them: R/2000/P1D actually refers to a repeating interval, i.e. the time span between events, not a recurrence of discrete events. This is part of the reason some recurrences are awkward or impossible to write in this syntax. A later revision of the specification finally added support for recurrences in the true sense, which it achieved by incorporating the RRULE syntax (what Apple's iCal and now other calendar packages like MS Outlook use).

So far, I haven’t encountered a problem that cannot be expressed in RRULE. In this case, the R01 event would be written like so:

DTSTART:20240323T210000Z
RRULE:FREQ=HOURLY;UNTIL=20240324T180000Z;INTERVAL=6;WKST=MO

Not the prettiest syntax, but it works flawlessly. Thankfully rrule.js offers a human-language representation which is somewhat easier to comprehend:

every 6 hours until March 24, 2024

See the RRULE demo for more.
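
For anyone who'd rather poke at this from Python than JS, python-dateutil implements the same RRULE semantics (nothing to do with Cylc itself, just an easy way to experiment) - a minimal sketch for R01:

from datetime import datetime, timezone
from dateutil.rrule import HOURLY, rrule

# Same recurrence as the DTSTART/RRULE pair above: every 6 hours from the
# R01 start until its end (inclusive), i.e. 21Z, 03Z, 09Z and 15Z.
r01 = rrule(
    HOURLY,
    interval=6,
    dtstart=datetime(2024, 3, 23, 21, tzinfo=timezone.utc),
    until=datetime(2024, 3, 24, 18, tzinfo=timezone.utc),
)
for cycle in r01:
    print(cycle.isoformat())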

Whilst it was surprisingly easy to implement RRULE support in Cylc, there are some technical barriers / limitations - namely the lack of alternative-calendar support, and the limitations of the human-language representation (which is only available in the JS library anyway, but would likely be a prerequisite to real-world usage).

Of course, there is another option, which is just to extend the ISO8601 syntax to support a start/step/stop form; I doubt there would be any technical challenges in doing so, just a syntax/representation problem.

What I meant by that was: if there's no interdependence between the different "event" graphs (as @edmundh said, I think), then there's no particular need for them to be in the same workflow - especially if they don't have the same initial and final cycle points (although you've found a good solution for that problem).

(That said, this certainly doesn’t have the dynamic sub-graph configuration problem that is the classic reason to use sub-workflows).