Monthly tasks not disappearing from graph after multiple months

I’m running a multi-month trial. I have tasks that run at the start of the month, which are dependencies of a 6-hourly task that runs N days later. In this case, N is 2 days. After several months, my monthly tasks still appear in the graph. I’m not sure why, when they shouldn’t be needed for spawning anything new. I’m also not sure I’m using the graph syntax correctly.

The graph looks like this:

[[[ 01T00!^ ]]]  {# first of the month, but not the first cycle #}
      graph = """
            housekeep[-PT6H] => musli_surface_setup => SURFACE_DECISIONS
            SURFACE_DECISIONS:succeed-all => musli_surface_make => musli_finish
            musli_surface_make => musli_surface_monitor => musli_finish
            musli_finish => housekeep
      """
[[[03T00]]]  {# Stationlist update day. make sure musli has run before updating stationlists #}
       graph = "musli_finish[01T00-P1M] => cycle_check"

[[[ 01T00!^ ]]]  {# first of the month, but not the first cycle #}
       graph = "housekeep[-PT6H] => calc_groundgps_bias"

[[[03T00]]]  {# Bias update day #}
       graph = "calc_groundgps_bias[01T00-P1M] => cycle_check"

housekeep and cycle_check run every 6 hours. musli tasks and calc_groundgps_bias run at the start of the month, and are dependencies of cycle_check 2 days later.
This works fine, except that all the musli and calc_groundgps_bias tasks remain visible in the GUI indefinitely.

Why does this happen, and what is the best dependency syntax to use?

I suppose if I definitely want to do the updates on the 3rd, I can change to [-P2D] for the dependency.

@srennie - have you left out some graph sections? I don’t see the 6-hourly graph. If the full graph is too complicated to post, maybe you could try to reproduce the problem with a simpler graph of dummy tasks (with the same cycling structure, however)?

Yes, that’s just the relevant snippet for the tasks hanging around, and I removed some redundant task paths. The full graph is a whole NWP suite.
I’ll see if I can come up with a simpler graph to demonstrate.

I think I can reproduce the issue with this suite.

    UTC mode = True
    title = "monthly update suite"
    initial cycle point = 20191120T0600Z
    final cycle point = 20200305T0000Z
    max active cycle points = 3
            graph = """
                cycle_check => do_stuff => housekeep
                do_stuff[-PT6H] => do_stuff
            """
# first of the month, but not the first cycle
        [[[ 01T00!^ ]]]
            graph = housekeep[-PT6H] => monthly => housekeep

# Stationlist update day. make sure musli has run before updating stationlists
        [[[ 03T00 ]]]
            graph = monthly[01T00-P1M] => cycle_check

        script = sleep 10
            host = localhost

            batch system = background
            execution time limit = PT1M
            submission retry delays = PT37S

        inherit = None, FAMILY

OK, problem reproduced.

I made your example work a bit faster by increasing the 6 hour cycle to 1 day, and deleting sleep 10 for the jobs.

I think the clean-up algorithm that removes finished tasks is interpreting this:

monthly[01T00-P1M] => cycle_check

as a dependence on an absolute cycle point (which, in Cylc 7, rightly keeps the depended-on task around forever), even though it isn’t one.

I’m not sure this is worth fixing in Cylc 7, for the following reasons:

  • it’s an unusual dependency form that isn’t used much
  • it doesn’t cause other succeeded tasks to be retained (which could create a performance problem)
  • from your CYLC_VERSION value above you’re very far from up to date, even with Cylc 7
  • this will not affect Cylc 8, which has an entirely new scheduling algorithm

So for now, I’d suggest you just manually cylc remove those tasks every so often, or try doing it automatically with suicide triggers in the graph.

Is that OK?


Thanks, that does clarify things a bit.

I’m not sure whether there is a better way to express a dependency on the task that ran at the most recent 01T00, regardless of the interval between the two tasks. I suppose that if we pin down the delay we want, using e.g. [-P2D] would also allow the suite to remove the tasks. Otherwise I’ll look at just removing the tasks.

Hi again @srennie

I just took a closer look at your problem dependency (which I should have done straight off!) and on reflection I don’t think triggers of that form should work at all.

Omitting leading components of a datetime implies a recurrence, not a duration. So 01T00 means “every first of the month”, not “the most recent first of the month”.

So you just need to change the 03T00 section of your graph, something like this:

            graph = "monthly[-P1M2D] => cycle_check"
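
To see which upstream cycle point a given offset selects, the arithmetic can be sketched in Python. This is only an illustration and not Cylc's own date-time code: subtract_offset is a hypothetical helper that steps back calendar months first and then days, an assumed ordering that makes no difference for a point on the 3rd of a month.

```python
from datetime import datetime, timedelta


def subtract_offset(point, months=0, days=0):
    """Hypothetical helper: step back `months` calendar months, then `days` days.

    Assumes the day of month exists in the target month (true for the 3rd).
    """
    total_months = point.year * 12 + (point.month - 1) - months
    year, month_index = divmod(total_months, 12)
    shifted = point.replace(year=year, month=month_index + 1)
    return shifted - timedelta(days=days)


cycle = datetime(2020, 3, 3)  # a 03T00 cycle point
same_month = subtract_offset(cycle, days=2)                # offset -P2D
previous_month = subtract_offset(cycle, months=1, days=2)  # offset -P1M2D

print(same_month.strftime("%Y%m%dT%H%MZ"))
print(previous_month.strftime("%Y%m%dT%H%MZ"))
```

For the 20200303T0000Z cycle, -P2D lands on the first of the same month and -P1M2D on the first of the previous month, so the right offset depends on which monthly run cycle_check is meant to wait for.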

As an aside, the exclusion in [[[01T00!^]]] won’t be doing anything unless your initial cycle point matches 01T00 (which it doesn’t in your example).
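
A rough way to check this, sketched in Python rather than with Cylc itself: list the first-of-the-month points between the initial and final cycle points and drop the initial point, as the !^ exclusion would. monthly_points is a hypothetical helper, and the dates are the ones used in the two example suites in this thread.

```python
from datetime import datetime


def monthly_points(initial, final, exclude_initial=True):
    """Hypothetical helper: day-1 00:00 points within [initial, final].

    exclude_initial mimics the !^ exclusion by dropping the initial point.
    """
    points = []
    year, month = initial.year, initial.month
    while True:
        point = datetime(year, month, 1)
        if point > final:
            break
        if point >= initial and not (exclude_initial and point == initial):
            points.append(point)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return points


final = datetime(2020, 3, 5)

# Initial point 20191120T0600Z is not a 01T00 point, so the exclusion removes nothing.
off_grid = datetime(2019, 11, 20, 6)
print(monthly_points(off_grid, final) == monthly_points(off_grid, final, exclude_initial=False))

# Initial point 20200101T0000Z is a 01T00 point, so the exclusion drops it.
on_grid = datetime(2020, 1, 1)
print(monthly_points(on_grid, final))
```

With the 20191120T0600Z start, the first monthly point is 20191201T0000Z whether or not the exclusion is present. With the 20200101T0000Z start, the exclusion moves the first monthly point to 20200201T0000Z.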

You might need more exclusions to handle the initial cycles of the 03T00 recurrence too, if the first instance of the upstream task is excluded in the other recurrence, and the offset is P1M2D.
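
The early cycles that would need attention can be listed with a small Python sketch. It assumes the upstream task runs on 01T00 with the initial point excluded and that the downstream trigger uses the P1M2D offset. The helper names and the final cycle point are hypothetical, chosen only for illustration, and nothing here calls Cylc.

```python
from datetime import datetime, timedelta

initial = datetime(2020, 1, 1)  # initial cycle point from the test example
final = datetime(2020, 4, 5)    # hypothetical final point, for illustration only


def day_of_month_points(day, start, end):
    """Hypothetical helper: 00:00 on the given day of each month within [start, end]."""
    points = []
    year, month = start.year, start.month
    while datetime(year, month, day) <= end:
        point = datetime(year, month, day)
        if point >= start:
            points.append(point)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return points


def minus_p1m2d(point):
    """Step back one calendar month, then two days (assumed ordering)."""
    if point.month == 1:
        year, month = point.year - 1, 12
    else:
        year, month = point.year, point.month - 1
    return point.replace(year=year, month=month) - timedelta(days=2)


# Upstream task on 01T00 with the initial point excluded, as with 01T00!^.
upstream = [p for p in day_of_month_points(1, initial, final) if p != initial]

status = {}
for point in day_of_month_points(3, initial, final):
    target = minus_p1m2d(point)
    if target < initial:
        status[point] = "target before initial cycle point"
    elif target not in upstream:
        status[point] = "target excluded, no upstream task"
    else:
        status[point] = "target exists"

for point, label in status.items():
    print(point.date(), label)
```

Under these assumptions, the January and February 03T00 cycles point at an upstream cycle that either precedes the initial cycle point or was excluded. Those are the cycles an extra exclusion on the 03T00 recurrence would have to cover.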

Here’s my test example:

    UTC mode = True
    initial cycle point = 20200101T0000Z
            graph = "a[-P1D] => a"
            graph = "a => foo"
            graph = "foo[-P1M2D] => bar"
    [[node attributes]]
        foo = "style=filled","fillcolor=red" 
        bar = "style=filled","fillcolor=blue"

Thanks! I will rework my graph into something more like that.
Although I don’t think we will start on anything other than the first of a month for this project, it is possible that the first cycle will be at 00Z rather than 06Z.

No matter how many examples of dependencies are in the docs, I always wish there were a few more!