Wall time periods of allowed task startup

Hi all!

I’m starting to learn how to use Cylc, and for the moment I really like it. I was wondering if there was an existing mechanism for limiting the startup of some tasks to a specific weekly schedule.

In the workflow I am creating, I have 2 steps that are resource-hungry. I am already limiting each of them with its own internal queue with limit = 1. However, in both cases, I was also told to limit their activity during the workday.

To be specific, one step is the transfer of a few GBs from an external HPC to the local server. I am using Globus for this. These large transfers tend to hog the bandwidth and slow down the internet for all my colleagues, so I only do them during the night. My intuition was to write a custom xtrigger in Python that returns True when the current time is during the night (or the weekend) and False during the workday. However, I’ve read elsewhere on this Discourse that “once satisfied, a trigger function will not be called again”, which makes my idea useless.
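Something like this sketch is what I had in mind (the workday boundaries are made up, and, as I understand it, custom xtrigger functions return a (satisfied, results) tuple):

from datetime import datetime


def is_transfer_allowed():
    """Sketch of my idea: satisfied only outside working hours."""
    now = datetime.now()
    # placeholder workday: Mon-Fri, 08:00-18:00 local time
    workday = now.weekday() < 5 and 8 <= now.hour < 18
    return (not workday, {})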

My second resource-intensive step is even more complex as it combines a time period schedule with a per-day maximum limit of transfer. But it’s the same idea that I’d want a trigger that can switch state.

My other idea was to replace the xtrigger with a task with infinite retries. That feels wrong and inelegant, hence my question here.

Thanks!

If I understand your use case, that’s exactly what Cylc’s advanced cycling capabilities are for. The idea is you create a graph of tasks that repeat on some datetime cycle, and attach clock triggers to some tasks to connect their “cycle points” to the real time clock.

Your intuition was kinda good (again, if I have understood your use case). Clock-triggers are a built-in xtrigger. They are not called again once satisfied, but the next instance of the task has a new clock-trigger relative to its new cycle point.

Here’s an example:

[scheduling]
    initial cycle point = 20231116T00Z  # midnight UTC
    [[graph]]
        P1W = @wall_clock => foo => bar  # weekly cycle tied to the clock
[runtime]
    [[foo]]
        script = sleep 10
    [[bar]]
        script = sleep 10

A few things to note:

  • P1W is short for R/^/P1W, i.e. repeat indefinitely on a weekly cycle starting from the initial cycle point. So in this case it is the initial cycle point that ties the cycle to midnight (adjust for your own timezone).
  • There’s a lot of flexibility in how you configure cycling - see the user guide. You can have multiple cycles in the same workflow, and task dependencies that cross over between them.
  • The default @wall_clock xtrigger triggers when the real-time clock reaches the dependent task’s cycle point. You can also configure an offset relative to the cycle point.
  • Note the cycle points are full datetimes, not just “times of day”, so if you start the workflow “behind the clock” the clock triggers will have no effect until the workflow catches up to real time.

On the last point, if you run my example above with an initial cycle point a few days in the past, the first cycle will trigger immediately (because that datetime has already passed) and the next cycles (out to the 5-cycle runahead limit) will be lined up waiting on their weekly clock triggers.

Hope that helps?

Thanks for the answer. I think your answer doesn’t really apply to my use case, but I also hadn’t given many details.

I’m already using the datetime cycling feature because my workflow is about handling the data from a climate model. On the HPC the runs are done in monthly batches, so I am using datetime cycling corresponding to those month slices. Of course, it has nothing to do with real-world time. For example, my current test handles data starting in 206501 and running up to 210012.

Maybe I don’t understand some basics about the order in which the tasks and triggers are run. The section of interest in my workflow looks like:

[scheduling]
    initial cycle point = 20650101T00Z
    final cycle point = 21001201T00Z
 
    [[xtriggers]]
        allow_transfer = is_transfer_allowed():PT30M

    [[graph]]
        P1M = """
        simulated & @allow_transfer => transfer => final_task
        """

    [[queues]]
        [[[q_transfer]]]
            limit = 1
            members = transfer

Where simulated is a simple task that checks if all the data for the month in question is available on the HPC. It retries indefinitely each hour until it succeeds. transfer is my Globus task and is_transfer_allowed is the xtrigger, a Python function that returns False when called during working hours.

So in what order are the things called? Is allow_transfer checked before simulated succeeds? If many months are all at the transfer step, is allow_transfer called each time a spot is freed in the queue?

These correspond to two “problematic” cases I foresee:

  1. I start the workflow at 22:00. allow_transfer is tested and returns True. simulated is called but fails each hour until the day after at noon. transfer starts even though we are during the workday, because allow_transfer was not tested again.

  2. I start the workflow at 05:00. 12 months of data are already available and the transfers start. At 8:00, only 10 have been successfully transferred. The 11th transfer was already “allowed” since its dependent task succeeded at 5:00, so it proceeds, even though we are now during the workday.

Ah, OK, got it.

Yeah, clock-triggers are only intended for cycling real-time workflows.

And xtriggers wait for an external condition to become satisfied, at which time dependent tasks can trigger.

I think the proper solution to this problem is a batch queue (PBS, Slurm, …) that only releases jobs after hours. However, if your site does not provide such a queue (and isn’t willing to do so) we may be able to work around it in Cylc.

simulated & @allow_transfer => transfer

So in what order are the things called? Is allow_transfer checked before simulated succeeds?

xtriggers start getting checked as soon as the dependent task enters the active window of the workflow, which is when its first task prerequisite gets satisfied. If transfer’s only parent task in the graph is simulated, then transfer will be spawned as soon as simulated succeeds, and Cylc will start checking @allow_transfer then. (There’s really no point in checking earlier than that, because transfer could not run anyway due to its dependence on simulated.)

If many months are all at the transfer step, is allow_transfer called each time a spot is freed in the queue?

It will be called as soon as the task enters the active window, and queued tasks are considered “active” (they are being “actively managed” by the scheduler).

A subtlety of xtriggers is that their uniqueness is determined by the uniqueness of their function argument values. I.e. Cylc assumes that if you call a function with the same arguments you should get the same result.

So if you define an xtrigger that simply checks “is it after-hours yet?” and then make every cycle depend on it, it will be considered satisfied for all time once it returns True, so it won’t constrain upcoming cycles at all.

So (if using xtriggers) you should add the dependent task’s cycle point as an xtrigger argument (the user guide shows how to do this - something like “%(point)s” as a string template for the cycle point). The xtrigger function might not use the argument at all, but it effectively makes a new xtrigger that will start checking anew every time a new task instance comes into the active window, instead of one that is shared by all dependent tasks.
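For example, a minimal sketch of the same function taking the cycle point (assuming the xtrigger is declared with something like allow_transfer = is_transfer_allowed('%(point)s'):PT30M):

from datetime import datetime


def is_transfer_allowed(point=None):
    """Sketch only: same time-of-day check, but taking the cycle point.

    `point` is not used in the check itself; its only role is to make
    each cycle's xtrigger call unique, so Cylc evaluates it anew for
    every cycle. Workday hours are placeholders.
    """
    now = datetime.now()
    workday = now.weekday() < 5 and 8 <= now.hour < 18
    return (not workday, {})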

This would fix your case (1.) IF transfer has no other parents to spawn it into the active window - because the xtrigger won’t start checking until simulated succeeds.

However, if transfer gets spawned by another parent before simulated succeeds, you will run into problem (1.).

For case (2.), the problem is that all 12 months got queued (in your internal Cylc queue), which means they are ready to run as soon as the queue releases them. But your queue only releases one task at a time, and before it empties, external circumstances change such that you want the already-queued tasks to be treated as “not ready” any more!

I think a robust solution might be (a) use a task instead of an xtrigger so that time-of-day checking does not start until simulated succeeds, regardless of what parent tasks there are; and (b) set runahead limit = P0 to force a single cycle at a time (which you are effectively doing with the limit = 1 queue anyway). BTW simulated could potentially be an xtrigger.

# with runahead limit = P0
simulated => allow_transfer => transfer
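A minimal sketch of what the allow_transfer gate task’s job could run (assuming execution retry delays are configured on the task so it keeps retrying until it succeeds; the workday hours are placeholders):

#!/usr/bin/env python3
"""Hypothetical job script for an allow_transfer gate task: exit 0 only
outside working hours, otherwise exit 1 so the task fails and Cylc's
execution retry delays re-run it until the time window opens."""
import sys
from datetime import datetime

now = datetime.now()
workday = now.weekday() < 5 and 8 <= now.hour < 18
sys.exit(1 if workday else 0)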

or to still use the xtrigger:

@allow_transfer => dummy
simulated => dummy => transfer

Note that with runahead limit P0 (one cycle at a time) the next simulated won’t run until transfer is finished. I’m guessing simulated is quick, so that probably doesn’t matter, but to get more concurrency you could leave the runahead limit alone and put in an intercycle dependency:

@allow_transfer => dummy
simulated => dummy => transfer
simulated[-P1M] => simulated

Thanks a lot!

I’ll think about setting up a proper batch queue; I am even considering implementing my own very simple batch queue by subclassing the background submission method. The idea is to make the solution more portable and hopefully contained in the same source repo as the workflow configuration.
I was also wondering if I could have the Globus transfer job treated as such, i.e. avoiding a globus task wait ID call in the bash script and relying on Cylc’s own job polling to get the transfer status.

But all this is a bit more work than I can afford right now. For the time being, I followed your other advice and the solution is to limit the runahead to 12 months and make the xtrigger cycle-dependent. I also added a “buffer” time, i.e. the last transfer can start no later than N minutes before the workday begins, where N is an estimate of the time it takes to empty the queue (to transfer 12 months).
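Roughly, the idea of the buffered check is something like this (just a sketch, not my exact code; the workday boundaries and buffer handling are placeholders, and point is only there to make the xtrigger cycle-specific):

from datetime import datetime, timedelta


def is_transfer_allowed(point=None, buffer_minutes=0):
    """Sketch: satisfied only if we are outside working hours AND the
    workday does not start within the next `buffer_minutes`."""

    def in_workday(t):
        # placeholder workday: Mon-Fri, 08:00-18:00 local time
        return t.weekday() < 5 and 8 <= t.hour < 18

    now = datetime.now()
    soon = now + timedelta(minutes=buffer_minutes)
    return (not (in_workday(now) or in_workday(soon)), {})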