Cylc’s cycling capabilities aren’t just good for datetime cycling systems.
Say you have N datasets that need to be processed as quickly as possible (or you need to compute something for N different river catchments, or whatever).
(A) parameterized tasks
You could use task parameters to duplicate the sub-graph for each dataset (or catchment, or whatever):
#!Jinja2
[scheduler]
    allow implicit tasks = True
[task parameters]
    m = 1..{{N}}
[scheduling]
    [[graph]]
        R1 = "prep => process<m> => products<m> => upload<m> & archive<m>"
Here’s the graph for N=3. Each parameter value makes a distinct sub-graph (each with cycle point 1).
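To make the expansion concrete, here is a minimal Python sketch (not Cylc code) that enumerates the tasks and dependencies the parameterized graph would produce. The `_m1`-style suffixes are an assumption about the expanded task names, used only for illustration; the real names depend on Cylc's parameter templates and may be zero-padded.

```python
def expand_parameterized(n):
    """Expand prep => process<m> => products<m> => upload<m> & archive<m> for m = 1..n."""
    tasks = ["prep"]
    edges = []
    for m in range(1, n + 1):
        # Hypothetical task names; Cylc's actual expanded names may differ.
        process, products = f"process_m{m}", f"products_m{m}"
        upload, archive = f"upload_m{m}", f"archive_m{m}"
        tasks += [process, products, upload, archive]
        edges += [
            ("prep", process),
            (process, products),
            (products, upload),
            (products, archive),
        ]
    return tasks, edges


tasks, edges = expand_parameterized(3)
print(len(tasks), "tasks,", len(edges), "edges")
```

Every dataset adds four more distinct tasks to the one graph, which is why the task count grows linearly with N.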
(B) integer cycling
Or you could cycle over the datasets (or the river catchments, or whatever):
#!Jinja2
[scheduler]
    allow implicit tasks = True
[scheduling]
    cycling mode = integer
    final cycle point = {{N}}
    [[graph]]
        R1 = "prep"
        P1 = "prep[^] => process => products => upload & archive"
Here’s the graph for N=3. Now there’s only one set of tasks, but they’re repeated over N cycle points.
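The cycling version can be sketched the same way in Python (again, an illustration rather than Cylc code). Task instances are modelled as hypothetical `(cycle_point, name)` pairs, and the sketch assumes the initial cycle point is 1. It checks that, apart from the dependence on the initial `prep`, no dependency crosses from one cycle point to another.

```python
def expand_cycling(n):
    """Expand R1 = "prep" and P1 = "prep[^] => process => products => upload & archive"."""
    instances = [(1, "prep")]  # R1: prep runs once, at the (assumed) initial cycle point 1
    edges = []
    for point in range(1, n + 1):
        instances += [(point, name) for name in ("process", "products", "upload", "archive")]
        edges += [
            ((1, "prep"), (point, "process")),  # prep[^] => process
            ((point, "process"), (point, "products")),
            ((point, "products"), (point, "upload")),
            ((point, "products"), (point, "archive")),
        ]
    return instances, edges


instances, edges = expand_cycling(3)
distinct_names = {name for _, name in instances}
# Edges that cross cycle points, other than those from the initial prep task.
cross_cycle = [(a, b) for a, b in edges if a[0] != b[0] and a[1] != "prep"]
print(len(distinct_names), "task names,", len(instances), "task instances")
print("cross-cycle edges (excluding prep):", cross_cycle)
```

The set of task names stays fixed as N grows; only the number of cycle points changes.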
Recommendation: if N is large USE CYCLING!
Both approaches work, and if N is not too large then either one will do.
But if N is large, cycling is much more efficient.
(In fact, the same cycling workflow can process an arbitrarily large number of datasets at no extra scheduling cost.)
E.g. for N=2500:
- The parameterized case is a massive graph of 10,000 distinct tasks. A large number of tasks become ready at once when prep finishes, then the scheduler has to manage them all, and the UI has to display them all (and more) even though most of them aren’t going to run any time soon thanks to internal queue limits and external resource constraints.
- The cycling graph has only 4 distinct tasks per cycle, and the scheduler extends the graph dynamically to future cycles, at run time. It (and the UI) only needs to manage the active cycle points. And with no dependencies between cycles it runs all the cycles at once, out to the configurable runahead limit, so cycling need not restrict job throughput.
(Addendum: of course if you set the runahead limit to 2500 cycles then the two methods become more or less equivalent. But if you don’t have an entire HPC to yourself then cycling, with an appropriate runahead limit, will make your life much much easier.)
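The arithmetic behind this comparison can be sketched in a few lines of Python. This is a rough model, not a measurement: `active_cycles` is a hypothetical stand-in for however many cycle points the runahead limit lets the scheduler keep active at once, and the cycling count ignores the one-off `prep` task.

```python
TASKS_PER_DATASET = 4  # process, products, upload, archive


def parameterized_task_count(n):
    """All tasks exist in the single graph: prep plus four per dataset."""
    return 1 + TASKS_PER_DATASET * n


def cycling_active_task_count(n, active_cycles):
    """Tasks in the active window only (assumed model; prep is ignored)."""
    return TASKS_PER_DATASET * min(n, active_cycles)


n = 2500
print("parameterized:", parameterized_task_count(n))
for active_cycles in (5, 50, 2500):
    print("cycling, active cycles =", active_cycles, "->",
          cycling_active_task_count(n, active_cycles))
```

With a small active window the scheduler tracks a few dozen tasks instead of about ten thousand; widening the window to all 2500 cycles brings the two counts back together.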