How do I set up a workflow to use parameterized tasks, but not launch all of the tasks in parallel? For example, I have a set of runs parameterized to run with something like:
ldate = 2012,2013,2014,2015
The workflow includes some fairly substantial data wrangling before submitting the executable, and if all 4(ldate)*4(member) jobs start up at once it overloads the system and the jobs will be killed. Is there a way to step through the different tasks sequentially so that doesn’t happen?
Yes. You can define dependencies between parameterized task instances so that they run in sequence.
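A minimal sketch of that approach in Cylc 8 `flow.cylc` syntax, not a definitive configuration. The task names `wrangle` and `model` and the `member` values are assumptions for illustration; only the `ldate` list comes from the question.

```cylc
[task parameters]
    ldate = 2012,2013,2014,2015
    member = 1..4    # assumed values; the question only says there are 4 members

[scheduling]
    [[graph]]
        R1 = """
            # for each ldate and member: wrangle the data, then run the model
            wrangle<ldate,member> => model<ldate,member>
            # the next ldate waits for the previous ldate's model to finish
            model<ldate-1,member> => wrangle<ldate,member>
        """
```

As I understand it, the out-of-range offset for the first value (there is no `ldate` before 2012) is simply dropped from the graph, so the 2012 tasks start straight away. Note that this sketch only sequences along `ldate`; the four members of a given `ldate` can still run together.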
However, if there are no real dependencies among those tasks (i.e. you just want to prevent them from all submitting jobs at once and overwhelming the system), that is what Cylc’s queues are for.
With queues you can set a maximum number of tasks (among some group or groups of tasks, or all tasks) that can be active at once. If more than that become ready to run at any point, the extra tasks are held back and listed as “waiting (queued)” until one of the active tasks finishes.
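A queue limit of that kind might look something like the following in Cylc 8 `flow.cylc` syntax. This is a sketch under assumptions: the limits, the queue name `heavy`, and the family name `HEAVY` are hypothetical and would need to match your own workflow.

```cylc
[scheduling]
    [[queues]]
        # every task belongs to the default queue unless assigned elsewhere
        [[[default]]]
            limit = 4        # assumed value: at most 4 tasks active at once
        # optional: a tighter limit for one group of tasks only
        [[[heavy]]]
            limit = 2        # assumed value
            members = HEAVY  # hypothetical family that the wrangling tasks inherit from
```

With only the `default` queue limited, all 16 parameterized tasks can become ready together, but no more than 4 will be submitted at a time. The named queue is useful if only the data wrangling step needs throttling and the rest of the workflow can run freely.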
(Queues are covered under the “limiting workflow activity” heading in the scheduling configuration section of the user guide.)