Cylc reinstall auto yes?

Hi,

With cylc reinstall, is there a way to make it answer y automatically instead of prompting? If I’m reinstalling 21 workflows at once (7 domains, 3 different configurations each), it’s tedious having to press y\n for each one. I can do echo y | cylc reinstall .... but would prefer a -y or similar option.

Thanks,
Tom

Hi @TomC,

Apparently not (cylc reinstall --help).

We could easily do it, but I’m not sure a --yes option is that much easier than your workaround, or similarly using the yes command:

$ yes | cylc reinstall [workflow-ID]  
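
For a batch like the 21 workflows in the question, the same pipe can be wrapped in a loop. This is only a minimal sketch: the domain and configuration names and the `<domain>/<config>` ID layout are hypothetical placeholders, and the `DRY_RUN` switch is an illustrative convenience rather than anything Cylc provides. By default it just prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: reinstall several workflows without answering each prompt.
# ASSUMPTION: workflow IDs look like "<domain>/<config>"; adjust to your layout.
DOMAINS="d1 d2 d3 d4 d5 d6 d7"   # hypothetical domain names
CONFIGS="c1 c2 c3"               # hypothetical configuration names

reinstall_all() {
    local domain config id
    for domain in $DOMAINS; do
        for config in $CONFIGS; do
            id="${domain}/${config}"
            if [ "${DRY_RUN:-1}" = "1" ]; then
                # Preview only: show the command instead of running it.
                echo "yes | cylc reinstall ${id}"
            else
                yes | cylc reinstall "${id}"
            fi
        done
    done
}

reinstall_all
```

Setting `DRY_RUN=0` before calling `reinstall_all` would run the reinstalls for real.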

If possible, it would be great for me to have an option in Cylc itself. I construct a command based on various arguments passed to a script which calls Cylc. For me at least it looks neater. It also aligns with things like conda which have that flag.

Fair enough, would you like to contribute a tweak to the cylc reinstall command? :grin:

You could model it on the --yes option that the cylc clean command already has.

I implemented a --yes option as part of adding cylc vr support to cylc tui in this PR: "tui: add vr" by oliver-sanders (cylc/cylc-flow Pull Request #5381).

Currently expected to arrive in Cylc 8.2.0


Worth noting that it is often simpler to consolidate this sort of problem down into a single workflow e.g:

[task parameters]
  configuration = foo, bar, baz
  domain = one, two, three

[scheduling]
  [[graph]]
    P1D = """
      start<configuration, domain> => run<configuration, domain> => post<configuration, domain>
    """

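To get a feel for what that graph line produces, the loop below prints the task names that `start<configuration, domain>` would expand to. It is a rough sketch that assumes Cylc's default naming template for string-valued parameters (an underscore plus the value, appended once per parameter); a custom template in the workflow would change the names.

```shell
#!/usr/bin/env bash
# Sketch: list the task names a parameterised task would expand to.
# ASSUMPTION: default templates for string parameters, i.e. "_<value>" per parameter.
expand_tasks() {
    local prefix="$1" configuration domain
    for configuration in foo bar baz; do
        for domain in one two three; do
            echo "${prefix}_${configuration}_${domain}"
        done
    done
}

expand_tasks start
```

Three configurations times three domains gives nine `start_*` tasks per cycle, and the same again for `run` and `post`. The authoritative names come from Cylc itself, for example via `cylc list` on the installed workflow.
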
Otherwise, I’m guessing that the ability to run cylc vr on a group of workflows might be of use to you.

# currently possible
$ cylc stop 'my_workflow'  # stop a single workflow
$ cylc stop '*'            # stop all workflows
$ cylc stop 'my_group/*'   # stop all workflows with IDs beginning "my_group/"

# not currently possible
$ cylc vr 'my_workflow' --yes  # reinstall a single workflow
$ cylc vr 'my_group/*' --yes   # reinstall all workflows with IDs beginning "my_group/"

@oliver.sanders - For consistency we should probably add --yes to the unadorned cylc reinstall command too. While validation before reinstall is desirable, it is not compulsory (and the user might have validated these workflows already, separately).

21 workflows because it is 7 distinct models, and the workflow had to be split across 3 suites for each model because Cylc7 was not able to handle the load due to the number of tasks. I hope with Cylc8 I can shrink that to 7. Task parameters are not suitable for this use case.

Following along with interest.
Incidental to the discussion above…

  • I prefer to do “reinstall, validate run dir, reload” so that I don’t need to think about validating with the correct opt-conf-keys. I have a script for this, though a compound command would be nice. Or is there a reason not to do it this way?
  • A colleague is running ~2000 tasks in 3 active cycles and does see some issues with the load. Hopefully handling the load better than cylc7 would have?
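
The "reinstall, validate run dir, reload" sequence from the first point might look something like the sketch below. The function name and the overridable `CYLC` variable are illustrative assumptions (setting `CYLC=echo` previews the commands); only the three `cylc` subcommands come from the discussion.

```shell
#!/usr/bin/env bash
# Sketch of "reinstall, validate run dir, reload" for one workflow.
# Stops at the first failing step, so an invalid config is never reloaded.
# ASSUMPTION: CYLC is a hypothetical override used here for previewing/testing.
refresh_workflow() {
    local id="$1"
    local cylc="${CYLC:-cylc}"
    # "yes |" answers the reinstall prompt, as in the workaround above.
    yes | "$cylc" reinstall "$id" &&
        "$cylc" validate "$id" &&
        "$cylc" reload "$id"
}

# Preview the three commands without touching any workflow.
CYLC=echo refresh_workflow my_workflow
```

Because validation runs against the workflow ID after reinstalling, it checks the installed run directory rather than the source directory.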

Cylc 8.1.0 has a compound command to do that. If you look at the other threads, Oliver keeps mentioning things like cylc vr and cylc vip, which do validate, install/reinstall, play.

Is this Cylc 8? If so, can you say what the load issue is?

Cylc 8 is in principle much more efficient than Cylc 7:

  • The scheduling algorithm only needs to track active tasks, not entire cycles of tasks. Even huge workflows typically have relatively few active tasks at any one time.
  • The web UI is fed by a subscription to incremental workflow status updates, not global updates at regular intervals
  • The UI Server isolates the schedulers from having to feed multiple UIs at once

However, Cylc server load is a complicated thing. You can always cripple your workflow VMs by having too many schedulers running, or graphs that are too big even by Cylc 8 standards, or too many sub-processes (job submission, event handlers, etc.) running at once, or by allowing too many non-Cylc processes on the hosts.

But the good news is, Cylc scales arbitrarily well “horizontally”.

(I’m the colleague srennie mentioned)

This is with cylc8. It’s a data-postprocessing suite with very little inter-task dependency. If it wasn’t for tasks thrashing I/O bandwidths and PBS queue limits, I could theoretically run almost all the tasks in all cycles simultaneously. I’m running 12 instances of this suite (with different output variable lists) and I typically limit each to 20 or so active tasks. If I recall correctly, running in one workflow with all the variables (~7000 tasks per cycle) crashed cylc8 on play.

The symptoms I’ve observed are

  • large workflows stopping (or perhaps crashing?) where their smaller (or idle) cousins continue on
  • WebUI only displaying a fraction of the current tasks, the table view shows more but is still incomplete
  • workflows can be very slow to start up; extreme examples can take an hour or two from running cylc play to the WebUI showing the workflow as running

Some of these issues are confounded a bit by running cylc8 on a login node which supposedly auto-kills processes that run too long. I should probably move all this onto a compute node or similar…

Hi @joshuatorrance

Cylc is primarily aimed at cycling workflows with complex dependencies. Potentially large ones at that, but multiple cycles of 7000 tasks per cycle that all want to run at once is definitely pushing the boat out.

I’m tempted to ask if a workflow engine is really the appropriate tool for this. Maybe you should just submit the lot to PBS all at once, via a short shell script, and let PBS handle the resource management problem??

On the web UI front, I’m not surprised that the tree view (i.e. your browser, in a sense!) can’t handle that many tasks at once, but I would tentatively hope that the paginated table view could. I can’t say that we’ve tested anything that gnarly though.

The scheduling algorithm efficiency gain that I mentioned above doesn’t really apply here because if all of your tasks have their dependencies satisfied at once, then they all appear at once in the “active window” of the workflow. (Active really means “ready to run now” rather than “actually running now”).

Would it be easy for you to give a stripped down version of your workflow config? We might be able to take a look at it and see what’s possible.

Thanks Hilary

Perhaps not the perfect tool for the job but it is the tool we know and we do have some dependencies (e.g. postprocess data for each ensemble member then concatenate the results) not to mention having the datetime cycling handled for us is nice for processing data we’ve produced with other cylc suites.

A detail I glossed over is we have our cylc8 suite working with no particular issues for processing our 100s of years of projection-data or our decades of deterministic reanalysis data, it’s just our ensemble reanalysis that has trouble since it has a task for each ensemble member (~12x as many tasks).

Building an MRE for this has been on my to-do list for a while. I’ll post something on discourse when I do have something coherent up and running.


A lot of different conversations going on here!


@hilary.j.oliver

For consistency we should probably add --yes to the unadorned cylc reinstall command too.

Yes, there is already an open PR, see this post further up the chat: Cylc reinstall auto yes? - #5 by oliver.sanders

multiple cycles of 7000 tasks per cycle that all want to run at once is definitely pushing the boat out.

It’s pushing the boat out for sure, but we are pushing scaling harder than that with some workflows. I recently encountered a Cylc 7 workflow with over 100’000 tasks! So 7’000 should be well within the capabilities of Cylc.


@srennie

I prefer to do “reinstall, validate run dir, reload” so that I don’t need to think about validating with the correct opt-conf-keys. I have a script for this, though a compound command would be nice. Or is there a reason not to do it this way?

The cylc vr command does this for you, it was added in Cylc 8.1.0.

https://cylc.github.io/cylc-doc/stable/html/reference/changes.html#combined-commands

A colleague is running ~2000 tasks in 3 active cycles and does see some issues with the load.

We regularly run Cylc 7 workflows with tens of thousands of tasks without hitting scaling limits of Cylc. I would expect ~2000 task cycles to run just fine.

If you’re hitting scaling limits, make sure you’re not doing any many-to-many triggers (which are a known bottleneck because the number of dependencies grows with the square of the number of tasks).

https://cylc.github.io/cylc-doc/stable/html/workflow-design-guide/efficiency.html#family-to-family-triggering
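
As a rough illustration of that quadratic growth, the arithmetic below counts dependencies when every one of n upstream tasks triggers every one of m downstream tasks, and compares it with routing the same trigger through a single intermediate dummy task. The function names and the task counts are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: dependency counts for many-to-many triggering.
# direct:    each of n upstream tasks triggers each of m downstream tasks -> n * m
# via dummy: n tasks trigger one dummy task, which triggers m tasks       -> n + m
direct_deps() { echo $(( $1 * $2 )); }
via_dummy_deps() { echo $(( $1 + $2 )); }

direct_deps 2000 2000      # prints 4000000
via_dummy_deps 2000 2000   # prints 4000
```

So two groups of 2000 tasks wired all-to-all carry a thousand times more dependencies than the same groups joined through one intermediate task.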


@joshuatorrance

Thanks for the info, please report these problems so they can be investigated. This is not the expected behaviour of Cylc!

large workflows stopping (or perhaps crashing?) where their smaller (or idle) cousins continue on

Could you try running this large workflow in debug mode and report any traceback or symptoms:

$ cylc vip --debug

WebUI only displaying a fraction of the current tasks

This issue was fixed in a recent release of Cylc. Please ensure both the workflow and UI are running with the latest versions. If the issue can be repeated with the latest versions please let us know.

workflows can be very slow to start up, extreme examples can take an hour or two from running cylc play to noticing that the WebUI showing it as running

The WebUI scans for workflows at a defined interval; make sure this hasn’t been set to a couple of hours! If Cylc is genuinely taking over an hour to start a workflow…

Firstly, please ensure you are using the latest version of Cylc 8 as there have been efficiency improvements in recent releases e.g: Cylc 8.1.x Release Announcements - #3 by oliver.sanders

If the issue can be repeated with the latest release of Cylc, then this is a likely symptom of many-to-many task triggers, where the number of dependencies scales as N^2 with the number of tasks, which will slow things down a bit.

If that’s not the case, could you either provide us with the graph of this workflow for investigation OR run this command:

$ cylc validate <workflow> --profile

And send us the profile.prof file which is generated. This file contains the timings of each function and will help us to identify the issue. We should be able to speed things up a bit!


You could look at bunching ensemble members up using rose bunch to divide your total tasks by 12. Or, put in artificial dependencies to limit the number of potential active tasks. If you are forcing a limit of 20 or so, then putting in artificial dependencies may resolve some of your problems.

Brilliant, good to hear. Maybe running on a busy HPC login node is a problem then @joshuatorrance ?

I’m still slightly concerned at the suggestion that the whole lot become ready to run at the same time though - as opposed to 7k tasks/cycle but with a complex graph structure that reduces concurrency. But at the same time, such a simple structure presumably is amenable to easy deconstruction into several smaller workflows if it does become too much for a single scheduler to manage.

Good suggestions.

A (Cylc internal) queue limit of 20 is probably good to protect the hardware from too much activity at once, but note that, to a first approximation, it doesn’t help the Cylc 8 scheduler much, because queued tasks (which are ready to run, but temporarily held back by the scheduler) are part of the “active window” of the workflow graph.

@joshuatorrance

I think we have identified a performance issue for your kind of workflow, wherein literally thousands of tasks become ready to run at the same time. It takes the scheduler some time to initialize all of those tasks, during which time it will not be responsive. For my test case with ~7000 tasks, on my HP Z-book laptop under WSL, it took several minutes - not 2 hours, but much longer than Cylc 7 start-up all the same. It will depend a lot on resources and the structure of your particular workflow.

With Cylc 8 being so new the immediate development priority is bug fixes and filling in a bunch of smaller gaps, but performance optimization for “pathological” :grinning: workflows like this is pretty high on the list.

If you can give us access to your flow.cylc (whether with the guts cut out of it or in full) that might help. You could get in touch through @jarich at your site if you don’t want to post it to the forum here.

@joshuatorrance and @TomC - this was due to the algorithm that computes the visualization window around active tasks, for the UI.

We’ve just merged a big efficiency boost, for release shortly in Cylc 8.1.3.

On my laptop VM, the scheduler can now start up with 7000 tasks ready to go, in ~15 sec instead of ~400 sec :tada:
