The role of the UI server in production from a user (and permissions) point of view (cylc8)

Questions about cylc-8 usage

The following lists a whole bunch of assumptions/hopes about how Cylc8 might run for us in production. Because I can’t include two images in one post, I’ll reply to this with how we use Cylc7 in production, to provide context.

  1. All of the relevant machines on this diagram are generally on the same network and can talk to each other.
  2. Cylc8 is installed on all of the blue VMs, including the new light-blue Cylc8 virtual machines (plus, as necessary, on the HPCs as well).
  3. Only the workflow machines (in the special box) run suites and submit jobs to PBS and thus to the HPCs.
  4. Systems Administrators directly install suites onto the workflow machines via the command line.
    1. How do the UI servers get connected to installed (and potentially running) suites?
  5. Suites run with the permissions of certain production realm accounts. For example the atmosphere user might “own” 10 suites, and the oceans user might “own” 5 suites etc etc.
  6. JupyterHub can run UI servers locally and on configurable extra VMs (hence the second light blue circle). I am not concerned if the remote UI servers are not possible, but earlier documentation suggested they are/were.
  7. Suite Owners can log into JupyterHub – as themselves - and scan and find all available suites (just like logging into a machine and running gscan) even where they don’t have write-access to those suites (just like with gscan). They can filter by user/workflow-VM/other attributes, or search by suite name. Suite owners cannot perform edit runs, change the suite.rc.processed via edit functionality or do any other edits on production suites (because they cannot raise their privileges to the realm account).
  8. Suite Owners don’t have to log out and log back in again or clear their cookies to view other suites running as different production users because it’s just read-only (just like with gscan and the cylc GUI).
  9. Support Staff can connect to JupyterHub and scan and find all available suites, with the same filtering as above, and make limited changes (pause, restart, change environment variables in edit runs…) to all the suites they have permission to affect.
  10. Support Staff don’t have to log out and log back in again or clear their cookies to act as different production users.
    1. Ideally we’d be able to use their Active Directory roles as an indication of which production users they can act on behalf of. (Edit authorisation needs to be checked both at the GUI level and again at the cylc daemon level, so the user’s Active Directory roles will have to be passed to the daemon as well, while being as minimally spoofable as possible.)
    2. For example Bob might be able to affect all suites running as atmosphere, while Jane might be able to affect all atmosphere and ocean suites.
    3. It would be good if cylc-8 allowed for an equivalent of “user” and “acting-as user”, always checking the permissions of the “acting-as user” for authorisation, with a default that “user” and “acting-as user” are identical, but allowing authentication/authorisation plugins that differentiate them.
  11. Systems Administrators can do everything Support Staff can and more, with full GUI functionality as well as command line access when necessary.
  12. Suite interactions via the command line also check for user authorisation – with the same “user” and “acting-as user” style plugin or equivalent.

From a user (suite-owner, systems administrator, support staff) point of view, given the above, what are the UI servers? Who needs to know about them?

Current cylc-7 usage:

At the moment, for cylc-7, our production looks something like the following:

As above, all machines are on the same network and can talk to each other.

Cylc is installed on all of the blue VMs (and the HPC), but only the workflow machines (in the special box) run suites and submit jobs to PBS and thus to the HPCs. Systems Administrators install suites directly onto their choice of workflow VM and set them running (cross-triggering allows for some flexibility as to which workflow VM suites are installed onto and thus gives us some manual load-balancing capacity). Suites run with the permissions of certain production accounts. For example the atmosphere user might “own” 10 suites, and the oceans user might “own” 5 suites etc etc.

Suite Owners (people who provide the expert support for that particular suite) have read-only access to their suites: they log into the Cylc-GUI machine, run gscan with the list of cylc workflow hosts, and open the UI from there. Read-only access is as defined by cylc’s idea of read-only (which means they can’t see the suite’s URL, for example). The purpose of having a separate GUI machine is that multiple users using the GUI can be quite resource intensive, especially if any of them decide to look at the graph. All Suite Owners can see all running suites.

Support Staff and Systems Administrators connect to the Cylc-Control machine and then elevate their access by changing to the production account that runs the suite (eg oceans_prod). They then run gscan with the list of cylc workflow hosts and open the UI from there. From there they can start/stop/pause suites, do edit runs, etc. All users with access to the Cylc-Control machine can see and affect all running suites (although they may need to log out of one production user and in as another, which is a bit annoying).

Systems Administrators might also choose to work directly from the command line on the cylc workflow machines. This is essential if they need to run a script that connects to each suite and pauses it prior to a PBS outage, for example.
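
For what it’s worth, such a script is straightforward with Cylc 7’s CLI. A minimal sketch (run as each production account; cylc scan and cylc hold are real Cylc 7 commands, but the scan output format should be checked against your version):

```python
# Sketch only: hold every suite visible to the current (production)
# account ahead of a PBS outage, using Cylc 7's CLI. Cylc 7's
# "cylc scan" prints one running suite per line ("name owner@host:port");
# the exact format may vary between versions.
import subprocess

scan = subprocess.run(["cylc", "scan"], capture_output=True, text=True, check=True)
for line in scan.stdout.splitlines():
    if not line.strip():
        continue
    suite = line.split()[0]  # first field is the suite name
    print(f"holding {suite}")
    subprocess.run(["cylc", "hold", suite], check=True)
```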

I fully understand that how we use cylc-7 has been informed by how cylc-7 behaves. For example we wouldn’t need the cylc-gui or cylc-control machines if the GUI was light-weight and people “playing” with the graph view didn’t risk slowing production suites (for example). I am sure that how we use cylc-8 will also be informed by how cylc-8 behaves, but I suspect we’ll need a similar set of separations.

Hi @jarich,

Thanks for re-posting this here - I’m sure others will be interested (the developers certainly are).

I understand you’re mainly interested in the Cylc 8 architecture, but a few comments on your described Cylc 7 setup before moving on to your follow-up message:

… gives us some manual load-balancing capacity …

Cylc 7 has some built-in load balancing capability: cylc run can automatically select (based on basic load metrics) which of a pool of hosts to start the suite on. (They all have to share the filesystem though).
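
(For reference, this lives in the Cylc 7.8+ site/user global.rc, something like the sketch below; the host names are placeholders and the exact settings should be checked against the global config reference.)

```
[suite servers]
    # pool of suite run hosts sharing the filesystem
    run hosts = wflow-vm1, wflow-vm2, wflow-vm3
    [[run host select]]
        # pick the host with the lowest 5-minute load average
        rank = load:5
```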

The purpose of having a separate GUI machine is that multiple users using the GUI can be quite resource intensive, especially if any of them decide to look at the graph.

and

we wouldn’t need the cylc-gui or cylc-control machines if the GUI was light-weight and people “playing” with the graph view didn’t risk slowing production suites (for example).

In Cylc 8 we plan on not showing the entire graph at once, at least not by default. And the “workflow services” (suite server programs in Cylc 7) will be largely protected from GUIs by the UI Servers.

(although they may need to log out of one production user and in as another which is a bit annoying).

(They could share the suite passphrase.)

Hilary

(now to your follow-up post…)

Hi again @jarich … (hmm maybe I got original post and follow-up the wrong way around - Discourse seems upside-down in this respect)

  1. Cylc8 is installed on all of the blue VMs, including the new light-blue Cylc8 virtual machines (plus, as necessary, on the HPCs as well).

As an aside, Cylc 8 is not a monolithic codebase like Cylc 7. The new Workflow Service (server program) plus CLI is now called cylc-flow and is just one of several components (also: the hub and proxy, cylc-ui, cylc-uiserver). cylc-flow will only need to be installed on the workflow hosts.

Systems Administrators directly install suites onto the workflow machines via the command line.

This can stay the same. We might conceivably make this sort of functionality available from the web UI eventually, but almost certainly not in the initial releases.

How do the UI servers get connected to installed (and potentially running) suites?

The UI Servers, which run “as the user” (one per user), are spawned by the Hub, although they can also be started manually by the user (CLI). The UI Servers will be able to start Workflow Services running, or find already-running ones via a cylc scan-like capability. Contrary to your diagram, the UI Servers should probably run on the workflow hosts. A user’s UI Server will hold the status data of all of that user’s workflows, taking incremental updates from them as they evolve. UIs will request information from the UI Server (so multiple UIs will not load up the Workflow Services).
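
To make the scan-like discovery concrete, here is a minimal sketch assuming the standard ~/cylc-run/<name>/.service/contact layout (the key names shown are the Cylc 7 ones; the Cylc 8 equivalents may differ):

```python
# Sketch: each running workflow writes a "contact" file of KEY=VALUE
# lines under its run directory, recording (amongst other things) the
# host and port its server is listening on.
from pathlib import Path

def scan_contact_files(run_dir=Path.home() / "cylc-run"):
    """Yield (workflow_name, contact_info) for workflows with contact files."""
    for contact in run_dir.glob("*/.service/contact"):
        info = dict(
            line.split("=", 1)
            for line in contact.read_text().splitlines()
            if "=" in line
        )
        yield contact.parent.parent.name, info

for name, info in scan_contact_files():
    print(name, info.get("CYLC_SUITE_HOST"), info.get("CYLC_SUITE_PORT"))
```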

I am not concerned if the remote UI servers are not possible, but earlier documentation suggested they are/were.

Definitely possible. As per my earlier comment, the central Hub spawns UI Servers as the user, on the workflow hosts (because the UI Server has to be able to start Workflow Services up, not just communicate with them over the network). There are several potential remote spawning mechanisms that we’ve already tested a little (but successfully), including ssh and PBS.
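
For example, in the hub config (a sketch only: the spawner plugins named are standard JupyterHub add-ons, but the UI Server launch command is an assumption, not settled Cylc 8 API):

```python
# jupyterhub_config.py -- spawn each user's UI Server on a remote host.

# Option 1: over ssh (https://github.com/NERSC/sshspawner):
c.JupyterHub.spawner_class = "sshspawner.SSHSpawner"
c.SSHSpawner.remote_hosts = ["wflow-vm1", "wflow-vm2"]  # placeholder hosts

# Option 2: as a PBS/Torque job (https://github.com/jupyterhub/batchspawner):
# c.JupyterHub.spawner_class = "batchspawner.TorqueSpawner"

# Hypothetical UI Server launch command:
c.Spawner.cmd = ["cylc", "uiserver"]
```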

Suite Owners can log into JupyterHub – as themselves - and scan and find all available suites (just like logging into a machine and running gscan) even where they don’t have write-access to those suites (just like with gscan). They can filter by user/workflow-VM/other attributes, or search by suite name.

The intention is for authenticated users to be able to see other users’ workflows that they are authorized to see, but we don’t know yet how we’ll provide that capability. JupyterHub doesn’t provide any UI to see other users’ Jupyter Notebooks, not because it couldn’t do so in principle but because - unlike for Cylc - the Notebooks themselves do not support multi-user access. We have tested that a hub-authenticated user can in fact see other users’ UI Servers, but at the moment you have to manually switch the target user name in the URL (JupyterHub serves each user’s server under a per-user path, e.g. /user/<name>/). So we still need to figure out how best to do it. JupyterHub supports something called “hub services” that we may be able to use for this.

Suite owners cannot perform edit runs, change the suite.rc.processed via edit functionality or do any other edits on production suites (because they cannot raise their privileges to the realm account).

Correct, no user should be able to do anything that they’re not authorized to do.

Suite Owners don’t have to log out and log back in again or clear their cookies to view other suites running as different production users because it’s just read-only (just like with gscan and the cylc GUI).

Yes, that’s what we’re aiming for.

Support Staff can connect to JupyterHub and scan and find all available suites, with the same filtering as above, and make limited changes (pause, restart, change environment variables in edit runs…) to all the suites they have permission to affect.

Yes, “support staff” will presumably be defined (as far as Cylc is concerned) by what they are authorized to do.

[AD roles and authorization etc.]

The Hub handles authentication for us, via plugins. We’ve only tested PAM so far, but I don’t expect to have a problem with other authentication systems.
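
(For the record, switching authenticators is a one-line hub config change; the LDAP plugin shown below is one existing option for an Active Directory site, with placeholder values.)

```python
# jupyterhub_config.py -- authentication is pluggable.
c.JupyterHub.authenticator_class = "jupyterhub.auth.PAMAuthenticator"  # what we've tested

# e.g. Active Directory via LDAP (https://github.com/jupyterhub/ldapauthenticator):
# c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"
# c.LDAPAuthenticator.server_address = "ad.example.com"  # placeholder
```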

We haven’t started work on authorization yet, so this is still speculation, but we hope to have a two-level authorization system. As an authenticated user, the hub should only let you see Workflow Services that you are authorized to see (perhaps via a site auth config file that could refer to Active Directory roles or whatever). Then, if you are allowed through by the Hub, the suites (/workflow services) will also check that you are authorized for the particular request or command. Initially at least, authorization might be controlled by simple text config files that are read by a Hub authorization service, and by the suites (and/or UI Server).
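
Purely as a thought experiment (none of these names are real Cylc 8 API), the two levels might compose something like this:

```python
# Hypothetical sketch of a two-level authorization check.
# Level 1 (Hub/site): which workflow owners each user may see at all.
SITE_AUTH = {
    "bob": {"atmosphere"},
    "jane": {"atmosphere", "oceans"},
}
# Level 2 (workflow service): which operations each user may perform on a
# given owner's workflows. An owner could grant user B temporary access by
# adding an entry here.
WORKFLOW_AUTH = {
    ("jane", "oceans"): {"pause", "restart", "broadcast"},
}

def authorized(user, owner, operation):
    """Both levels must allow the request."""
    return (
        owner in SITE_AUTH.get(user, set())
        and operation in WORKFLOW_AUTH.get((user, owner), set())
    )

assert authorized("jane", "oceans", "pause")
assert not authorized("bob", "oceans", "pause")  # oceans not visible to bob
```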

It’s not clear to me what additional capability your “acting-as” user suggestion would give us. I envisage suite owner A granting temporary authorization to another user B by simply altering their own auth config file accordingly. (And of course user B would have to be able to get through the site/Hub auth first, so sites can restrict what is possible here).

Systems Administrators can do everything Support Staff can and more, with full GUI functionality as well as command line access when necessary.

Yes, if authorized to do so as far as Cylc is concerned. E.g. in the Cylc site config file, specify that users in the “admin” group can control all suites.

My speculation on authorization via AD roles probably assumes that we can get relevant group/role membership info back from the authentication service for an authenticated user (I haven’t checked that yet).

Suite interactions via the command line also check for user authorisation – with the same “user” and “acting-as user” style plugin or equivalent.

The CLI will definitely need authorization too, but we may need (or want) to distinguish between suite job client, suite owner, and other-user in terms of the mechanism. This is still to be worked out though.

From a user (suite-owner, systems administrator, support staff) point of view, given the above, what are the UI servers? Who needs to know about them?

The UI Servers literally serve the UI. There is one per user, normally (but not necessarily) spawned by the Hub. Each UI Server holds the status data of one or more workflows, and takes incremental updates from the workflows as they evolve. The UI gets status data from the UI Server, not from the workflows (although commands have to be passed through to them). We think UI Servers can be made pretty much transparent to users and admins. The Hub spawns (or re-spawns) them on demand, and they don’t need to exist if/when no one is looking at their workflows.

Hilary

Just to note, there’s a lot of complex work-in-progress info here. Others on the Cylc team may want to chip in to correct me or add to what I’ve said in places.

My thoughts (again, all subject to challenge) …

That’s one way of doing it but we may prefer to have the UI servers running on separate hosts - I think it’s too early to say. The UI servers will need access to the cylc-run directory, preferably via a shared filesystem, but we may be able to support ssh access as well?

The CLI will be able to interact with the workflow servers directly, but I assume this will only work when you are logged in as the suite owner and have access to the cylc-run directory. We also want the CLI to be able to go via the hub, in which case it should be possible to interact with other users’ suites (with appropriate authentication and authorisation), but this may prove tricky?

You don’t “act” as another user. You always remain logged in as yourself. You can interact with another user’s suites if you have the appropriate authorisation. Any actions you perform on another user’s suites will always be logged with your user id.

That’s one way of doing it but we may prefer to have the UI servers running on separate hosts …

Yes I was a bit loose with the term “workflow hosts” - I just meant, one of the pool of “cylc hosts” that sees the same filesystem (where the suite directories are). However, you could choose to reserve specific hosts just for the UI Servers, and others just for the Workflow Services (/suite server programs).

In principle we could support UI Servers on other hosts that don’t see the suite directories, via ssh as you say, but I’m not aware of any good reason to do that at the moment.

Agreed. I don’t think we should support this initially. However, @matthewrmshin was keen that we design the system such that we could support this in the future.

This is certainly an improvement. :slight_smile:

In our current setup, we run the suites in production as non-user realm accounts (eg oceans_prod) and we don’t share the passphrases liberally, although we could. Consequently, actions on the suites have thus far required the user performing the action to be able to a) log onto the workflow machine that the daemon is running on, and b) sudo to that non-user realm account to perform the required actions. This has meant that identifying who did what requires looking beyond cylc’s data and into sudo logs etc.

But a robust authentication/authorisation system as planned would indeed alleviate that need entirely. If we can allow Bob to do actions A, B and C on this suite, then it doesn’t matter which user the suite is running as; Bob can do their actions. Likewise we might allow Jane to do actions C, D, and E on the same suite; and when we pass the user and host information back through our event handlers, we will be able to easily differentiate whether it was Bob or Jane who did C to that suite at that moment in time. :slight_smile: