Hello Cylcerists,
I’ve been working on the Bureau of Meteorology’s Cylc8 migration, and there’s one key area of functionality that I’d like some advice on.
The Bureau has used a modified version of gscan in the past to view all running workflows across our Cylc7 instances. I’m attempting to do this for Cylc8.
Is there a way to query the GraphQL API to get ALL users’ workflows that I have access to?
As far as I can tell, Cylc8 doesn’t expose this in an easy-to-consume format. I could query each known user’s endpoint for their workflows, but this seems unnecessarily messy.
We’ve successfully set up a multi-user deployment of Cylc8. I’ve got an account that is configured to have READ access to all other users’ UI servers.
I’ve tried to be brief here, but please ask for more detail if needed!
Thanks,
Jez
Hi,
The security model of Cylc keeps all of the Cylc components in userspace. The cylc-uiserver application runs as the user. If you configure authorisation, then other users may also contact it to view the user’s workflows. One user, one server, potentially many observers.
In order to view workflows running under multiple accounts, you need to start a cylc-uiserver for each of the accounts you want to view, then issue the same query to each server. In other words, you need to issue multiple requests to multiple servers.
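As a rough illustration of "issue the same query to each server", here is a minimal Python sketch. The URL layout (/user/<name>/cylc/graphql behind the hub), the "Authorization: token ..." header and the query fields are assumptions about a typical hub deployment, not something confirmed in this thread, so check them against your own setup. The helper names graphql_url, post_json and all_workflows are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Assumed query: verify the field names against your server's schema.
WORKFLOWS_QUERY = "query { workflows { id name status owner } }"


def graphql_url(hub_base: str, user: str) -> str:
    """Build the (assumed) GraphQL endpoint of one user's UI server."""
    return f"{hub_base.rstrip('/')}/user/{user}/cylc/graphql"


def post_json(url: str, payload: dict, token: str) -> dict:
    """POST a JSON body with an API token and return the decoded reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the hub accepts tokens in this header form.
            "Authorization": f"token {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def all_workflows(
    hub_base: str,
    users: List[str],
    token: str,
    post: Callable[[str, dict, str], dict] = post_json,
) -> Dict[str, List[dict]]:
    """Send the same query to every user's server, one request per user."""
    results: Dict[str, List[dict]] = {}
    for user in users:
        try:
            reply = post(graphql_url(hub_base, user), {"query": WORKFLOWS_QUERY}, token)
        except OSError:
            # An unreachable server simply contributes no workflows.
            results[user] = []
            continue
        results[user] = (reply.get("data") or {}).get("workflows") or []
    return results
```

The list of users still has to come from you (e.g. a config file), since no Cylc component knows who "all the users" are.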
Presently the cylc-ui (the web application bit) only opens one connection to one server, so it cannot list multiple users’ workflows.
You may also want to talk to @jarich who is familiar with the context.
@oliver.sanders has explained the reason for this. As a decentralized system, there is no Cylc component that knows “all the users”, so in fact that information simply does not exist.
Generally this is a great advantage of Cylc, in terms of security, admin burden, usability, and scalability, over more cumbersome central-server workflow managers. But of course there are some scenarios - such as multiple production accounts - where it would be nice to see multiple users at once.
The Cylc 8 hub is “central”, but it still can’t know who “all the users” are (or, more particularly, which ones you want to see) unless you tell it somehow. For the moment, we support targeting specific other users, which you as the user have to ask for. I think there is scope for configuring groups of users at the hub in some way, but that’s yet to be done (you might be considering that, as I recall!).
“The Bureau has used a modified version of gscan”
This is mistaken. The Bureau uses the standard gscan version.
The thing that the Bureau does differently is that we create a tree of each workflow’s contact files that we then rsync into a shared location that is available to support users. Then each support user links their .cylc/auth directories to that location:
[jarich@SUPPORT_HOST ~]$ ls .cylc/auth/
user1@SCHEDULER_HOST
user2@SCHEDULER_HOST
user3@SCHEDULER_HOST
user4@SCHEDULER_HOST
user5@SCHEDULER_HOST
[...]
[jarich@SUPPORT_HOST ~]$ ls .cylc/auth/user1@SCHEDULER_HOST/
workflow1
workflow2
workflow3
[...]
[jarich@SUPPORT_HOST ~]$ ls .cylc/auth/user1@SCHEDULER_HOST/workflow1/
contact passphrase ssl.cert
We actually have 2 support hosts; the one above gets the contact, passphrase and ssl.cert for each workflow, and the other only gets the contact file, thus allowing us “read only” and “control” VMs with access managed by RBAC.
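For illustration only, a tree laid out like the listing above can be enumerated with a few lines of Python. This is a minimal sketch that assumes exactly the user@host/workflow/contact layout shown; list_shared_workflows is a hypothetical helper, not part of Cylc.

```python
from pathlib import Path
from typing import Dict, List


def list_shared_workflows(auth_dir: Path) -> Dict[str, List[str]]:
    """Map each "user@host" directory to its workflows that have a contact file."""
    found: Dict[str, List[str]] = {}
    # is_dir() follows symlinks, so linked directories are included too.
    for owner_dir in sorted(p for p in auth_dir.iterdir() if p.is_dir()):
        found[owner_dir.name] = sorted(
            wf.name for wf in owner_dir.iterdir() if (wf / "contact").is_file()
        )
    return found
```

It would be called with something like list_shared_workflows(Path.home() / ".cylc" / "auth").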
I am wondering whether a solution might be to (somehow) take advantage of JupyterHub to have an extra step between authentication and spawning the UI Server. I envision this as:
- Some method exists that gives the hub access to all workflow contact (or equivalent) files, somehow.
- User authenticates
- User gets to a cylc hub-owned page that shows a unified workflow status view with links to all of the known workflows (based on the information provided in the first step)
- User can then open their UI server/other user’s UI servers
- Cylc hub then spawns the UI servers in its regular way with internal Cylc permissions.
I don’t know if JupyterHub allows this, but if you can host both UI Servers and Notebooks from the same Hub, it feels like it should be possible. Any thoughts?
This would expose workflows that individual users may not have access to, but only their names and state. That might be considered acceptable.
Hub-owned pages are a possibility (see JupyterHub services); they run under the user account used to serve JupyterHub. We would like to turn our internal Cylc server monitoring page into a hub service for ease of deployment / sharing.
The hub user could be given the relevant filesystem permissions to see the security certificates that Cylc uses (loaded in the .service directory alongside the contact file), e.g. via user groups. However, doing so wouldn’t make it easier for the hub to get at the workflow data, as it’s the users’ servers that are responsible for scanning workflows, connecting to them and populating a data store. Doing this at the hub would involve re-implementing the entire data stack.
Multi-user workflow data is already available via the users’ personal servers. In order to build a multi-user workflow dashboard, you need to open one websocket connection to each user’s server whose workflows you want to monitor. This should cause their server to be spawned automatically if not already running. You can then send the same GraphQL subscription to each and collate the results. The cylc-ui data store already supports collating workflows from multiple user accounts, but cylc-ui does not support the management of multiple websocket connections.
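The collation step can be sketched independently of the websocket plumbing. The minimal asyncio example below assumes each connection yields full snapshots of one user's workflows (a real subscription may send deltas instead, which would need different merge logic). The names consume, collate and fake_stream are hypothetical, and fake_stream only stands in for a real connection.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def consume(
    user: str, updates: AsyncIterator[List[dict]], store: Dict[str, dict]
) -> None:
    """Apply every snapshot from one user's server to the shared store."""
    async for workflows in updates:
        # Replace this user's previous entries with the latest snapshot.
        for key in [k for k, wf in store.items() if wf["owner"] == user]:
            del store[key]
        for wf in workflows:
            store[f"{user}/{wf['name']}"] = {**wf, "owner": user}


async def collate(streams: Dict[str, AsyncIterator[List[dict]]]) -> Dict[str, dict]:
    """Run one consumer per user concurrently; return the merged store."""
    store: Dict[str, dict] = {}
    await asyncio.gather(
        *(consume(user, updates, store) for user, updates in streams.items())
    )
    return store


async def fake_stream(snapshots: List[List[dict]]) -> AsyncIterator[List[dict]]:
    """Stand-in for a subscription: yields canned snapshots, then ends."""
    for snapshot in snapshots:
        await asyncio.sleep(0)  # give the other consumers a turn
        yield snapshot
```

Real subscriptions do not end, so a dashboard would read from the store while the consumers keep running rather than wait for collate to return.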
One possibility for building a multi-user dashboard outside of the cylc-ui would be a Jupyter Server extension (i.e. what cylc-ui and jupyter-lab are; you can run any number of extensions on the same server). This would provide you with a root URL to implement a bespoke web-app from. This could make use of the existing GraphQL APIs to open websocket connections to each user’s server.
One possibility for building a multi-user dashboard inside of the cylc-ui would be a bespoke “view” that handles the management of the additional websocket connections and injects the resulting data into the cylc-ui data store. Each “bit” of the cylc-ui is a “view” that can be developed somewhat independently from the rest of the app. Cylc-ui views are close to being plugins, but they have to be built into the cylc-ui application (i.e. you would have to roll your own cylc-ui to get this functionality). This is not too difficult, though: we already have the required configurations in CylcUIServer to deploy a bespoke frontend.
The solution we’ve ended up going with is to use a Jupyter Server extension, with a Tornado web app talking to the user servers’ GraphQL endpoints and presenting a global view on a webpage.