More information about "what did the operator do"?

After performing a number of operations to nudge a workflow back into running “on time”, I figured I should capture the operations I was making. I checked out the workflow owner’s UI server logs and found:

2024-10-07T22:52:38 INFO     jarich: authorized to trigger
2024-10-07T22:52:41 INFO     jarch: authorized to trigger
2024-10-07T22:53:32 INFO     other_user: authorized to read
2024-10-07T22:54:33 INFO     other_user: authorized to read
2024-10-07T22:55:33 INFO     other_user: authorized to read
2024-10-07T22:56:34 INFO     other_user: authorized to read
2024-10-07T22:56:44 INFO     jarich: authorized to set
2024-10-07T22:56:52 INFO     jarich: authorized to set
2024-10-07T22:57:14 INFO     jarich: authorized to set
2024-10-07T22:57:23 INFO     jarich: authorized to set

but unfortunately that doesn’t tell me what I triggered or set. The hub logs also don’t provide this information:

[I 2024-10-07T22:52:38.561 JupyterHub log:192] 200 GET /hub/api/user (jarich@127.0.0.1) 7.96ms
[I 2024-10-07 22:52:38.564 CylcUIServer] jarich: authorized to trigger
[I 2024-10-07 22:52:38.733 CylcHubApp log:192] 200 POST /user/c1_project_prod/cylc/graphql (jarich@::ffff:10.19.96.44) 181.73ms
[...]
[I 2024-10-07 22:52:41.410 CylcUIServer] jarich: authorized to trigger
[I 2024-10-07 22:52:41.541 CylcHubApp log:192] 200 POST /user/c1_project_prod/cylc/graphql (jarich@::ffff:10.19.96.44) 134.01ms

Is there a way to add this extra information in? For example:

2024-10-07T22:52:41 INFO     jarich: authorized to trigger
2024-10-07T22:52:41 INFO     jarich: triggered workflow_owner/workflow/cycle_point/task
...
2024-10-07T22:57:23 INFO     jarich: authorized to set
2024-10-07T22:57:23 INFO     jarich: set output=succeeded on workflow_owner/workflow/cycle_point/task2

We’ve certainly had situations in the past where operators have made mistakes. Triggering the wrong task, accidentally suspending/stopping workflows etc. It would be really helpful if we could get more details on who did what, especially when multiple people are taking actions.

I think at the moment UI Server logging is pretty minimal, and you should look to the scheduler logs to see exactly what operations were performed by whom, and the result, on the workflows.

Is that insufficient in your opinion? I guess we could easily log the command at the UI Server too.

I have looked in ~workflow_owner/cylc-run/workflow/log/scheduler/log and found:

2024-10-07T22:52:38Z INFO - Command "force_trigger_tasks" received from jarich. ID=d2332b68-5657-40ec-8e7c-2c071a3a7261
    force_trigger_tasks(flow=['all'], flow_wait=False, tasks=['20241007T0350Z/process_INT_MODELS_1'])

That’s sufficient. Thanks. :smiley:

2 Likes

Hi,

Yes, the workflow logs have the full details, the UIS logs just cover connections and requests.

I can see the attraction to being able to see intervention details in the UIS logs (i.e. being able to see interventions on all workflows running under a user account) or at the hub (i.e. being able to see interventions on all workflows in the deployment). Might be useful for sysadmin work, etc.

We can potentially get some of this information into the UIS log, however, it would be in GraphQL format (so would appear quite differently to how it does in the workflow log) due to the way the UIS works.

I don’t think we can get this info into the hub logs because the requests don’t go though the hub, however, some form of log aggregation could satisfy this.