Hi,
Following on from https://cylc.discourse.group/t/cylc-hub-issues/1126, we’re still not quite there with a working multi-user Cylc Hub setup.
We are seeing the Cylc UI disconnect from workflows every 52s (shown by the red banner), and we are struggling to work out where or why the disconnect is happening. Has anyone else encountered this?
We are running JupyterHub on our PUMA2 server behind an Apache web server, which in turn sits behind the ARCHER2 NGINX reverse proxy. We have tried raising the NGINX and Apache proxy timeouts, to no avail. The people who manage the reverse proxy have told me they cannot see any dropped flows for out-of-state errors between the proxy and PUMA2.
In the PUMA2 Apache access logs I’m seeing GET requests every 52s:
10.22.10.2 - - [30/Apr/2025:13:38:40 +0100] "GET /user/ros/cylc/subscriptions HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
10.22.10.2 - - [30/Apr/2025:13:39:32 +0100] "GET /user/ros/cylc/subscriptions HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
10.22.10.2 - - [30/Apr/2025:13:40:24 +0100] "GET /user/ros/cylc/subscriptions HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
10.22.10.2 - - [30/Apr/2025:13:41:16 +0100] "GET /user/ros/cylc/subscriptions HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
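For anyone wanting to check the same spacing in their own logs, here is a minimal Python sketch. It assumes the Apache timestamp format shown above and an English locale for the month name. The function name `request_intervals` and the trimmed sample lines (user-agent removed) are mine, purely for illustration.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp in an Apache access-log line,
# e.g. [30/Apr/2025:13:38:40 +0100]
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def request_intervals(lines, path="/cylc/subscriptions"):
    """Return the gaps (in seconds) between consecutive requests for `path`."""
    times = []
    for line in lines:
        if path not in line:
            continue  # ignore requests for other endpoints
        match = TIMESTAMP_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"))
    # Pair each timestamp with the one that follows it
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Access-log lines from above, trimmed after the status field
sample = [
    '10.22.10.2 - - [30/Apr/2025:13:38:40 +0100] "GET /user/ros/cylc/subscriptions HTTP/1.1" 200 -',
    '10.22.10.2 - - [30/Apr/2025:13:39:32 +0100] "GET /user/ros/cylc/subscriptions HTTP/1.1" 200 -',
    '10.22.10.2 - - [30/Apr/2025:13:40:24 +0100] "GET /user/ros/cylc/subscriptions HTTP/1.1" 200 -',
    '10.22.10.2 - - [30/Apr/2025:13:41:16 +0100] "GET /user/ros/cylc/subscriptions HTTP/1.1" 200 -',
]
print(request_intervals(sample))  # [52.0, 52.0, 52.0]
```

The gaps are exactly 52s each time, which looks more like a fixed timer somewhere in the chain than a random network drop.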
And in the system log, when accessing (for example) a log file via the Cylc UI, there are Tornado errors confirming that the websocket has been closed:
Apr 30 12:51:26 puma2 cylc[4072843]: Task exception was never retrieved
Apr 30 12:51:26 puma2 cylc[4072843]: future: <Task finished name='Task-344653' coro=<TornadoSubscriptionServer.on_start() done, defined at /home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py:129> exception=WebSocketClosedError()>
Apr 30 12:51:26 puma2 cylc[4072843]: Traceback (most recent call last):
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 148, in on_start
Apr 30 12:51:26 puma2 cylc[4072843]: await self.send_execution_result(
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 173, in send_execution_result
Apr 30 12:51:26 puma2 cylc[4072843]: await BaseSubscriptionServer.send_execution_result(
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/graphql_ws/base_async.py", line 181, in send_message
Apr 30 12:51:26 puma2 cylc[4072843]: return await connection_context.send(message)
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 49, in send
Apr 30 12:51:26 puma2 cylc[4072843]: await self.ws.write_message(data)
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/tornado/websocket.py", line 332, in write_message
Apr 30 12:51:26 puma2 cylc[4072843]: raise WebSocketClosedError()
Apr 30 12:51:26 puma2 cylc[4072843]: tornado.websocket.WebSocketClosedError
Apr 30 12:51:26 puma2 cylc[4072843]: During handling of the above exception, another exception occurred:
Apr 30 12:51:26 puma2 cylc[4072843]: Traceback (most recent call last):
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 151, in on_start
Apr 30 12:51:26 puma2 cylc[4072843]: await self.send_error(connection_context, op_id, e)
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/graphql_ws/base_async.py", line 181, in send_message
Apr 30 12:51:26 puma2 cylc[4072843]: return await connection_context.send(message)
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/cylc/uiserver/websockets/tornado.py", line 49, in send
Apr 30 12:51:26 puma2 cylc[4072843]: await self.ws.write_message(data)
Apr 30 12:51:26 puma2 cylc[4072843]: File "/home/n02/n02/fcm/metomi/cylc-8.4.1-1/lib/python3.9/site-packages/tornado/websocket.py", line 332, in write_message
Apr 30 12:51:26 puma2 cylc[4072843]: raise WebSocketClosedError()
Apr 30 12:51:26 puma2 cylc[4072843]: tornado.websocket.WebSocketClosedError
PUMA2 Apache setup:
<VirtualHost *:80>
    ServerName XXXXXXX
    ProxyPreserveHost On

    # Use RewriteEngine to handle WebSocket connection upgrades
    RewriteEngine On
    RewriteCond %{HTTP:Connection} Upgrade [NC]
    RewriteCond %{HTTP:Upgrade} websocket [NC]
    RewriteRule /(.*) ws://localhost:8000/$1 [P,L]

    # HTTP proxy to JupyterHub
    ProxyPass "/" "http://localhost:8000/"
    ProxyPassReverse "/" "http://localhost:8000/"
    RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME}

    # Long timeouts
    Timeout 3600
    ProxyTimeout 3600
    ProxyWebsocketIdleTimeout 3600
</VirtualHost>
ARCHER2 NGINX reverse proxy setup:
server {
    listen 80;
    server_name XXXXXXXXX;
    location / {
        proxy_set_header HOST $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_pass http://XXXXXXXX;
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $http_connection;
    }
}
I’ve been told they have raised proxy_read_timeout and proxy_send_timeout to 300 seconds; I don’t have access to the reverse proxy to check exactly what has been done.
Any ideas on how to track down where the problem is occurring, or what the issue may be, would be gratefully received. It’s so annoying as we are so very nearly there, but the frequent disconnects cause the Cylc Hub to spawn new cat-log & tail processes each time, which brings PUMA2 to a grinding halt within hours.
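To keep an eye on the process build-up while we debug, something like the following rough Python sketch could be run periodically. It assumes a Linux `ps` that accepts `-o user=` and `-o args=`, and it identifies the processes simply by a `cat-log` or `tail` token on the command line, which may over-count. The names `count_log_followers` and `snapshot` are made up for this example.

```python
import os
import subprocess
from collections import Counter


def count_log_followers(ps_output, names=("cat-log", "tail")):
    """Count, per user, processes whose command line has a token named in `names`.

    `ps_output` is expected to hold one process per line: user, then command line.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # blank line or process with no command line
        user, args = parts
        # Compare basenames so "/usr/bin/tail" matches "tail"
        if any(os.path.basename(token) in names for token in args.split()):
            counts[user] += 1
    return counts


def snapshot():
    """Run ps (headers suppressed by the trailing '=') and count matches."""
    result = subprocess.run(
        ["ps", "-e", "-o", "user=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return count_log_followers(result.stdout)
```

Calling `snapshot()` every minute or so and logging the result should show whether the per-user count climbs in step with the 52s disconnects.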
Has anyone else managed to successfully set up Cylc Hub behind a reverse web proxy?
Our Cylc versions are:
Cylc 8.4.1-1
Cylc UI 2.7.0
Cylc Hub 5.2.1