Cleared cycles not clearing from the Tree

Hello,

I’ve noticed an odd occurrence with the Tree view whenever I stop and restart the workflow: cycles hang around in the Tree view with no tasks listed. Here are the details of what I did to recreate the issue.

I’m using Cylc v8.2.3. The workflow is fairly simple: not many dependencies, not much complexity, and very few optional triggers.

[[graph]]
      T00, T06, T12, T18 = """
        @start & purge[-PT6H]:finish => prune & purge
        @start => dumpfile_sniffer:ready => unpack => recon => forecast?
        forecast:start => fieldsfile_sniffer:ready<fcsthr>? => process<fcsthr> => trim<fcsthr>
        process<fcsthr> => tracker => finish

        forecast:fail? => forecast_ss
        dumpfile_sniffer:delayed? => send_dumpfile_alert
        fieldsfile_sniffer:delayed? => send_fieldsfile_alert
      """

When starting the initial run of the workflow I use the --no-run-name option; I’m not sure if that is a factor.
cylc vip --no-run-name --workflow-name my_workflow

After monitoring for a couple of cycles with no issues, I stopped the workflow in the Web UI. Once I confirmed it had stopped, I restarted my_workflow from my terminal.
cylc play my_workflow
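
(For reference, the terminal equivalent of stopping via the Web UI would be something like the command below; by default cylc stop waits for submitted and running tasks to finish before shutting the scheduler down.)
cylc stop my_workflow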

The workflow starts running right where it left off, but that cycle and subsequent cycles remain in the Tree view with no tasks pending. The tree view looks something like this.

I did check the other views, like the Table view, which shows some tasks from these cycles in a submitted state. However, when I searched our logs (Cylc and Slurm) I confirmed these tasks completed successfully and without error. Additionally, I don’t understand how, if these tasks are truly in a submitted state, the workflow can progress and not stall.

When I re-ran these hung tasks from the 19th, they ran and cleared from both the table and the tree view, but when I reloaded my Web UI some of those tasks would return. Usually it’s a shorter list than I started with, as if they are clearing but the last one or two tasks didn’t update fast enough and got thrown back. Has anyone else experienced this issue before, or know what I am doing wrong?
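
(For reference, re-triggering from the command line looks something like the command below; the cycle point and task name here are placeholders, not the exact IDs from our workflow.)
cylc trigger my_workflow//20240119T0000Z/trim_fcsthr1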

Hi,

The --no-run-name bit shouldn’t matter.

I’m not sure what caused this; we’ll look into it. If you encounter this again (or any issues with the GUI in general), it’s a good idea to try reloading the browser tab to see if the issue goes away. This can help us pinpoint which part of the system the bug is lurking in.

Could you check what version of the GUI you are using? When you open the GUI it is displayed in the bottom left-hand corner. The GUI is distributed in the cylc-uiserver package, which is separate from the cylc-flow package, so it’s useful for us to know both numbers. (There was a bug a little while back which could potentially have resulted in this behaviour if you’re running an older GUI version.)
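
For reference, from a terminal cylc version reports the cylc-flow version, and for a pip-based install something like the following usually shows the cylc-uiserver version (adjust for conda if that’s how it was installed):
pip show cylc-uiserver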

I’ve noticed that once in a while, if Cylc isn’t updating, refreshing does help. In this case it didn’t seem to. Even when other users accessed the workflow, these cycles appeared in their view.

Cylc UI 2.3.0


Hi @Jaclyn

I tried making a quick-running test workflow based on your graph above, but after multiple stop/restart cycles with both cylc-8.2.3 and 8.2.4 I don’t see that behaviour in the GUI.

It might help if you could try running this yourself, and maybe amend the example if it is missing any graph features (e.g. suicide triggers?) that appear in your real workflow:

[task parameters]
   fcsthr = 1..2
[scheduler]
   allow implicit tasks = True
[scheduling]
   initial cycle point = 2024
   [[xtriggers]]
       start = wall_clock()
[[graph]]
      T00, T06, T12, T18 = """
        @start & purge[-PT6H]:finish => prune & purge
        @start => dumpfile_sniffer:ready => unpack => recon => forecast?
   # ???----> fieldsfile_sniffer:ready<fcsthr>? <---- ???
        forecast:start => fieldsfile_sniffer:ready? => process<fcsthr> => trim<fcsthr>
        process<fcsthr> => tracker => finish
        forecast:fail? => forecast_ss
        dumpfile_sniffer:delayed? => send_dumpfile_alert
        fieldsfile_sniffer:delayed? => send_fieldsfile_alert
      """
[runtime]
   [[process<fcsthr>, trim<fcsthr>]]
   [[dumpfile_sniffer, fieldsfile_sniffer]]
       script = """
               cylc message "I'm ready"
                   # cylc message "I'm delayed"
           """
       [[[outputs]]]
           ready = "I'm ready"
           delayed = "I'm delayed"

Note the commented-out line where you parameterized the task output. The built-in parameters are only meant for task names, so if that’s not a typo you must be using Jinja2 to loop over the parameters under [runtime]? (It shouldn’t really work in the graph either, but apparently it does!)
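
If it is Jinja2, I’d expect something roughly like this minimal sketch, where the #!Jinja2 line sits at the top of flow.cylc and the hour list, task names and script are all made up for illustration:

#!Jinja2
{% set FCST_HOURS = [1, 2] %}
[runtime]
{% for HR in FCST_HOURS %}
    [[process_{{ HR }}]]
        script = echo "processing forecast hour {{ HR }}"
{% endfor %}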

I ran this graph in my environment as-is (aside from the parameterized output, I didn’t notice anything missing from your example graph). As you said, I didn’t see that same failure to clear the cycles after stopping and restarting the workflow. I’m going to try some additional runs with some changes to the graph to see if I can identify what might be contributing to this problem for us. I’ll post here again if I discover anything more.

We loop through the list of files we generate and use the forecast hours in the file names to make a “FILE_LIST”, which isn’t actually iterating over the parameters; it just happens to match the values. We’ve used this in our old Cylc 7 graphs as well and it’s worked quite well for us so far :slight_smile:
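
Roughly, the idea is something like this sketch (the directory and file-name pattern are made-up placeholders; the hours embedded in the names just happen to line up with the <fcsthr> values, and nothing here reads the Cylc parameters themselves):

# sketch only: DATA_DIR and the file-name pattern are placeholders
FILE_LIST=""
for HR in 1 2; do
    FILE_LIST="$FILE_LIST ${DATA_DIR}/fields_T+${HR}.ff"
done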


I remember seeing this issue not too long ago (when I was playing with a previous version for an ESiWACE presentation). Unfortunately I wasn’t keeping track of the Cylc version (I only know it was a commit on 8.2.x), but I got into a similar situation where tree nodes were not removed.

I just synced my branches (UIS and flow), copied Hilary’s experiment, and modified it until validation passed:

[task parameters]
   fcsthr = 1..2
[scheduler]
   allow implicit tasks = True
[scheduling]
   initial cycle point = 2024
   [[xtriggers]]
       start = wall_clock()
[[graph]]
      T00, T06, T12, T18 = """
        @start & purge[-PT6H]:finish => prune & purge
        @start => dumpfile_sniffer => unpack => recon => forecast?
   # ???----> fieldsfile_sniffer:ready<fcsthr>? <---- ???
        forecast:start => fieldsfile_sniffer? => process<fcsthr> => trim<fcsthr>
        process<fcsthr> => tracker => finish
        forecast:fail? => forecast_ss
#        dumpfile_sniffer:delayed? => send_dumpfile_alert
#        fieldsfile_sniffer:delayed? => send_fieldsfile_alert
      """
[runtime]
   [[process<fcsthr>, trim<fcsthr>]]
   [[dumpfile_sniffer, fieldsfile_sniffer]]
       script = """
               cylc message "I'm ready"
"""

I opened it in Firefox LTS on Ubuntu Linux LTS, let it run for some minutes, then issued cylc stop ghosts --now, then cylc play ghosts, a few times. Then I paused a task and restarted it again a few times.
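
Roughly, the loop I repeated was something like this (ghosts is just the name I gave the test workflow, and --now shuts the scheduler down without waiting for active tasks):

cylc stop ghosts --now
cylc play ghosts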

It took me some 5 minutes, and then I got the same JS console error I had seen some time ago: “Uncaught TypeError: t.ancestors is undefined”

When that happened I got one cycle node in the tree that seemed to linger longer than expected, but no other nodes. I restarted the workflow (without reloading the page) and the tree view fixed itself. Unfortunately I took my screenshot after the tree was fixed, and I also forgot to look at the WebSocket traffic to see which message could have triggered it, but maybe this helps others troubleshoot it.

TL;DR: it might be due to a JS bug where obj.ancestors was not set (either because the JS code needs to be fixed, or because of a delta or something from the UIS).


Thanks @Bruno_P_Kinoshita!

The ancestors and childTasks fields are required by the UI data store to create the family hierarchy. If these are not set in the store when a “pruned” delta arrives, then the store will be unable to remove the tasks/families, which would explain what we’re seeing.

Now we’ll need to investigate those deltas to work out which end of the system the problem is in.


Just wanted to follow up on this thread. Thank you very much to everyone who posted a response. I think for my particular situation we’ve found a solution to keep these “ghosts” from taking over our GUI.
We did observe that the jobs that tended to fail most often were the “trim” tasks, which did not trigger a follow-on task.
Old Graph:

[[graph]]
      T00, T06, T12, T18 = """
        @start & purge[-PT6H]:finish => prune & purge
        @start => dumpfile_sniffer:ready => unpack => recon => forecast?
        forecast:start => fieldsfile_sniffer:ready<fcsthr>? => process<fcsthr> => trim<fcsthr>
        process<fcsthr> => tracker => finish

        forecast:fail? => forecast_ss
        dumpfile_sniffer:delayed? => send_dumpfile_alert
        fieldsfile_sniffer:delayed? => send_fieldsfile_alert
      """

New Graph:

[[graph]]
      T00, T06, T12, T18 = """
        @start & purge[-PT6H]:finish => prune & purge
        @start => dumpfile_sniffer:ready => unpack => recon => forecast?
        forecast:start => fieldsfile_sniffer:ready<fcsthr>? => process<fcsthr> => trim<fcsthr> => finish
        process<fcsthr> => tracker => finish

        forecast:fail? => forecast_ss
        dumpfile_sniffer:delayed? => send_dumpfile_alert
        fieldsfile_sniffer:delayed? => send_fieldsfile_alert
      """

By adding the finish task after trim we are able to avoid leaving these tasks behind in the GUI.