Tasks not clearing

Hello again. I’m sure I’m missing something obvious here, but we’ve got a task that won’t clear from the GUI, and I’ve confirmed that it is causing the workflow to stall. In this case, the two sniffer tasks fail intentionally until the files are available. The dumpfile sniffer does clear in this example, though there are two differences. First, the dumpfile sniffer succeeds on its first go, since these are back in time and the start files are already present. Second, the dumpfile sniffer doesn’t send any cylc messages until the final “ready”, while the fieldsfile sniffer sends a message for each forecast hour. Either way, everything is working exactly as expected except that the fieldsfile sniffer isn’t clearing and is thus stalling the workflow.

Here is the graph:

[[graph]]
    T00, T06, T12, T18 = """
        @start & purge[-PT6H]:finish => prune & purge
        @start => dumpfile_sniffer:ready => unpack => recon => forecast?
        forecast:start => fieldsfile_sniffer:ready<fcsthr> => process<fcsthr> => trim<fcsthr>
        process<fcsthr> => tracker => finish

        forecast:fail? => forecast_ss
        dumpfile_sniffer:delayed? => send_dumpfile_alert?
        fieldsfile_sniffer:delayed? => send_fieldsfile_alert?
    """

Here’s the GUI:

Does the workflow literally stall, as in report in the scheduler log that it has stalled, or does it just hang as if still waiting on something?

If it stalls, the log should list the partially satisfied or incomplete tasks that caused the stall.

Otherwise, does the offending sniffer task actually exit - either with success or failure - and is that exit status picked up and logged by the scheduler?

I don’t think you’ve given enough detail for me to see what’s going wrong. One idea, just in case:

...
        forecast:start => fieldsfile_sniffer:ready<fcsthr>  # no '?'
...

All of your individual :ready<fcsthr> outputs are required, not optional. So if the task finishes without completing every single one of them, it will be retained in the active window as an incomplete task, and that will eventually stall the workflow at the runahead limit. However, that would be logged as the stall reason, as I mentioned above.

Yes, it literally stalls.

The scheduler log states that none of the ready_fcsthrXXX outputs completed.

The fieldsfile_sniffer job does finish and is marked as completed, but the scheduler log states that the ready_fcsthrXXX outputs didn’t complete. Those outputs were sent by the fieldsfile_sniffer job, though, and they in turn correctly triggered the process jobs. So it’s not clear exactly what is happening here.

Correct, and they should be required. These are message triggers. All of the messages are being sent and the appropriate post process jobs are being triggered. So I’m not sure what constitutes “finishing” a cylc task message trigger here.

Also note that the dumpfile sniffer uses the same mechanic except that it doesn’t use parameterization for multiple message triggers, just a single message trigger. In that case, the task is being marked as completed and it is clearing.

Dammit, there’s a bug!

A task which completes multiple required outputs over multiple submissions will currently not get marked as complete when it finishes. I’m very surprised that no one has hit this before!

The best workaround I can come up with is to write your outputs to a file and message the whole lot back at the end of the task script. Here’s an example:

[scheduling]
    [[graph]]
        R1 = """
            a:1 & a:2 & a:3 => z
            a?
        """

[runtime]
    [[a]]
        script = """
            # add the output to the outputs file
            touch outputs
            echo "$CYLC_TASK_TRY_NUMBER" >> outputs

            # send all outputs back to Cylc
            while IFS= read -r output; do
                cylc message -- "$output"
            done < outputs

            # fail (retries configured)
            false
        """
        execution retry delays = 2*PT0S
        [[[outputs]]]
            1 = 1
            2 = 2
            3 = 3
    [[z]]

Thanks for the heads up! I think for now we can just add the optional marker to the output messages to let the graph clear. That doesn’t seem to be causing us any problems, so I think it’s just a matter of having the “correct” graph once the bug has been addressed.
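
For anyone applying the same stopgap, here is a minimal sketch of what the optional marker might look like on the parameterized outputs. It reuses the graph line from the original post; placing the `?` after `<fcsthr>` is an assumption (untested here), based on the parameter being expanded before the output name is read.

```cylc
[scheduling]
    [[graph]]
        T00, T06, T12, T18 = """
            # Assumption: '?' goes after the parameter, so that each expanded
            # ready_fcsthrXXX output is optional rather than required.
            forecast:start => fieldsfile_sniffer:ready<fcsthr>? => process<fcsthr> => trim<fcsthr>
        """
```

The trade-off is that, with the outputs optional, the scheduler will no longer hold fieldsfile_sniffer as incomplete if one of the messages genuinely never arrives, so the marker would need to come off again once the fix is released.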

I once again appreciate the rapid responses. Cheers.

Good spotting @oliver.sanders - I missed the “completed multiple outputs over multiple retries” bit; that’s definitely the problem.

The good news is it works already on the cylc set dev branch for 8.3.0.

A bit of background for @puskar49 -

Given that optional outputs can be completed in this way in the current release, I think the required output behaviour (in the current release) was deliberate. We originally figured that all required outputs must be completed in any single execution of the job. That’s the natural interpretation of “required output” for a model run, say. The reason that is wrong, in the context of the workflow, is a bit subtle: downstream tasks get triggered individually as each output is completed, so in terms of the workflow it doesn’t matter if it takes multiple job retries to do that.
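
To make the “multiple job retries” case concrete, here is a minimal sketch adapted from the workaround example earlier in the thread, with the re-messaging loop removed so that each try sends exactly one output. Going by the description above, task `a` would be expected to be held as incomplete after its final try on affected releases and to complete normally once the fix is in; that expectation is an assumption drawn from this thread, not something verified here.

```cylc
[scheduling]
    [[graph]]
        R1 = """
            # all three custom outputs are required (no '?')
            a:1 & a:2 & a:3 => z
            # success of 'a' itself is optional, since every try ends in failure
            a?
        """

[runtime]
    [[a]]
        script = """
            # each try sends a different output: 1, then 2, then 3
            cylc message -- "$CYLC_TASK_TRY_NUMBER"

            # fail, so that the next retry runs
            false
        """
        # two retries, i.e. three tries in total
        execution retry delays = 2*PT0S
        [[[outputs]]]
            1 = 1
            2 = 2
            3 = 3
    [[z]]
```

No single job completes all three outputs, yet `z` can still trigger once the third message arrives, which is the sense in which the number of retries doesn’t matter to the workflow.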