Tip: Custom Outputs

Cylc Tip: Custom Outputs

Task dependencies are usually based on the success of the previous task, e.g.:

# "c" depends on the success of both "a" and "b"
a & b => c

For many situations this is all that is required. However, writing dependencies on the “success” of a task masks the true data dependency that lies beneath. If we were to write this dependency in terms of the exchange of data, it might look like this:

# "c" depends on three data files, one generated by "a", the other two by "b"
a:file1 & b:file2 & b:file3 => c

Cylc (both 7 & 8) supports “custom outputs” which allow you to do just that. Register your custom outputs in the “[runtime]” section, and tell Cylc when they have been generated using the “cylc message” command, e.g.:

[runtime]
    [[b]]
        script = """
            python myscript1.py > file1; cylc message -- 'file1 ready'
            python myscript2.py > file2; cylc message -- 'file2 ready'
        """
        [[[outputs]]]
            # <trigger> = <message>
            # * triggers can be used in the graph
            # * messages are the strings you send using "cylc message"
            file1 = file1 ready
            file2 = file2 ready

Using custom outputs can help clarify the true data dependencies a task requires.

It is especially useful when a task generates multiple outputs over the course of its run and you don’t need to wait for the entire task to succeed before running downstream tasks, e.g.:

# run "x2" as soon as "file2" is ready (don't wait for "b" to finish running)
b:file2 => x2
# run "x3" as soon as "file3" is ready (don't wait for "b" to finish running)
b:file3 => x3
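Putting the graph and the runtime configuration together, a minimal workflow for this early-triggering pattern might look like the sketch below. The script "myscript3.py", the use of the workflow share directory to pass files between tasks, and the placeholder scripts for "x2" and "x3" are assumptions made for illustration.

```cylc
[scheduling]
    [[graph]]
        R1 = """
            # each downstream task starts as soon as its input file is announced
            b:file2 => x2
            b:file3 => x3
        """

[runtime]
    [[b]]
        script = """
            # write to the workflow share directory so other tasks can read the files
            python myscript2.py > "${CYLC_WORKFLOW_SHARE_DIR}/file2"
            cylc message -- 'file2 ready'
            python myscript3.py > "${CYLC_WORKFLOW_SHARE_DIR}/file3"
            cylc message -- 'file3 ready'
        """
        [[[outputs]]]
            # <trigger> = <message>
            file2 = file2 ready
            file3 = file3 ready
    [[x2]]
        # placeholder: replace with the real consumer of file2
        script = wc -l "${CYLC_WORKFLOW_SHARE_DIR}/file2"
    [[x3]]
        # placeholder: replace with the real consumer of file3
        script = wc -l "${CYLC_WORKFLOW_SHARE_DIR}/file3"
```

With this layout "x2" can start while "b" is still busy generating "file3".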

For more information, there is a message trigger tutorial in the Cylc docs.


Cylc 8 Tip: Output Validation

At Cylc 8, custom outputs have become more powerful thanks to output validation. This allows you to tell Cylc what outputs a task “should” generate so that errors can be caught earlier.

Going back to the first custom outputs graph, because there are no question marks (?) after these outputs, they are all considered to be “required”. Cylc will consider it to be an error if any of these outputs are not generated, i.e.:

# if "a" does not generate the "file1" output, it will be marked as having "incomplete outputs"
# if "b" does not generate files 2 & 3 it will be marked as having "incomplete outputs"
a:file1 & b:file2 & b:file3 => c

But sometimes the outputs generated might change from run to run. For this, you can use “optional outputs”, e.g.:

# d requires either "file2" or "file3"
# (it is ok if "file2" and "file3" are not generated)
b:file2? | b:file3? => d

This is fine, but because both outputs are “optional”, if “b” doesn’t generate either “file2” OR “file3”, then nothing will happen (i.e. “d” will not run and Cylc will not flag this as an error). This might be what you want, but what if you need one of these files, just not necessarily both?

To handle this, Cylc 8 has introduced “completion expressions”. These allow you to validate the outputs that are generated to ensure everything has worked as intended.

[runtime]
    [[b]]
        script = ...
        # "b" must succeed and generate at least one file
        completion = succeeded and (file1 or file2)

When the task finishes running, the completion expression is checked. If the task outputs do not satisfy this condition, then Cylc will raise it as an error (see the workflow log).

ERROR - Incomplete tasks:
* 1/b did not complete the required outputs:
✓ ⦙  succeeded
  ⦙  and (
⨯ ⦙    file1
⨯ ⦙    or file2
  ⦙  )

The task will stay visible in the GUI giving you a chance to edit and re-run it as needed to recover the workflow.
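For reference, here is a sketch of how the optional outputs in the graph and the completion expression might sit together in one workflow. It assumes "b" registers "file2" and "file3" (the names used in the graph above), so substitute whichever outputs your task defines. The scripts are left elided, as in the example above.

```cylc
[scheduling]
    [[graph]]
        R1 = """
            # both outputs are optional as far as the graph is concerned...
            b:file2? | b:file3? => d
        """

[runtime]
    [[b]]
        script = ...
        # ...but "b" is only complete if it succeeds and generates at least one of them
        completion = succeeded and (file2 or file3)
        [[[outputs]]]
            file2 = file2 ready
            file3 = file3 ready
    [[d]]
        script = ...
```

The question marks in the graph and the "or" in the completion expression say the same thing from two angles: neither file is individually required, but the task must produce at least one of them.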

Use the “cylc show” command to see the list of outputs a task has generated, and the status of the “completion condition”.

$ cylc show myworkflow//1/b
...
outputs: ('⨯': not completed)
⨯ 1/b expired
✓ 1/b submitted
⨯ 1/b submit-failed
✓ 1/b started
✓ 1/b succeeded
⨯ 1/b failed
⨯ 1/b file1
⨯ 1/b file2
output completion: incomplete
✓ ⦙  succeeded
  ⦙  and (
⨯ ⦙    file1
⨯ ⦙    or file2
  ⦙  )

We will be increasing the utility and visibility of this feature (especially in the GUI) in the near future.

You can also use the completion condition to handle failure conditions (and more).

In this example, the task “get_live_data” is allowed to fail, but only if it also generates the output “data_not_available” (where we’ve interpreted the exit code “42” as meaning “no data” to differentiate from “no network connection” or “error running the script”, etc).

[scheduling]
    [[graph]]
        R1 = """
            get_live_data:failed? => get_archive_data
            get_live_data? | get_archive_data => run_model
        """

[runtime]
    [[get_live_data]]
        completion = succeeded or (failed and data_not_available)
        script = """
            # on exit code 42 send the "no data" message, then fail in every error case
            get_data "${CYLC_TASK_CYCLE_POINT}" || (if [[ $? == 42 ]]; then cylc message -- 'no data'; fi; exit 1)
        """
        [[[outputs]]]
            data_not_available = no data

This provides a much more robust approach to error handling than just relying on the default “:succeeded” and “:failed” outputs that tasks generate.

Some more completion expression examples:

completion = succeeded or expired

completion = succeeded and (x or y or z)

completion = (succeeded and data) or (failed and (error1 or error2))
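
As a rough sketch, the last expression could be wired into a workflow like this. The task names "fetch", "process" and "recover", the command "fetch_data", its exit codes and the message strings are all hypothetical and only there to make the example concrete.

```cylc
[scheduling]
    [[graph]]
        R1 = """
            # normal route: the task succeeded and produced its data
            fetch? & fetch:data? => process
            # recovery route: the task failed with a recognised error
            fetch:failed? & (fetch:error1? | fetch:error2?) => recover
        """

[runtime]
    [[fetch]]
        completion = (succeeded and data) or (failed and (error1 or error2))
        script = """
            # "fetch_data" is a hypothetical command; exit codes 1 and 2 are assumed
            ret=0
            fetch_data > "${CYLC_WORKFLOW_SHARE_DIR}/data" || ret=$?
            case "${ret}" in
                0) cylc message -- 'data ready' ;;
                1) cylc message -- 'error one'; exit 1 ;;
                2) cylc message -- 'error two'; exit 1 ;;
                # any other failure sends no message, so the task is left incomplete
                *) exit "${ret}" ;;
            esac
        """
        [[[outputs]]]
            data = data ready
            error1 = error one
            error2 = error two
    [[process, recover]]
        # placeholders for the real downstream tasks
        script = true
```

Any failure that is not one of the two recognised errors leaves "fetch" with incomplete outputs, so it is flagged for attention rather than silently passed over.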

For more information, see the custom outputs example in the docs and the reference material on completion expressions.
