Cylc 7 to 8 first steps

Suite configuration file

This is a short summary of my experience migrating an ensemble suite to Cylc 8. Porting a suite configuration from suite.rc to a Cylc 8 workflow flow.cylc is straightforward, and cylc validate . gives you hints about what needs to be changed. I used an on-demand service that had Cylc pre-installed in a conda environment. This means the platform configuration was already provided, which saved some time setting up global.cylc.
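For anyone following the same path, the basic loop was something like this (paths and workflow name are illustrative; note that Cylc 8 runs an unmodified suite.rc in backward-compatibility mode, and renaming it to flow.cylc switches that off):

cd ~/cylc-src/wav-nat86
mv suite.rc flow.cylc
cylc validate .   # repeat until the validation errors are gone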

cylc config

To pre-process a workflow file with jinja2 parameters you can use:

cylc view -j .
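For example, given a flow.cylc that starts with a Jinja2 shebang (a sketch; the variable and script are made up), the command above prints the file with the templating expanded:

#!Jinja2
{% set RESOLUTION = 'n96' %}
[runtime]
    [[model]]
        script = run_model --res {{ RESOLUTION }}

cylc config . similarly dumps the fully parsed configuration.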

The cylc install procedure

At first I was missing some configuration files on the job runner, but the Cylc documentation stipulates that a few pre-defined directories are copied across platforms by default. Moving my scripts and namelist files to app, etc, bin, and lib saved the day. These folders must not contain any temporary runtime files, as those will get deleted on cylc reinstall and/or cylc reload. Of course, you can also specify your own source directories with global.cylc[install]source dirs.
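For reference, the setting in question looks something like this in global.cylc (the paths are illustrative):

[install]
    source dirs = ~/cylc-src, ~/work/workflows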

Start/Restart/Reload

I think this is different from the previous Rose/Cylc version, but if the workflow has reached the final cycle point and shut down, it is finished and cannot be restarted. See the documentation.
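In practice: a stopped-but-unfinished workflow is resumed with cylc play, while a finished one needs a fresh install (a sketch; the workflow name matches the example later in this thread):

cylc play wav-nat86/run1   # resume a stopped, unfinished workflow

cylc install wav-nat86     # a finished workflow cannot be restarted:
cylc play wav-nat86/run2   # install and play a new numbered run instead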

Monitoring

cylc tui is a pretty nifty tool, very handy for monitoring a workflow. The only thing I was missing is a hook into cylc cat-log to display the standard output and standard error logs from a task.
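The logs are still reachable from the command line, though (IDs follow the listing further down):

cylc cat-log wav-nat86/run1//20220915T0000Z/check_AWAVE        # job stdout
cylc cat-log -f e wav-nat86/run1//20220915T0000Z/check_AWAVE   # job stderr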


Thanks for that encouraging report! Out of curiosity, did your workflow have graph branching that required conversion to the new optional task output syntax?

Since I do not quite know what the ‘new optional task output syntax’ is, I do not think I have used it. I implemented a simple barrier task to wait for all member tasks to complete, as mentioned in the parameterised flows section of the documentation.
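For the record, the pattern looks roughly like this (a flow.cylc sketch; the parameter and task names are made up):

[task parameters]
    m = 1..5
[scheduling]
    [[graph]]
        R1 = "model<m> => barrier => collate"
[runtime]
    [[barrier]]
        script = true   # no-op: only exists to wait for all members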

One remaining item that I have not yet figured out is how to mark a task as succeeded. A typical scenario is that a PBS job on HPC gets lost or dies (PBS will communicate that via email, saying that the job was killed). Cylc, on the other hand, is waiting for the job to report back, which will never happen. In Cylc 7 there was an option to force the job state to one of the following: waiting, ready, succeeded, failed.
What would the equivalent be in Cylc 8? In the example below, the task check_AWAVE is the one I want to clear. Also, bwix_prep is somehow duplicated? Is this fixed in Cylc 8.0.2?

$ cylc --version
8.0.1
$ cylc wo wav-nat86/run1 -T
check_AWAVE, 20220915T0000Z, submitted
bwix_prep, 20220915T0000Z, succeeded
bwix_wave, 20220915T0000Z, succeeded
svg_prep, 20220915T0000Z, succeeded
nat_prep, 20220915T0000Z, succeeded
bwix_prep, 20220915T0000Z, waiting
per_prep, 20220915T0000Z, succeeded
syd_prep, 20220915T0000Z, succeeded
syd_wave, 20220915T0000Z, succeeded
per_wave, 20220915T0000Z, succeeded
svg_wave, 20220915T0000Z, succeeded
nat_wave, 20220915T0000Z, succeeded
bwix_next, 20220915T0000Z, succeeded
bwix_point, 20220915T0000Z, succeeded
bwix_field, 20220915T0000Z, succeeded
syd_next, 20220915T0000Z, succeeded
per_next, 20220915T0000Z, succeeded
bwix_link, 20220915T0000Z, succeeded
syd_point, 20220915T0000Z, succeeded
syd_field, 20220915T0000Z, succeeded
per_field, 20220915T0000Z, succeeded
per_point, 20220915T0000Z, succeeded
bwix_plot, 20220915T0000Z, succeeded
syd_link, 20220915T0000Z, succeeded
syd_tape, 20220915T0000Z, succeeded
bwix_tape, 20220915T0000Z, succeeded
per_link, 20220915T0000Z, succeeded
per_tape, 20220915T0000Z, succeeded
svg_next, 20220915T0000Z, succeeded
svg_field, 20220915T0000Z, succeeded
svg_point, 20220915T0000Z, succeeded
svg_link, 20220915T0000Z, succeeded
clean_run, 20220915T0000Z, waiting
svg_plot, 20220915T0000Z, succeeded
svg_tape, 20220915T0000Z, succeeded
mdss, 20220915T0000Z, waiting
nat_next, 20220915T0000Z, succeeded
nat_field, 20220915T0000Z, succeeded
nat_point, 20220915T0000Z, succeeded
nat_link, 20220915T0000Z, preparing

This is one of the scenarios that polling is designed to deal with.
https://cylc.github.io/cylc-doc/stable/html/user-guide/running-workflows/tracking-task-state.html
If you poll the task (from the UI or using cylc poll from the command line) then Cylc will check and find that the job is no longer queued and the task will go into the “submit-failed” state.
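For example (IDs taken from the listing above):

cylc poll wav-nat86/run1//20220915T0000Z/check_AWAVE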

By default, Cylc will poll jobs every 15 minutes whilst they are submitted or running (with additional polls triggered if a task reaches its execution time limit) so this should happen automatically if you wait long enough.
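The polling frequency can also be tuned per task if the default is too slow, along these lines (a flow.cylc sketch; the interval is illustrative):

[runtime]
    [[check_AWAVE]]
        submission polling intervals = PT5M
        execution polling intervals = PT5M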

The reason for doing that, in Cylc 7, was to manually set up the right task states to make the right downstream tasks trigger (if something went wrong that the workflow did not handle automatically) because the Cylc 7 scheduler matched task prerequisites and outputs at run time, to decide what tasks could trigger next.

Cylc 8, on the other hand, has a “spawn on demand” scheduler: it triggers tasks directly according to the graph, as upstream outputs are completed. So manually resetting task state would essentially have no effect in Cylc 8.

The equivalent thing in Cylc 8, for events that the workflow does not handle automatically, is:

  • manually trigger tasks with cylc trigger
  • use cylc set-outputs to tell the scheduler to act as if specific outputs were completed (see the examples just below)
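For example, for the task in question above:

cylc trigger wav-nat86/run1//20220915T0000Z/check_AWAVE
cylc set-outputs wav-nat86/run1//20220915T0000Z/check_AWAVE   # completes the default (succeeded) output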

… however, you have a task in the submitted state (i.e. the job script has been handed off to PBS): check_AWAVE, 20220915T0000Z, submitted. Attempts to trigger an active task (submitted or running) will be ignored because the task is already active. If the task is not really active because it was cancelled from PBS by external means before it could start running, then as noted by @dpmatthews, you need to poll the task (i.e. query its state) from Cylc, which should result in it going to the submit-failed state (after which you can retrigger it or whatever).

I’ve not seen this before! cylc workflow-state reads the target workflow database “task_states” table. If you still have an example showing this duplication, could you run this command:

sqlite3 -header -column ~/cylc-run/wav-nat86/runN/log/db  # (e.g.)
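For instance, a query along these lines would show the duplicated rows (illustrative; column names from the Cylc 8 task_states schema):

sqlite3 -header -column ~/cylc-run/wav-nat86/runN/log/db 'SELECT name, cycle, status FROM task_states;'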

Correction, I’ve figured out how a task can appear multiple times in the task_states DB table.

I suspect you re-triggered the task with cylc trigger --flow=new or --flow=none? By default (with no --flow option) forced triggering makes the triggered task belong to existing flows (usually flow 1) and updates the existing DB entry, but otherwise it will create a new DB entry.
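That is, something like (ID based on the duplicated task above):

cylc trigger --flow=new wav-nat86/run1//20220915T0000Z/bwix_prep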

I’m reasonably sure that the task_states table should only record the latest instance of a task that triggered multiple times (whether in the same flow or not). [UPDATE: actually we do need a separate entry for the same task in a different flow]

This should be fixed in 8.0.3.