Method to listen for files?

schaferk · August 24, 2022, 2:08pm

I use “retries” to run tasks that look for files every 10 minutes.

Every attempt creates another directory of log files; sometimes hundreds of these directories are created.

Is there a Cylc capability that “listens” for files, sort of a constantly running listener?

oliver.sanders · August 24, 2022, 3:34pm

Yes, re-running the whole task for every check is not very efficient.

Here are three options:

1) Long-lived polling task

Rather than re-running your polling task when it fails, you could move the retry logic into the task itself.

For example, here’s an untested Bash implementation:

[runtime]
    [[poller]]
        script = """
            for TRY in $(seq 1 "$TRIES"); do
                if [[ -f "$FILE" ]]; then
                    exit 0
                fi
                sleep "$INTERVAL"
            done
            exit 1
            """
        [[[environment]]]
            FILE = /path/to/file
            TRIES = 100
            INTERVAL = 10
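If your task is already a Python program, the same bounded polling loop can be written as a function. This is a minimal sketch; `poll_for_file` is a hypothetical helper name, not part of Cylc, and `os.path.isfile` plays the role of Bash’s `[[ -f ... ]]`.

```python
import os.path
import time


def poll_for_file(path, tries, interval):
    """Return True as soon as `path` exists as a regular file, checking
    up to `tries` times with `interval` seconds between checks, else False."""
    for attempt in range(tries):
        if os.path.isfile(path):
            return True
        # Skip the sleep after the final check; the result won't change.
        if attempt < tries - 1:
            time.sleep(interval)
    return False
```

A Python task could end with `sys.exit(0 if poll_for_file(path, 100, 10) else 1)` so that a missing file still fails the job, just as `exit 1` does in the Bash version.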

2) Rose polling

You can avoid the above boilerplate by using the built-in polling of a Rose application:

http://metomi.github.io/rose/doc/html/tutorial/rose/furthertopics/polling.html

3) Cylc XTrigger

Cylc external triggers (xtriggers) are Python functions which Cylc runs asynchronously and repeatedly until they report success, i.e. return True as the first item of a (bool, dict) tuple.

Here’s an untested example:

# <workflow>/lib/python/file_poll.py
import os.path

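def file_poll(path):
    # Xtrigger functions return a (satisfied, results) tuple;
    # this one has no results to report, hence the empty dict.
    return os.path.exists(path), {}

[scheduling]
    [[xtriggers]]
        # the :PT10M bit specifies how often Cylc should run this
        my_poller = file_poll(path='/path/to/file'):PT10M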
    [[graph]]
        P1D = """
            @my_poller => do_something
        """

Note: XTriggers run on the scheduler host (the place where the Cylc workflow process lives) so this approach only works if you can see the relevant filesystem from there.

For more info:

https://cylc.github.io/cylc-doc/stable/html/user-guide/writing-workflows/external-triggers.html#custom-trigger-functions
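Because xtriggers run on the scheduler host, a file on another machine can’t be checked with `os.path.exists` directly. One possible workaround, sketched here assuming the scheduler user has passwordless (key-based) ssh access to the data host, is to run `test -e` remotely. The module and function name `remote_file_poll` and its `host`/`timeout` parameters are hypothetical, not an existing Cylc xtrigger.

```python
# <workflow>/lib/python/remote_file_poll.py  (hypothetical module)
import os.path
import shlex
import subprocess


def remote_file_poll(path, host=None, timeout=30):
    """Xtrigger sketch: satisfied when `path` exists on `host`.

    With host=None the check runs locally on the scheduler host;
    otherwise `test -e <path>` runs on the remote host over ssh.
    Returns a (satisfied, results) tuple like any xtrigger function.
    """
    if host is None:
        return os.path.exists(path), {}
    # BatchMode stops ssh from prompting for a password (which would hang).
    # The remote shell re-parses the command line, so quote the path.
    cmd = ["ssh", "-o", "BatchMode=yes", host, "test", "-e", shlex.quote(path)]
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh not installed, or the connection hung: treat as "not yet".
        return False, {}
    # 0 means the file exists; 1 (missing) or 255 (ssh error) mean not yet.
    return proc.returncode == 0, {}
```

It would be declared like `file_poll`, e.g. `remote_poller = remote_file_poll(path='/path/to/file', host='data-host'):PT10M`, where `data-host` is a placeholder. Each check opens a new ssh connection, so a generous interval is kinder to the remote host.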

schaferk · August 24, 2022, 4:30pm

Thank you, O., for the informative response.

TomC · August 31, 2022, 7:36am

I’ve got xtriggers that handle remote and local file polling. I always meant to contribute them to the cylc-xtriggers repository, but I haven’t made them Cylc 8/Python 3 compatible, so I haven’t shared them outside my organisation. I could probably attach them here if you want them. The code is a bit complex because I give people the option to:

- poll for remote or local files
- make sure the appropriate number of files exist if they come in sporadically
- not start polling until some time period after a cycle point
- finish polling some time after a cycle point, to allow things to continue running without data
- finish polling some time after the first poll was attempted
- check that the file age is recent enough for their purposes (sometimes filenames don’t change, the contents just get updated)
- not action the same file multiple times (for on-demand models which get triggered by a file arrival)
- not poll if a desired previous task has not reached a chosen status
- apply string replacements to a filename to insert zero-padded or unpadded date information
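As a rough illustration of two of those options (waiting for a minimum number of files, and ignoring files that are too old), here is a minimal Python 3 xtrigger sketch. It is not TomC’s code; the module and function name `files_ready`, its parameters, and the glob-based matching are all assumptions.

```python
# <workflow>/lib/python/files_ready.py  (hypothetical module)
import glob
import os.path
import time


def files_ready(pattern, min_count=1, max_age=None):
    """Xtrigger sketch: satisfied once at least `min_count` regular files
    match the glob `pattern`. If `max_age` (seconds) is given, only files
    modified within that window count, which catches files whose names
    stay the same while their contents are refreshed.
    """
    now = time.time()
    count = 0
    for path in glob.glob(pattern):
        if not os.path.isfile(path):
            continue
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            # The file vanished between glob() and the stat; skip it.
            continue
        if max_age is None or now - mtime <= float(max_age):
            count += 1
    return count >= int(min_count), {}
```

It could be declared like the earlier `file_poll` example, e.g. `ready = files_ready(pattern='/path/to/data/*.nc', min_count=3, max_age=3600):PT10M`. Because the function only counts what is on disk at each check, files arriving sporadically are handled naturally.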

I also have a file_contains polling xtrigger which checks the contents of files for a regex match and/or a number of lines. I think it works, but I haven’t got around to using it because I’ve been distracted. It did work once upon a time, at least.
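A content check of that kind might look like the following minimal sketch. It is not the file_contains xtrigger described above; the name `contents_match` and its `regex`/`min_lines` parameters are hypothetical, and the regex is searched line by line, so patterns spanning several lines won’t match.

```python
# <workflow>/lib/python/contents_match.py  (hypothetical module)
import os.path
import re


def contents_match(path, regex=None, min_lines=None):
    """Xtrigger sketch: satisfied when `path` exists and, if given,
    has a line matching `regex` and at least `min_lines` lines."""
    if not os.path.isfile(path):
        return False, {}
    pattern = re.compile(regex) if regex is not None else None
    found = pattern is None  # no regex means nothing to search for
    n_lines = 0
    with open(path, errors="replace") as handle:
        for line in handle:
            n_lines += 1
            if not found and pattern.search(line):
                found = True
    enough = min_lines is None or n_lines >= int(min_lines)
    return found and enough, {}
```

A typical use is waiting for a producer to write a completion marker, e.g. `done = contents_match(path='/path/to/log', regex='^DONE$'):PT10M`.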

schaferk · August 31, 2022, 12:07pm

All of these options come up at one time or another.

I would be most interested in seeing an implementation of remote polling, and of triggering on a specific number of files arriving.

Please post them and/or point me to a repository.

TomC · September 1, 2022, 2:25am

I’ve put them in a branch on GitHub; see the comparison cylc:master...ColemanTom:new_xtriggers in cylc/cylc-flow.

They are Python 2 and only tested with Cylc 7.8.4. I guarantee they won’t work with Cylc 8/Python 3. If someone could help with that, update the documentation to include descriptions, and figure out some way to test them properly (I do have some very limited pytest tests, which aren’t included in this change as they don’t add much value), I would be happy to see them move into cylc-flow itself. I just don’t have time to work out good end-to-end automated testing for them, which I feel is necessary given the complexity.

schaferk · September 2, 2022, 2:17pm

Thank you, TomC.

Checked out and reviewing.
