Method to listen for files?

I use “retries” to run tasks that look for files every 10 minutes.

Every attempt creates another directory of log files, and sometimes hundreds of these directories build up.

Is there a Cylc capability that “listens” for files, sort of a constantly running listener?

Yes, re-running the whole task every time is not very efficient.

Here are three options:

1) Long-lived polling task

Rather than re-running your polling task when it fails, you could move the retry logic into the task itself.

For example here’s an untested Bash implementation:

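        script = """
            # check for the file up to $TRIES times, $INTERVAL seconds apart
            for TRY in $(seq 1 "$TRIES"); do
                if [[ -f "$FILE" ]]; then
                    exit 0
                fi
                sleep "$INTERVAL"
            done
            # the file never appeared, so fail the task
            exit 1
        """
        [[[environment]]]
            FILE = /path/to/file
            TRIES = 100
            INTERVAL = 10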

2) Rose polling

You can avoid the above boilerplate by using a Rose application, which has built-in polling configured in the [poll] section of its rose-app.conf file.

3) Cylc XTrigger

Cylc external triggers are Python functions which Cylc runs asynchronously and repeatedly until they return True.

Here’s an untested example:

# <workflow>/lib/python/
import os.path

def file_poll(path=None):
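    # xtriggers return (satisfied, results); the results dict is empty here
    return os.path.exists(path), {}

# flow.cylc
[scheduling]
    [[xtriggers]]
        # the :PT10M bit specifies how often Cylc should run this
        my_poller = file_poll(path='/path/to/file'):PT10M
    [[graph]]
        P1D = """
            @my_poller => do_something
        """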

Note: XTriggers run on the scheduler host (the place where the Cylc workflow process lives) so this approach only works if you can see the relevant filesystem from there.
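Because an xtrigger is an ordinary Python function, it can be tried out on its own before it goes into a workflow. Here is a minimal sketch, assuming the file_poll function from the example above. The temporary file name is made up for the demonstration.

```python
import os.path
import tempfile


def file_poll(path=None):
    """Return (satisfied, results), the tuple shape used by the example above."""
    return os.path.exists(path), {}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # hypothetical file name, only used for this local check
        target = os.path.join(tmp, "data.txt")
        print(file_poll(path=target))  # file not there yet
        open(target, "w").close()
        print(file_poll(path=target))  # file now exists
```

Running this on the scheduler host is also a quick way to check that the filesystem in question is actually visible from there.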

For more info:

thank you o. for the informative response.

I’ve got xtriggers to handle remote/local file polling. I always meant to share them into the cylc-xtriggers repository, but I’ve not made them cylc8/py3 compatible and so haven’t shared outside of my organisation. I could probably attach them here if you wanted them. The code is a bit complex because I give people the option to:

  • poll for remote or local files
  • make sure the appropriate number of files exist if they come in sporadically
  • not start polling until some time period after a cycle point
  • finish polling some time after a cycle point to allow things to continue running without data
  • finish polling some time after the first poll was attempted
  • check that the file age is recent enough for their purposes (sometimes filenames don’t change, the contents just gets updated)
  • don’t action the same file multiple times (for ondemand models which get triggered by a file arrival)
  • don’t poll if a desired previous task has not reached a chosen status
  • some string replacements for a filename to insert 0 or non-0 padded date information into the filename
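As a rough illustration of two of the options above (a minimum number of files, and a file-age check), here is a small Python 3 sketch for local files. It is not the code described in this post: the function name files_ready and its parameters are hypothetical, and it ignores the case where a file disappears between the glob and the age check.

```python
import glob
import os
import time


def files_ready(pattern, min_count=1, max_age=None, now=None):
    """Hypothetical xtrigger-style check for local files.

    pattern:   glob pattern, e.g. "/path/to/obs_*.nc"
    min_count: number of matching files required
    max_age:   if given, ignore files last modified more than this
               many seconds ago
    now:       reference time in seconds (defaults to the current time)
    """
    now = time.time() if now is None else now
    matches = sorted(glob.glob(pattern))
    if max_age is not None:
        # keep only files that were modified recently enough
        matches = [p for p in matches if now - os.path.getmtime(p) <= max_age]
    if len(matches) < min_count:
        return False, {}
    # report how many files were found alongside the "satisfied" flag
    return True, {"count": len(matches)}
```

In a workflow this would be called the same way as any other xtrigger function, with the pattern and thresholds given as arguments.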

There is also a file_contains poll xtrigger I have to check the contents of files for either a regex string and/or number of lines in the file. I think it works, I haven’t got around to using it because I’ve been distracted. It did work once upon a time at least.
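To give an idea of what such a content check involves, here is a minimal Python 3 sketch. It is an assumption-laden illustration, not the file_contains xtrigger mentioned above: the name file_contains_sketch and the parameters pattern and min_lines are made up.

```python
import re


def file_contains_sketch(path, pattern=None, min_lines=None):
    """Hypothetical check: the file exists and passes the requested tests.

    pattern:   regular expression that must match somewhere in the file
    min_lines: minimum number of lines the file must contain
    """
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
    except OSError:
        # missing or unreadable file: not satisfied yet, try again later
        return False, {}
    if pattern is not None and re.search(pattern, text, re.MULTILINE) is None:
        return False, {}
    if min_lines is not None and len(text.splitlines()) < min_lines:
        return False, {}
    return True, {}
```

Reading the whole file into memory keeps the sketch short; for very large files a line-by-line scan would be a better fit.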

all of the options arise at one time or another.

i would be most interested in seeing implementation of remote polling and action based on a specific number of files arriving.

please post and/or direct to a repository.
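For readers wondering what remote polling for a given number of files might look like, here is a rough Python 3 sketch. It is not the implementation discussed in this thread. It assumes non-interactive ssh access from the scheduler host to the remote host, a trusted glob pattern without spaces, and the function name remote_file_count and its parameters are hypothetical.

```python
import subprocess


def remote_file_count(host, pattern, min_count=1, run=subprocess.run):
    """Hypothetical check: at least min_count files match pattern on host.

    The glob is expanded by the remote shell, so pattern must be trusted.
    The run argument exists only so the ssh call can be replaced in tests.
    """
    # count matching paths remotely; no matches gives a count of 0
    remote_cmd = f"ls -1d {pattern} 2>/dev/null | wc -l"
    try:
        proc = run(
            ["ssh", "-o", "BatchMode=yes", host, remote_cmd],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh missing or connection hung: treat as "not satisfied yet"
        return False, {}
    if proc.returncode != 0:
        return False, {}
    try:
        count = int(proc.stdout.strip())
    except ValueError:
        return False, {}
    return count >= min_count, {}
```

Any ssh failure is treated the same as "files not there yet", so a broken connection would keep the trigger polling rather than fail loudly. Whether that is acceptable depends on the workflow.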

I’ve put them on a branch: Comparing cylc:master...ColemanTom:new_xtriggers · cylc/cylc-flow · GitHub

They are Python 2 and only tested with Cylc 7.8.4. I guarantee they won’t work with Cylc 8/Python 3. If someone could help with that, update the documentation to include descriptions, and figure out some way to test them properly (I do have some very limited pytest tests, which aren’t included in this change as they don’t add much value), I would be happy to see them move into cylc-flow itself. I just don’t have time to figure out good end-to-end automated testing of them, which I feel is necessary given the complexity.

thank you TomC.

checked out and reviewing.