Hello again,
I’ve investigated the stalling suite again. I found this in the workflow log:
2023-11-13T18:22:51Z ERROR - Incomplete tasks:
* 20231024T0400Z/crop_gpm_precip did not complete required outputs: ['succeeded']
2023-11-13T18:22:51Z WARNING - Partially satisfied prerequisites:
* 20231012T0000Z/housekeep is waiting on ['20231012T0000Z/get_modis_chla:expired', '20231012T0000Z/crop_modis_chla:succeeded', '20231012T0000Z/upd_convert_gfs:succeeded', '20231012T0000Z/upd_get_ncep_gfs:expired', '20231012T0000Z/get_modis_kd490:expired', '20231012T0000Z/crop_modis_kd490:succeeded', '20231012T0000Z/wrfda_get_ncep_gdas_obs:expired', '20231012T0000Z/wrfda_get_ncep_gdas_obs:succeeded', '20231012T0000Z/get_seviri:expired', '20231012T0000Z/get_seviri:succeeded']
* 20231013T0000Z/housekeep is waiting on ['20231013T0000Z/get_modis_chla:expired', '20231013T0000Z/crop_modis_chla:succeeded', '20231013T0000Z/upd_convert_gfs:succeeded', '20231013T0000Z/upd_get_ncep_gfs:expired', '20231013T0000Z/get_modis_kd490:expired', '20231013T0000Z/crop_modis_kd490:succeeded', '20231013T0000Z/wrfda_get_ncep_gdas_obs:expired', '20231013T0000Z/wrfda_get_ncep_gdas_obs:succeeded', '20231013T0000Z/get_seviri:expired', '20231013T0000Z/get_seviri:succeeded', '20231012T2300Z/housekeep:succeeded']
* 20231014T0000Z/housekeep is waiting on ['20231014T0000Z/get_modis_chla:expired', '20231014T0000Z/crop_modis_chla:succeeded', '20231014T0000Z/upd_convert_gfs:succeeded', '20231014T0000Z/upd_get_ncep_gfs:expired', '20231014T0000Z/get_modis_kd490:expired', '20231014T0000Z/crop_modis_kd490:succeeded', '20231014T0000Z/wrfda_get_ncep_gdas_obs:expired', '20231014T0000Z/wrfda_get_ncep_gdas_obs:succeeded', '20231014T0000Z/get_seviri:expired', '20231014T0000Z/get_seviri:succeeded', '20231013T2300Z/housekeep:succeeded']
* 20231012T1200Z/housekeep is waiting on ['20231012T1200Z/fcst07_get_ncep_gfs:expired', '20231012T1200Z/fcst07_convert_gfs:succeeded', '20231012T1200Z/fcst02_get_ncep_gfs:expired', '20231012T1200Z/fcst02_convert_gfs:succeeded', '20231012T1200Z/fcst04_get_ncep_gfs:expired', '20231012T1200Z/fcst04_convert_gfs:succeeded', '20231012T1200Z/fcst10_convert_gfs:succeeded', '20231012T1200Z/fcst10_get_ncep_gfs:expired', '20231012T1200Z/fcst01_convert_gfs:succeeded', '20231012T1200Z/fcst01_get_ncep_gfs:expired', '20231012T1200Z/fcst11_convert_gfs:succeeded', '20231012T1200Z/fcst11_get_ncep_gfs:expired', '20231012T1200Z/fcst15_convert_gfs:succeeded', '20231012T1200Z/fcst15_get_ncep_gfs:expired', '20231012T1200Z/fcst14_get_ncep_gfs:expired', '20231012T1200Z/fcst14_convert_gfs:succeeded', '20231012T1200Z/fcst06_convert_gfs:succeeded', '20231012T1200Z/fcst06_get_ncep_gfs:expired', '20231012T1200Z/fcst09_get_ncep_gfs:expired', '20231012T1200Z/fcst09_convert_gfs:succeeded',
'20231012T1200Z/fcst13_convert_gfs:succeeded', '20231012T1200Z/fcst13_get_ncep_gfs:expired', '20231012T1200Z/upd_convert_gfs:succeeded', '20231012T1200Z/upd_get_ncep_gfs:expired', '20231012T1200Z/fcst00_convert_gfs:succeeded', '20231012T1200Z/fcst00_get_ncep_gfs:expired', '20231012T1200Z/fcst05_convert_gfs:succeeded', '20231012T1200Z/fcst05_get_ncep_gfs:expired', '20231012T1200Z/fcst03_convert_gfs:succeeded', '20231012T1200Z/fcst03_get_ncep_gfs:expired', '20231012T1200Z/fcst08_get_ncep_gfs:expired', '20231012T1200Z/fcst08_convert_gfs:succeeded', '20231012T1200Z/fcst12_get_ncep_gfs:expired', '20231012T1200Z/fcst12_convert_gfs:succeeded', '20231012T1200Z/wrfda_get_ncep_gdas_obs:expired', '20231012T1200Z/wrfda_get_ncep_gdas_obs:succeeded', '20231012T1200Z/get_seviri:expired', '20231012T1200Z/get_seviri:succeeded', '20231012T1100Z/housekeep:succeeded']
* 20231015T0000Z/housekeep is waiting on ['20231015T0000Z/get_modis_chla:expired', '20231015T0000Z/crop_modis_chla:succeeded', '20231015T0000Z/upd_convert_gfs:succeeded', '20231015T0000Z/upd_get_ncep_gfs:expired', '20231015T0000Z/get_modis_kd490:expired', '20231015T0000Z/crop_modis_kd490:succeeded', '20231015T0000Z/wrfda_get_ncep_gdas_obs:expired', '20231015T0000Z/wrfda_get_ncep_gdas_obs:succeeded', '20231015T0000Z/get_seviri:expired', '20231015T0000Z/get_seviri:succeeded', '20231014T2300Z/housekeep:succeeded']
* 20231013T1200Z/housekeep is waiting on ['20231013T1200Z/fcst07_get_ncep_gfs:expired', '20231013T1200Z/fcst07_convert_gfs:succeeded', '20231013T1200Z/fcst02_get_ncep_gfs:expired', '20231013T1200Z/fcst02_convert_gfs:succeeded', '20231013T1200Z/fcst04_get_ncep_gfs:expired', '20231013T1200Z/fcst04_convert_gfs:succeeded', '20231013T1200Z/fcst10_convert_gfs:succeeded', '20231013T1200Z/fcst10_get_ncep_gfs:expired', '20231013T1200Z/fcst01_convert_gfs:succeeded', '20231013T1200Z/fcst01_get_ncep_gfs:expired', '20231013T1200Z/fcst11_convert_gfs:succeeded', '20231013T1200Z/fcst11_get_ncep_gfs:expired', '20231013T1200Z/fcst15_convert_gfs:succeeded', '20231013T1200Z/fcst15_get_ncep_gfs:expired', '20231013T1200Z/fcst14_get_ncep_gfs:expired', '20231013T1200Z/fcst14_convert_gfs:succeeded', '20231013T1200Z/fcst06_convert_gfs:succeeded', '20231013T1200Z/fcst06_get_ncep_gfs:expired', '20231013T1200Z/fcst09_get_ncep_gfs:expired', '20231013T1200Z/fcst09_convert_gfs:succeeded',
'20231013T1200Z/fcst13_convert_gfs:succeeded', '20231013T1200Z/fcst13_get_ncep_gfs:expired', '20231013T1200Z/upd_convert_gfs:succeeded', '20231013T1200Z/upd_get_ncep_gfs:expired', '20231013T1200Z/fcst00_convert_gfs:succeeded', '20231013T1200Z/fcst00_get_ncep_gfs:expired', '20231013T1200Z/fcst05_convert_gfs:succeeded', '20231013T1200Z/fcst05_get_ncep_gfs:expired', '20231013T1200Z/fcst03_convert_gfs:succeeded', '20231013T1200Z/fcst03_get_ncep_gfs:expired', '20231013T1200Z/fcst08_get_ncep_gfs:expired', '20231013T1200Z/fcst08_convert_gfs:succeeded', '20231013T1200Z/fcst12_get_ncep_gfs:expired', '20231013T1200Z/fcst12_convert_gfs:succeeded', '20231013T1200Z/wrfda_get_ncep_gdas_obs:expired', '20231013T1200Z/wrfda_get_ncep_gdas_obs:succeeded', '20231013T1200Z/get_seviri:expired', '20231013T1200Z/get_seviri:succeeded', '20231013T1100Z/housekeep:succeeded']
* 20231016T0000Z/housekeep is waiting on ['20231016T0000Z/get_modis_chla:expired', '20231016T0000Z/crop_modis_chla:succeeded', '20231016T0000Z/upd_convert_gfs:succeeded', '20231016T0000Z/upd_get_ncep_gfs:expired', '20231016T0000Z/get_modis_kd490:expired', '20231016T0000Z/crop_modis_kd490:succeeded', '20231016T0000Z/wrfda_get_ncep_gdas_obs:expired', '20231016T0000Z/wrfda_get_ncep_gdas_obs:succeeded', '20231016T0000Z/get_seviri:expired', '20231016T0000Z/get_seviri:succeeded', '20231015T2300Z/housekeep:succeeded']
* 20231017T0000Z/housekeep is waiting on ['20231017T0000Z/get_modis_chla:expired', '20231017T0000Z/crop_modis_chla:succeeded', '20231017T0000Z/upd_convert_gfs:succeeded', '20231017T0000Z/upd_get_ncep_gfs:expired', '20231017T0000Z/get_modis_kd490:expired', '20231017T0000Z/crop_modis_kd490:succeeded', '20231017T0000Z/wrfda_get_ncep_gdas_obs:expired', '20231017T0000Z/wrfda_get_ncep_gdas_obs:succeeded', '20231017T0000Z/get_seviri:expired', '20231017T0000Z/get_seviri:succeeded', '20231016T2300Z/housekeep:succeeded']
* 20231018T0000Z/housekeep is waiting on ['20231018T0000Z/get_modis_chla:expired', '20231018T0000Z/crop_modis_chla:succeeded', '20231018T0000Z/upd_convert_gfs:succeeded', '20231018T0000Z/upd_get_ncep_gfs:expired', '20231018T0000Z/get_modis_kd490:expired', '20231018T0000Z/crop_modis_kd490:succeeded', '20231018T0000Z/wrfda_get_ncep_gdas_obs:expired', '20231018T0000Z/wrfda_get_ncep_gdas_obs:succeeded', '20231018T0000Z/get_seviri:expired', '20231018T0000Z/get_seviri:succeeded', '20231017T2300Z/housekeep:succeeded']
* 20231014T1200Z/housekeep is waiting on ['20231014T1200Z/fcst07_get_ncep_gfs:expired', '20231014T1200Z/fcst07_convert_gfs:succeeded', '20231014T1200Z/fcst02_get_ncep_gfs:expired', '20231014T1200Z/fcst02_convert_gfs:succeeded', '20231014T1200Z/fcst04_get_ncep_gfs:expired', '20231014T1200Z/fcst04_convert_gfs:succeeded', '20231014T1200Z/fcst10_convert_gfs:succeeded', '20231014T1200Z/fcst10_get_ncep_gfs:expired', '20231014T1200Z/fcst01_convert_gfs:succeeded', '20231014T1200Z/fcst01_get_ncep_gfs:expired', '20231014T1200Z/fcst11_convert_gfs:succeeded', '20231014T1200Z/fcst11_get_ncep_gfs:expired', '20231014T1200Z/fcst15_convert_gfs:succeeded', '20231014T1200Z/fcst15_get_ncep_gfs:expired', '20231014T1200Z/fcst14_get_ncep_gfs:expired', '20231014T1200Z/fcst14_convert_gfs:succeeded', '20231014T1200Z/fcst06_convert_gfs:succeeded', '20231014T1200Z/fcst06_get_ncep_gfs:expired', '20231014T1200Z/fcst09_get_ncep_gfs:expired', '20231014T1200Z/fcst09_convert_gfs:succeeded',
and further down:
* 20231022T1800Z/housekeep is waiting on ['20231022T1800Z/upd_convert_gfs:succeeded', '20231022T1800Z/upd_get_ncep_gfs:expired', '20231022T1800Z/wrfda_get_ncep_gdas_obs:expired', '20231022T1800Z/wrfda_get_ncep_gdas_obs:succeeded', '20231022T1800Z/get_seviri:expired', '20231022T1800Z/get_seviri:succeeded', '20231022T1700Z/housekeep:succeeded']
* 20231022T1900Z/housekeep is waiting on ['20231022T1900Z/get_seviri:expired', '20231022T1900Z/get_seviri:succeeded', '20231022T1800Z/housekeep:succeeded']
* 20231022T2000Z/housekeep is waiting on ['20231022T2000Z/get_seviri:expired', '20231022T2000Z/get_seviri:succeeded', '20231022T1900Z/housekeep:succeeded']
* 20231022T2100Z/housekeep is waiting on ['20231022T2100Z/get_seviri:expired', '20231022T2100Z/get_seviri:succeeded', '20231022T2000Z/housekeep:succeeded']
and at the end:
2023-11-13T18:22:51Z CRITICAL - Workflow stalled
2023-11-13T18:22:51Z WARNING - PT1H stall timer starts NOW
2023-11-13T19:22:51Z WARNING - stall timer timed out after PT1H
2023-11-13T19:22:51Z ERROR - Workflow shutting down - "abort on stall timeout" is set
2023-11-13T19:22:51Z INFO - platform: cluster-slurm - remote tidy (on metoc-cl4)
2023-11-13T19:22:52Z INFO - DONE
I find this strange because the graph explicitly says that :expired should also lead to housekeep.
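To get an overview of which outputs the housekeep tasks are waiting on, log lines like the ones above could be tallied with a small script. This is only a rough sketch in Python: the function names are my own, it assumes each "is waiting on" entry sits on one unwrapped line of the log, and it is not part of cylc.

```python
import ast
import re
from collections import Counter

# Matches log lines such as:
#   * 20231022T1900Z/housekeep is waiting on ['20231022T1900Z/get_seviri:expired', ...]
WAITING_RE = re.compile(r"^\s*\*\s+(\S+) is waiting on (\[.*\])\s*$")


def parse_waiting_line(line):
    """Return (task_id, [prerequisite, ...]), or None if the line does not match."""
    match = WAITING_RE.match(line)
    if match is None:
        return None
    task_id, raw_list = match.groups()
    # The prerequisites are printed as a Python-style list of quoted strings.
    return task_id, ast.literal_eval(raw_list)


def count_waited_on(lines):
    """Count how often each 'task:output' appears, ignoring the cycle point."""
    counts = Counter()
    for line in lines:
        parsed = parse_waiting_line(line)
        if parsed is None:
            continue  # not a complete "is waiting on" line
        for prereq in parsed[1]:
            # '20231022T1900Z/get_seviri:expired' -> 'get_seviri:expired'
            counts[prereq.split("/", 1)[1]] += 1
    return counts
```

Feeding the log file's lines to count_waited_on() and printing counts.most_common() would show at a glance which task outputs turn up most often across the waiting housekeep tasks.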
Here’s the section from the graph in question:
{% if DOWNLOAD_SEVIRI is sameas true %}
[[[T-00]]] # every hour at zero minutes past (every hour on the hour).
# Task is clock triggered at <CYCLE> + 3:25 hours
graph = "(get_seviri | get_seviri:expired) => housekeep"
{% endif %} # DOWNLOAD_SEVIRI
There are more sections like this in the graph, but they all follow a similar pattern.
This syntax used to work in cylc7. Am I getting something wrong in cylc8?
A slight variation of the pattern using the expired status is this:
[[[T18]]]
# The upd_get_ncep_gfs task is clock triggered at <CYCLE> + 3 hours, 30 minutes
# The sst_get_ncep_gfs task is clock triggered at 00Z + 22 hours, 35 minutes (which is <CYCLE> + 4hours, 35 minutes)
graph = """
upd_get_ncep_gfs => upd_convert_gfs
(sst_get_ncep_gfs? | sst_get_ncep_gfs:expired?) => upd_convert_gfs
(upd_get_ncep_gfs:expired? | upd_convert_gfs) => housekeep
"""
and this part of the graph
{% if DOWNLOAD_NASA_GPM is sameas true %}
[[[T-00]]] # every hour at zero minutes past (every hour on the hour). Note that the - character takes the place of the hour digits as we may n$
# Task is clock triggered at <CYCLE> + 15 hours 50 minutes (e.g. the T00 is triggered at 19:50 GST)
graph = """
get_gpm_precip? => crop_gpm_precip
(get_gpm_precip:expired | crop_gpm_precip) => housekeep
"""
{% endif %} # DOWNLOAD_NASA_GPM
and I seem to remember you saying that the ?-notation won’t work properly until v8.3. So what does this do in v8.2? I can see the get_gpm_precip and crop_gpm_precip tasks are also holding up the suite and causing it to stall. I’m starting to think this might be the wrong syntax for what I’m trying to do (in cylc7 I did this with suicide triggers).
This suite, which schedules the download of forcing data for various models, makes heavy use of the :expired mechanism. Many download sites do not store data indefinitely; they give you a 3-day or 7-day window within which the download is live, and afterwards the data is no longer available. I therefore expire those tasks that have no chance of ever getting the data once it’s too late.
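To make the intent concrete, the rule I am trying to express is roughly "a download task is pointless once the wall clock has passed its cycle point plus the provider's retention window". Here is a minimal Python sketch of just that rule. The function names and the window values are illustrative assumptions of mine, and this is not how cylc implements expiry internally.

```python
from datetime import datetime, timedelta, timezone


def parse_cycle_point(point):
    """Parse a cycle point such as '20231024T0400Z' into an aware UTC datetime."""
    return datetime.strptime(point, "%Y%m%dT%H%MZ").replace(tzinfo=timezone.utc)


def download_window_closed(point, window, now):
    """Return True if `now` is later than the cycle point plus the retention window."""
    return now > parse_cycle_point(point) + window


# Example: the time of the stall in the log, checked against a 7-day window.
stall_time = datetime(2023, 11, 13, 18, 22, 51, tzinfo=timezone.utc)
closed = download_window_closed("20231024T0400Z", timedelta(days=7), stall_time)
```

With a 7-day window, the 20231024T0400Z cycle would already be past its window at the time of the stall, which is the situation in which I want the download task to expire and housekeep to run anyway.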
Thank you for all of your replies so far, and I’m grateful for any help!
Thank you, Fred