Hello,
I wrote a suite for WRF forecast cycling, based on GFS data. Downloading the GFS data takes a long time, so every time I branch off a forecast (from the update runs) I split the forecast run into 15 chunks of one day each.
I download the forecast data for the first day of the forecast (f000…f024), then run the forecast for that 24-hour period (I call that fcst00). In the meantime the input data for the next 24 hours of the forecast is being downloaded (f025…f048), and when that’s finished AND fcst00 is done I can run the next forecast day (fcst01), and so forth, up to fcst15.
The goal is to interleave the downloading of the GFS forecast with running the nested WRF forecast by breaking everything into daily chunks, and thus leapfrogging the downloading tasks and forecast run tasks.
Each new 24-hour forecast run produces a NetCDF file containing a single day’s worth of data, but the final product is a NetCDF file containing the entire timeline of the 15-day forecast, so I need to concatenate the NetCDF files (using ncrcat from the NCO tools).
When day 01 of the forecast is finished I concatenate the fcst00 and fcst01 files; when day 02 is finished I concatenate fcst00, fcst01 and fcst02; and so forth. When the run for the last forecast day is finished I concatenate all the fcst00, fcst01, fcst02, fcst03 … fcst15 files into the finished product.
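To make the pattern concrete, here is a small Python sketch (outside the suite, with hypothetical stand-in file names rather than the real Rose paths) of the growing source list each daily archive step would hand to ncrcat:

```python
def archive_sources(day):
    """Files concatenated when forecast day `day` finishes: fcst00 .. fcstDAY."""
    return [f"fcst{d:02d}.nc" for d in range(day + 1)]

def ncrcat_command(day, target):
    """ncrcat takes the input files followed by the output file."""
    return ["ncrcat"] + archive_sources(day) + [target]

# Day 01 concatenates fcst00 and fcst01; each later day adds one more file.
print(archive_sources(1))                    # ['fcst00.nc', 'fcst01.nc']
print(ncrcat_command(2, "full_timeline.nc"))
# ['ncrcat', 'fcst00.nc', 'fcst01.nc', 'fcst02.nc', 'full_timeline.nc']
```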
Now my question: when the suite is ready to concatenate fcst00…fcst15 (the final and last concatenation task for that forecast), but for some reason the earlier concatenation tasks haven’t run yet, I don’t need to run them. There’s no need to concatenate fcst00…fcst14 to produce a file that would immediately be overwritten by the longer output of concatenating fcst00…fcst15.
What I called concatenation tasks above are called archive tasks in the suite; they take data from the cylc-run share folder and archive it to shared storage.
The definition for the task is:
{% for DAY in range(0, FORECAST_DAYS|int) %}
    [[fcst{{ '%02d' % DAY }}_wrf_archive]]
        inherit = FCST_WRF_ARCHIVE, FORECAST{{ '%02d' % DAY }}_RUN
        [[[environment]]]
            # for each domain, concatenate the forecast files up to that forecast day
{% for DOM in range(1, NDOMS+1) %}
            SOURCE_FILES_d{{ '%02d' % DOM }} = """
                ${ROSE_DATAC}/fcst00_postproc_files/$(rose date -c --offset={{CYCLING_INTVAL}} -f"%Y%m%dT%H%MZ")_wrfout_hourly_d{{ '%02d' % DOM }}_fd00.nc
{% for D in range(1, DAY+1) %}
                ${ROSE_DATAC}/fcst{{ '%02d' % D }}_postproc_files/$(rose date -c --offset=PT{{D * 24}}H -f"%Y%m%dT%H%MZ")_wrfout_hourly_d{{ '%02d' % DOM }}_fd{{ '%02d' % D }}.nc
{% endfor %}
            """
            TARGET_FILE_d{{ '%02d' % DOM }} = ${EXT_ARCHIVE_DIR}/${ROSE_TASK_CYCLE_TIME}_wrfout_hourly_fcst_d{{ '%02d' % DOM }}.nc
{% endfor %}
{% endfor %}
So you can see how each new fcstXX_wrf_archive task includes one more file in its SOURCE_FILES, thus producing an ever-growing timeline in the TARGET_FILE.
The graph for this is currently:
{% for DAY in range(0,FORECAST_DAYS|int) %}
fcst{{ '%02d' % DAY }}_wrf_run => fcst{{ '%02d' % DAY }}_wrf_postproc => fcst{{ '%02d' % DAY }}_wrf_archive => housekeep
{% endfor %}
{% for DAY in range(1,FORECAST_DAYS|int) %}
fcst{{ '%02d' % (DAY-1) }}_wrf_archive => fcst{{ '%02d' % DAY }}_wrf_archive
{% endfor %}
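For concreteness, with FORECAST_DAYS = 3 the two loops above would render to something like this (my hand expansion, not actual suite output):

```
fcst00_wrf_run => fcst00_wrf_postproc => fcst00_wrf_archive => housekeep
fcst01_wrf_run => fcst01_wrf_postproc => fcst01_wrf_archive => housekeep
fcst02_wrf_run => fcst02_wrf_postproc => fcst02_wrf_archive => housekeep

fcst00_wrf_archive => fcst01_wrf_archive
fcst01_wrf_archive => fcst02_wrf_archive
```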
The first part simply describes the run → postproc → archive sequence; the second part says that the fcstXX_wrf_archive task of the preceding forecast day is a precondition for the fcstYY_wrf_archive task of the current forecast day. This was supposed to make the tasks run in numerical order: fcst00, fcst01, … fcst15. It also ensures that several archive tasks don’t run at the same time.
But I’d like a higher-ranking fcstYY_wrf_archive task, when it succeeds, to suicide any lower-ranking wrf_archive tasks.
I tried to replace the second chunk of the graph definition with this:
{% for DAY in range(1,FORECAST_DAYS|int) %}
fcst{{ '%02d' % DAY }}_wrf_archive:succeed => ! fcst{{ '%02d' % (DAY-1) }}_wrf_archive
{% endfor %}
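Rendered the same way (again with FORECAST_DAYS = 3), that replacement chunk expands to (again my hand expansion):

```
fcst01_wrf_archive:succeed => ! fcst00_wrf_archive
fcst02_wrf_archive:succeed => ! fcst01_wrf_archive
```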
But this doesn’t work. The suicide trigger doesn’t ‘catch’.
What’s a better way of doing this? Thank you!
P.S. A similar problem arises when a forecasting suite is running behind real time (e.g. in catch-up mode). When the inputs for the forecast on day X are available, I no longer need to run the forecast for day X-1 (or X-2), as its output will be superseded by the forecast for day X, which should be better than what yesterday’s data would produce. It’s sort of like what I described above: a task suiciding some previous tasks upon successful completion. Any ideas?