Yes, 1.8 GB is a little on the large side.
There's no documentation for housekeeping the DB that I'm aware of, so I'll take a stab at it here…
The database has three continuously growing tables; these will be the ones taking up the space:
- task_events
- task_jobs
- task_states
For amusement value there is a command you can use to poke at the DB:
$ sqlite3_analyzer path/to/db
You can introspect the database structure from the sqlite prompt:
$ sqlite3 path/to/db
sqlite> .schema
From that introspection here’s the structure of the big tables:
CREATE TABLE task_events(name TEXT, cycle TEXT, time TEXT, submit_num INTEGER, event TEXT, message TEXT);
CREATE TABLE task_jobs(cycle TEXT, name TEXT, submit_num INTEGER, is_manual_submit INTEGER, try_num INTEGER, time_submit TEXT, time_submit_exit TEXT, submit_status INTEGER, time_run TEXT, time_run_exit TEXT, run_signal TEXT, run_status INTEGER, platform_name TEXT, batch_sys_name TEXT, batch_sys_job_id TEXT, PRIMARY KEY(cycle, name, submit_num));
CREATE TABLE task_states(name TEXT, cycle TEXT, flow_label TEXT, time_created TEXT, time_updated TEXT, submit_num INTEGER, status TEXT, PRIMARY KEY(name, cycle, flow_label));
All timestamps are currently stored as ISO 8601 strings (a contributing factor to that 1.8 GB). They are lexicographically sortable, so a simple < comparison is all that's needed to filter out older entries, e.g.:
SELECT *
FROM task_states
WHERE time_updated < '2020-10-16T08:33:12Z';
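To see why plain string comparison works here, a quick sanity check (using an in-memory SQLite DB and made-up rows, not real workflow data):

```python
import sqlite3

# Zero-padded ISO 8601 UTC strings sort the same way lexicographically
# as they do chronologically, so SQLite's TEXT `<` does the right thing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_states(name TEXT, cycle TEXT, time_updated TEXT)")
conn.executemany(
    "INSERT INTO task_states VALUES (?, ?, ?)",
    [("old_task", "1", "2019-05-01T12:00:00Z"),   # before the cutoff
     ("new_task", "2", "2021-03-04T06:00:00Z")],  # after the cutoff
)
rows = conn.execute(
    "SELECT name FROM task_states WHERE time_updated < ?",
    ("2020-10-16T08:33:12Z",),
).fetchall()
print(rows)  # -> [('old_task',)]
conn.close()
```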
So a simple housekeep script could look something like this:
DELETE
FROM task_events
WHERE time < '<time>';

DELETE
FROM task_jobs
WHERE time_submit < '<time>';

DELETE
FROM task_states
WHERE time_updated < '<time>';
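Wrapped up as a script, assuming Python's stdlib `sqlite3` and a hypothetical `housekeep` helper name, it could look like this. Note that DELETE alone doesn't shrink the file; a VACUUM afterwards rewrites the database to actually reclaim the space:

```python
import sqlite3

def housekeep(db_path: str, cutoff: str) -> None:
    """Delete rows older than `cutoff` (an ISO 8601 UTC string, e.g.
    '2020-10-16T08:33:12Z') from the three big tables, then reclaim
    the freed space."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits all three deletes as one transaction
        conn.execute("DELETE FROM task_events WHERE time < ?", (cutoff,))
        conn.execute("DELETE FROM task_jobs WHERE time_submit < ?", (cutoff,))
        conn.execute("DELETE FROM task_states WHERE time_updated < ?", (cutoff,))
    # DELETE only marks pages as free; VACUUM rewrites the file smaller.
    conn.execute("VACUUM")
    conn.close()
```

Obviously take a backup copy of the DB first, and only run this against a stopped workflow so you aren't deleting out from under the scheduler.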