I have a suite that keeps getting shut down a few times a day with the message, “database disk image is malformed”. A very similar suite that started about the same time doesn’t suffer from this problem.
One of our IT support staff checked the file, “~jtl548/cylc-run/u-bc711/log/db”, and says that the normal sqlite3 tools do not show any problem with it.
Can anyone suggest what might be going on and offer a possible solution?
I’ve never seen that error, sorry (maybe others have, let’s see…).
If you’re only seeing the problem in one suite, that does suggest the suite database has been corrupted. Maybe your IT support person did not try to read the entire database?
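For anyone wanting to rule this out, here is a minimal sketch of a check that makes SQLite read the whole file, using `PRAGMA integrity_check` from Python's standard `sqlite3` module. The function name `check_database` is just illustrative, and the check is best run while the suite is stopped. A clean result only means SQLite found no structural damage at that moment; it does not rule out intermittent problems.

```python
import sqlite3
from pathlib import Path


def check_database(db_path):
    """Return SQLite's integrity report; ["ok"] means no problems were found."""
    # Expand "~" and build a file: URI so the database can be opened read-only.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # integrity_check walks every table and index, not just the header.
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # A badly damaged file can fail before the check even starts.
        return [str(exc)]
    finally:
        conn.close()
    return [row[0] for row in rows]
```

For example, `check_database("~/cylc-run/u-bc711/log/db")` would return `["ok"]` for a healthy file, and a list of error messages otherwise.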
I fixed the 2 databases, “~/cylc-run/u-bc711/log/db” and “~/cylc-run/u-bc711/.service/db”, following the instructions on the webpage that Hilary suggested (even though sqlite3 said they were OK). The suite then ran for 3 days without the “database disk image is malformed” error occurring, whereas in the past I would hit this problem every day. However, the problem didn’t go away entirely: I hit it again this morning.
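The exact steps from that webpage are not reproduced in this thread, so the following is only a sketch of one common way to rebuild an SQLite file, not necessarily what the page describes: dump the old database to SQL and load it into a fresh file. The function name `rebuild_database` and both paths are hypothetical, the suite should be stopped first, and the dump may itself fail if the damage is severe.

```python
import sqlite3


def rebuild_database(src_path, dest_path):
    """Copy the contents of src_path into a new database file at dest_path."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)  # dest_path should not already exist
    try:
        # iterdump() yields the SQL statements needed to recreate the database.
        dest.executescript("\n".join(src.iterdump()))
    finally:
        src.close()
        dest.close()
```

The rebuilt file can then be checked and, if it looks right, swapped in for the original after keeping a backup copy of the old one.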
I think I have located the cause of the “database disk image is malformed” error. It has to do with our Lustre file system. I switched to using the local disk of our server, which acts as the localhost, and the problem went away. I haven’t investigated exactly what aspect of the Lustre file system is causing the problem. Perhaps that’s for another day.
Hopefully this will help others who may encounter the same issue.
Thanks for the update @jinlee - that’s great (it would still be nice to know what’s going wrong if you use Lustre, but this sort of thing can be very hard to debug…).