"database disk image is malformed" error

Hi,

I have a suite that keeps getting shut down a few times a day with the message, “database disk image is malformed”. A very similar suite that started about the same time doesn’t suffer from this problem.

One of our IT support staff checked the file “~jtl548/cylc-run/u-bc711/log/db”, and he says that the standard sqlite3 tools do not show any problem with it.

Can anyone suggest what might be going on and offer a possible solution?

Thank you.

Regards,

Jin

Hi Jin,

I’ve never seen that error, sorry (maybe others have, let’s see…).

If you’re only seeing the problem in one suite, that does suggest the suite database has been corrupted. Maybe your IT support person did not try to read the entire database?

This page shows what appears to be a straightforward way to confirm the problem and fix it using sqlite3: https://starbeamrainbowlabs.com/blog/article.php?article=posts%2F315-sqlite-database-malformed.html

It might be worth a try.

Hilary

Hi Hilary,

I did what the page suggested,

accessdev:/home/548/jtl548/cylc-run/u-bc711/log> sqlite3 db 'PRAGMA integrity_check'
ok

Perhaps I will wait and see if others have experienced this type of problem and have a solution.

Thanks.

Jin

You may also want to check the SQLite file at ~jtl548/cylc-run/u-bc711/.service/db.
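
For anyone who wants to run the same check on both files in one go, here is a minimal sketch using Python's standard sqlite3 module. The function name check_db is just illustrative, and the paths assume the suite lives under ~/cylc-run/u-bc711 as in this thread. It opens each file read-only and runs the same PRAGMA integrity_check shown above.

```python
import sqlite3
from pathlib import Path


def check_db(path):
    """Return the messages reported by PRAGMA integrity_check for one SQLite file."""
    # Open read-only via a file: URI so the check cannot modify the suite database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumed layout: the two database files mentioned in this thread.
    suite_dir = Path.home() / "cylc-run" / "u-bc711"
    for db in (suite_dir / "log" / "db", suite_dir / ".service" / "db"):
        if db.exists():
            print(db, check_db(db))
```

A healthy file reports a single "ok" message. Anything else lists the problems SQLite found.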

Hi Hilary and Matt,

I fixed the 2 databases, “~/cylc-run/u-bc711/log/db” and “~/cylc-run/u-bc711/.service/db”, following the instructions on the webpage that Hilary suggested (even though sqlite3 said they were OK). The suite ran for 3 days without the “database disk image is malformed” error occurring - in the past I would hit this problem every day. However, the problem didn’t go away entirely. I hit it again this morning.
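
For reference, a common way to "fix" an SQLite file is to dump its contents as SQL and load them into a fresh file. The sketch below does that in Python and assumes this is roughly what the linked page describes; it does not reproduce the page's exact commands. The function name rebuild_db and both path arguments are hypothetical, and dst_path must not already exist.

```python
import sqlite3


def rebuild_db(src_path, dst_path):
    """Copy the schema and rows from src_path into a new SQLite file at dst_path."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # iterdump() yields the SQL statements that recreate the schema and data.
        dst.executescript("\n".join(src.iterdump()))
        # Check the rebuilt file before trusting it; a healthy file reports "ok".
        return dst.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        src.close()
        dst.close()
```

It is probably safest to do this with the suite stopped, write to a new file name, and keep the original as a backup until the rebuilt copy has been checked.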

Hi Hilary and Matt,

I think I have located the cause of the “database disk image is malformed” error. It has to do with our Lustre file system. I switched to the local disk of the server that acts as localhost, and the problem went away. I haven’t investigated exactly what aspect of the Lustre file system is causing the problem. Perhaps that’s for another day.
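
If you want to check whether your own run directory sits on Lustre or another network file system, a rough sketch for Linux is below. SQLite's documentation warns that file locking can be unreliable on some network file systems, which may be relevant here, although nothing in this thread confirms it for Lustre. The function name fs_type_for is hypothetical, and the code assumes mount points without spaces, in the format used by /proc/mounts.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the file system type of the mount that contains path, or None."""
    # Resolve symlinks first, since a run directory may be a link to another disk.
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare whole path components and keep the longest matching mount point.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    # Linux only: /proc/mounts lists the device, mount point and type per line.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(fs_type_for(os.path.expanduser("~/cylc-run"), f.read()))
```

A result such as "lustre" or "nfs" would suggest trying a local disk for the suite databases, as described above.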

Hopefully this will help others who may encounter the same issue.

Thanks for the update @jinlee - that’s great (it would still be nice to know what’s going wrong if you use Lustre, but this sort of thing can be very hard to debug…).