Blob store in permanent "Failed" state after running out of disk space


We are running Nexus OSS 3.31.1-01 on a Kubernetes cluster and recently had the problem that the volume backing the blob store ran out of disk space.
The disk-space problem is now resolved; however, the blob store seems to be permanently corrupted.

In the logs we find this message:

2021-07-08 07:27:02,848+0000 INFO [FelixStartLevel] *SYSTEM - Blob store metrics file /nexus-data/blobs/default/ null not found - initializing at zero.
2021-07-08 07:27:02,922+0000 ERROR [FelixStartLevel] *SYSTEM - Failed transition: NEW -> STARTED /nexus-data/blobs/default/ (Stale file handle) 

The file mentioned in the message is present and non-empty. The whole blob store resides on a GlusterFS volume.

All tasks that operate on this blob store (compact, reconcile, etc.) fail with the message “Invalid state: Failed”.

Any ideas what we can do to recover the artifacts still stored in this blob store?

You should probably try to figure out why your pod reports this error when attempting to access the file.

Also note that GlusterFS is not supported, and is known to cause problems.

I was able to resolve the issue.

At first I had ruled out a file-system issue, because from a shell inside the container I was able to access the file without problems.

However, there seems to have been something wrong with the directory containing it.
The volume is mounted at /nexus-data/blobs, and the blob store is under /nexus-data/blobs/default.

  • I renamed the directory to /nexus-data/blobs/default-old,
  • created a new directory /nexus-data/blobs/default, and
  • moved the whole contents of /nexus-data/blobs/default-old into /nexus-data/blobs/default.
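The steps above can be sketched as shell commands run inside the Nexus container. This is only a sketch of what worked for me, not an official Sonatype procedure; the BLOBS variable is my own addition, defaulting to the path from this thread, and you should stop Nexus (scale the pod down) before touching the directory.

```shell
#!/bin/sh
# Hypothetical recovery sketch: recreate the blob-store directory and
# move its contents back. BLOBS is an assumed variable pointing at the
# parent directory of the blob store (default path taken from this post).
BLOBS="${BLOBS:-/nexus-data/blobs}"

mv "$BLOBS/default" "$BLOBS/default-old"      # rename the suspect directory
mkdir "$BLOBS/default"                        # create a fresh directory inode
mv "$BLOBS/default-old"/* "$BLOBS/default"/   # move the blobs back
rmdir "$BLOBS/default-old"                    # remove the now-empty old dir
```

Note that `mv "$BLOBS/default-old"/*` does not match dot-files; if your blob store contains any hidden entries, move them explicitly as well.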

I restarted the Nexus pod and the blob store was recognized again; however, it reported 0 blobs and a size of 1 byte.

This was then fixed by a “reconcile” job.