Hello,
We have problems with our Nexus version 3.28.0.
We are not 100% sure when it all started but I suspect it started when we deleted a raw repository containing a very large number of files (about 10 million). After we deleted the repository we also deleted the blob store containing only this deleted repository. After that we started to see out of memory error messages in nexus.log, making blob stores read only (appears after around 10-15 hours). We’ve increased the memory from 16G to 32G wich had little to no effect.
Before the out of memory error in nexus.log, I can see following getting written over and over in our gc log:
2021-01-12T12:09:01.414+0100: 17005.294: [Full GC (System.gc()) [PSYoungGen: 160K->0K(1312256K)] [ParOldGen: 2764589K->2661389K(2796544K)] 2764749K->2661389K(4108800K), [Metaspace: 177178K->177178K(1220608K)], 6.3275304 secs] [Times: user=23.60 sys=0.06, real=6.33 secs]
And nexus.log starting to write these messages:
2021-01-12 12:00:25,126+0100 INFO [elasticsearch[09F612E2-18103B3D-DBA65F31-B4037794-420B6B4F][scheduler][T#1]] *SYSTEM org.elasticsearch.monitor.jvm - [09F612E2-18103B3D-DBA65F31-B4037794-420B6B4F] [gc][old][16049][69] duration [7.8s], collections [1]/[8.4s], total [7.8s]/[6.8m], memory [2.7gb]->[2.7gb]/[3.9gb], all_pools {[young] [97mb]->[89.1mb]/[1.2gb]}{[survivor] [0b]->[0b]/[53mb]}{[old] [2.6gb]->[2.6gb]/[2.6gb]}
At this point the CPU is going bananas at 300%+ and we can see on the Nexus admin interface that the heap is running very low. Nexus can be in this state for a couple of hours before going out of memory and making blobs read only.
We have tried running REBUILD INDEX * and repair in the orintdb component, it found some indexes to fix, now it doesn’t show any error when running “check database”.
We have had info messages in nexus.log showing that our big blob is not deleted because it is not found. I recreated the blob store with the same name from the UI and I can now see the old blob store containing over 10 million blobs (but “only” 97GB). I have run the compact blob store task that should remove everything in that blob store but it only removed about 750k blobs/11m. The task said it went OK.
Does anyone here have similar experiences that might give us a hint on how we can solve this? As I said, I’m not even sure it has something to do with the big deleted blob, it might be something else - any help would be grateful.