Nexus running out of memory

emil.eriksson.palmqvist · January 14, 2021, 11:55am

Hello,

We have problems with our Nexus version 3.28.0.
We are not 100% sure when it all started but I suspect it started when we deleted a raw repository containing a very large number of files (about 10 million). After we deleted the repository we also deleted the blob store containing only this deleted repository. After that we started to see out of memory error messages in nexus.log, making blob stores read only (appears after around 10-15 hours). We’ve increased the memory from 16G to 32G wich had little to no effect.

Before the out of memory error in nexus.log, I can see following getting written over and over in our gc log:
2021-01-12T12:09:01.414+0100: 17005.294: [Full GC (System.gc()) [PSYoungGen: 160K->0K(1312256K)] [ParOldGen: 2764589K->2661389K(2796544K)] 2764749K->2661389K(4108800K), [Metaspace: 177178K->177178K(1220608K)], 6.3275304 secs] [Times: user=23.60 sys=0.06, real=6.33 secs]

And nexus.log starting to write these messages:
2021-01-12 12:00:25,126+0100 INFO [elasticsearch[09F612E2-18103B3D-DBA65F31-B4037794-420B6B4F][scheduler][T#1]] *SYSTEM org.elasticsearch.monitor.jvm - [09F612E2-18103B3D-DBA65F31-B4037794-420B6B4F] [gc][old][16049][69] duration [7.8s], collections [1]/[8.4s], total [7.8s]/[6.8m], memory [2.7gb]->[2.7gb]/[3.9gb], all_pools {[young] [97mb]->[89.1mb]/[1.2gb]}{[survivor] [0b]->[0b]/[53mb]}{[old] [2.6gb]->[2.6gb]/[2.6gb]}

At this point the CPU is going bananas at 300%+ and we can see on the Nexus admin interface that the heap is running very low. Nexus can be in this state for a couple of hours before going out of memory and making blobs read only.

We have tried running REBUILD INDEX * and repair in the orintdb component, it found some indexes to fix, now it doesn’t show any error when running “check database”.

We have had info messages in nexus.log showing that our big blob is not deleted because it is not found. I recreated the blob store with the same name from the UI and I can now see the old blob store containing over 10 million blobs (but “only” 97GB). I have run the compact blob store task that should remove everything in that blob store but it only removed about 750k blobs/11m. The task said it went OK.

Does anyone here have similar experiences that might give us a hint on how we can solve this? As I said, I’m not even sure it has something to do with the big deleted blob, it might be something else - any help would be grateful.

mpiggott · January 14, 2021, 2:51pm

When you say increased the memory, do you mean the system memory or are you updating the JVM options (see Configuring the Runtime Environment)

emil.eriksson.palmqvist · January 14, 2021, 3:06pm

I both added system memory and updated the JVM optitions according to the recommendations here
https://help.sonatype.com/repomanager3/installation/system-requirements
I have 32GB RAM on the system so I have set the JVM optitions as:
-Xms4G
-Xmx4G
-XX:MaxDirectMemorySize=17530M

mpiggott · January 18, 2021, 6:46pm

You could use jstack while the system is consuming a lot of resources to try to get an idea of what is consuming resources, probably look at the sonatype packages for classes which might give an idea

emil.eriksson.palmqvist · January 19, 2021, 6:49am

Our nexus is running stable now.

We figured it out by turning on debug log for org.sonatype.nexus.blobstore and could se that it marked blobs for deletion. When it was freshly restarted it marked about 30 blobs/second (it had to run for about 92 hours to delete all blobs in the blob-store). I haven’t looked to much at the java code because that’s not my area but I suspect that Nexus somehow buffers up a deletion que when you press delete on a large blob in the UI. When it was working its way through this que we had performance issues leading up to a OOM error (possible memory leak?).