Nexus 3.27.0-03 (K8S: 1.18) restarts[killed] during clean up task - troubleshooting

peppe1977 · December 9, 2020, 6:03am

I am running Nexus 3.27.0-03 on Kubernetes 1.18.
I created a cleanup policy (snapshots older than 30 days) and associated it with the maven2/snapshots repo. When I manually trigger the cleanup task, it gets interrupted after 3-4 minutes and Nexus restarts.

2020-12-09 05:36:20,492+0000 INFO [qtp1056665874-49] Pedro.Salomao org.sonatype.nexus.rapture.internal.security.SessionServlet - Created session for user: Pedro.Salomao
2020-12-09 05:36:50,303+0000 INFO [qtp1056665874-436] Pedro.Salomao org.sonatype.nexus.quartz.internal.task.QuartzTaskInfo - Task ‘Cleanup service’ [repository.cleanup] runNow
2020-12-09 05:36:50,304+0000 INFO [qtp1056665874-436] Pedro.Salomao org.sonatype.nexus.quartz.internal.task.QuartzTaskInfo - Task ‘Cleanup service’ [repository.cleanup] state change WAITING -> RUNNING
2020-12-09 05:36:50,323+0000 INFO [quartz-9-thread-20] *SYSTEM org.sonatype.nexus.cleanup.internal.task.CleanupTask - Task log: /nexus-data/log/tasks/repository.cleanup-20201209053650320.log
Killed

The cleanup task log does not show anything interesting other than a few snapshots that got deleted. Which classes/log levels do I need to enable to isolate what causes Nexus to restart/get killed?

Thanks /Pedro

mpiggott · December 9, 2020, 3:24pm

I doubt this is something within Nexus. You’ll probably need to look at what your OS might be doing (e.g. virus scanner, SELinux, etc.).

peppe1977 · December 9, 2020, 3:41pm

Hi Matthew, thanks. I don’t quite follow you. Can you please elaborate? Nexus (container) is running inside a pod within a dedicated namespace. Resource allocations are as per the recommendation, and the Nexus ‘server’ restarts on its own every time the scheduled cleanup task (snapshots) is executed. So I am trying to isolate what leads to that restart (it is not the pod/container that gets restarted/killed, but rather the Nexus/Java process inside the container). Thanks

dsawa · December 9, 2020, 4:14pm

Can you monitor the physical and JVM memory usage of that container to rule out k8s killing your pod? It is possible that your use case requires higher values than those recommended by us.
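One way to watch the container's physical memory from inside the pod is to read the cgroup accounting files. The sketch below is only an illustration: it assumes a cgroup v1 layout mounted at `/sys/fs/cgroup/memory` (paths differ under cgroup v2), and the function names are made up for this example.

```python
from pathlib import Path


def read_cgroup_memory(cgroup_dir="/sys/fs/cgroup/memory"):
    """Return (usage_bytes, limit_bytes) read from cgroup v1 memory files.

    The default directory is an assumption about the container's layout.
    """
    base = Path(cgroup_dir)
    usage = int((base / "memory.usage_in_bytes").read_text().strip())
    limit = int((base / "memory.limit_in_bytes").read_text().strip())
    return usage, limit


def usage_percent(usage_bytes, limit_bytes):
    """Express current usage as a percentage of the container limit."""
    return 100.0 * usage_bytes / limit_bytes
```

Calling `usage_percent(*read_cgroup_memory())` every few seconds while the cleanup task runs would show whether usage climbs toward the limit shortly before the process is killed.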

peppe1977 · December 9, 2020, 4:30pm

Hi David, if the pod had been killed due to OOM, we would have seen that in the pod describe output, and that’s not the case here. It is the Java/Nexus process that seems to get killed/restarted. Can it be that the cleanup task leads to fairly high heap consumption (and perhaps the default allocated heap isn’t enough)? Your thoughts? Thanks
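For reference, the same check can be scripted against the pod's JSON (for example the output of `kubectl get pod <pod> -n <namespace> -o json`). This is a minimal sketch with a hypothetical helper name; it reads the standard `status.containerStatuses[].lastState.terminated` fields. Note the caveat, stated here as an assumption: if the kernel kills a child process rather than the container's main process, the container may not be reported as `OOMKilled` at all.

```python
def last_termination_reasons(pod):
    """Map container name -> (reason, exit_code) of its last terminated state.

    `pod` is the parsed JSON of a single pod object. Containers that have
    never terminated are left out of the result.
    """
    reasons = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            reasons[status["name"]] = (
                terminated.get("reason"),
                terminated.get("exitCode"),
            )
    return reasons
```

A reason of `OOMKilled` together with exit code 137 would point at the memory limit; an empty result does not by itself rule out a kill inside the container.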

peppe1977 · December 9, 2020, 4:39pm

Based on the sonatype-nexus Helm chart documentation on GitHub, this is the default (I will double check):

[{install4jAddVmParams: -Xms1200M -Xmx1200M -XX:MaxDirectMemorySize=2G -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap}]

I will monitor memory consumption during the cleanup task, as you suggested, and if the heap isn’t enough I can do a Helm update to pass the above with higher values…
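To compare those flags with the container's memory limit, it helps to add up what the JVM is allowed to use. The following is a rough sketch with hypothetical helper names: it sums only `-Xmx` and `-XX:MaxDirectMemorySize`, so the real footprint (metaspace, thread stacks, code cache) will be higher.

```python
import re

# Binary multipliers for the JVM size suffixes handled in this sketch.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a JVM size such as '1200M' or '2G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def jvm_memory_ceiling(params):
    """Sum max heap and max direct memory from a JVM flag string, in bytes."""
    heap = direct = 0
    for token in params.split():
        if token.startswith("-Xmx"):
            heap = parse_size(token[len("-Xmx"):])
        elif token.startswith("-XX:MaxDirectMemorySize="):
            direct = parse_size(token.split("=", 1)[1])
    return heap + direct


PARAMS = ("-Xms1200M -Xmx1200M -XX:MaxDirectMemorySize=2G "
          "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap")
print(jvm_memory_ceiling(PARAMS) // 2 ** 20, "MiB")  # 1200 + 2048 = 3248 MiB
```

With the flags quoted above, heap plus direct memory already comes to 3248 MiB, so a container limit close to or below that figure leaves little or no headroom.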

dsawa · December 10, 2020, 9:14am

The log you posted suggests to me that it was an external event that caused your NXRM to restart - that’s why I’m asking you to monitor your memory usage, especially since you’re running in k8s.

I’m afraid the config you’ve posted is out of date. Please check our current guidance at System Requirements. Remember that this is more of a guide, as the actual requirements depend on your usage pattern.

peppe1977 · December 10, 2020, 12:14pm

Indeed, it was the heap. I have just adjusted it based on your guidelines (thanks for the link) and was able to successfully execute the cleanup task, with no problem this time. Thanks for the help. Appreciated.

dsawa · December 10, 2020, 2:38pm

I’m glad to hear you solved it. Cheers!