Nexus 3.27.0-03 (K8s 1.18) restarts [killed] during cleanup task - troubleshooting

Running Nexus 3.27.0-03 on Kubernetes 1.18.
I created a cleanup policy (snapshots older than 30 days) and associated it with the maven2/snapshots repo. When I manually trigger the cleanup task, it gets interrupted after 3-4 minutes and Nexus restarts.

2020-12-09 05:36:20,492+0000 INFO [qtp1056665874-49] Pedro.Salomao - Created session for user: Pedro.Salomao
2020-12-09 05:36:50,303+0000 INFO [qtp1056665874-436] Pedro.Salomao - Task ‘Cleanup service’ [repository.cleanup] runNow
2020-12-09 05:36:50,304+0000 INFO [qtp1056665874-436] Pedro.Salomao - Task ‘Cleanup service’ [repository.cleanup] state change WAITING -> RUNNING
2020-12-09 05:36:50,323+0000 INFO [quartz-9-thread-20] *SYSTEM - Task log: /nexus-data/log/tasks/repository.cleanup-20201209053650320.log

The cleanup log does not show anything interesting other than a few snapshots that got deleted. Which classes/loggers (and levels) do I need to enable to isolate what causes Nexus to restart or get killed?

Thanks /Pedro

I doubt this is something within Nexus; you’ll probably need to look at what your OS might be doing (e.g. virus scanner, SELinux, etc.).

Hi Matthew, thanks. I don’t quite follow you. Can you please elaborate? Nexus (the container) is running inside a pod within a dedicated namespace. Resource allocations are as per the recommendations, and the Nexus ‘server’ restarts on its own every time the scheduled cleanup task (snapshots) is executed. So I am trying to isolate what leads to that restart (it is not the pod/container that gets restarted/killed, but rather the Nexus/Java process inside the container). Thanks

Can you monitor the physical and JVM memory usage of that container to rule out k8s killing your pod? It is possible that your use case requires higher values than those recommended by us.
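For the physical-memory side, a minimal sketch of what such a check could look like from inside the container. It assumes the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory (typical for a Kubernetes 1.18 node, but worth verifying on yours); the function names are made up for illustration and are not part of Nexus or Kubernetes.

```python
import os


def read_cgroup_memory(base="/sys/fs/cgroup/memory"):
    """Return (usage_bytes, limit_bytes) for the current container.

    Assumption: cgroup v1 layout. The two file names below are the
    standard cgroup v1 memory controller files; the base path may
    differ on your nodes.
    """
    with open(os.path.join(base, "memory.usage_in_bytes")) as f:
        usage = int(f.read().strip())
    with open(os.path.join(base, "memory.limit_in_bytes")) as f:
        limit = int(f.read().strip())
    return usage, limit


def usage_ratio(usage, limit):
    """Fraction of the container memory limit currently in use."""
    # Note: with no limit set, cgroup v1 reports a very large number,
    # so the ratio will be close to zero.
    return usage / limit
```

Calling `read_cgroup_memory()` every few seconds while the cleanup task runs and logging `usage_ratio(...)` would show whether the container approaches its limit. This only covers the container as a whole; JVM heap usage has to be observed separately (for example with the JDK's own tooling).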

Hi David, if the pod had been killed due to OOM, we would have seen that in the pod describe output, and that’s not the case here. It is the Java/Nexus process that seems to get killed/restarted. Could it be that the cleanup task leads to fairly high heap consumption (and perhaps the default allocated heap isn’t enough)? Your thoughts? Thanks

Based on the sonatype-nexus Helm chart documentation on GitHub, this is the default (I will double check):

[{install4jAddVmParams: -Xms1200M -Xmx1200M -XX:MaxDirectMemorySize=2G -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap}]

I will monitor memory consumption during the cleanup task, as you suggested, and if the heap isn’t enough I can update the Helm release to pass the above with higher values…
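To put numbers on those defaults, here is a rough sketch that parses the quoted JVM options and adds up the maximum heap and the direct memory cap. The helper names are hypothetical, and the sum is only a lower bound on what the container needs, since the JVM also uses memory for metaspace, thread stacks and so on.

```python
import re

# Binary multipliers for the JVM size suffixes.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a JVM size such as '1200M' or '2G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if match is None:
        raise ValueError(f"unrecognised JVM size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit, 1)


def jvm_memory_budget(params):
    """Return (max_heap_bytes, max_direct_bytes) from a JVM options string.

    Either value is None when the corresponding option is absent.
    """
    heap = direct = None
    for token in params.split():
        if token.startswith("-Xmx"):
            heap = parse_size(token[len("-Xmx"):])
        elif token.startswith("-XX:MaxDirectMemorySize="):
            direct = parse_size(token.split("=", 1)[1])
    return heap, direct


# The defaults quoted above.
params = (
    "-Xms1200M -Xmx1200M -XX:MaxDirectMemorySize=2G "
    "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
)
heap, direct = jvm_memory_budget(params)
MIB = 1024 ** 2
print(f"heap={heap // MIB} MiB, direct={direct // MIB} MiB, "
      f"total={(heap + direct) // MIB} MiB")
```

With these defaults the heap is capped at 1200 MiB regardless of how much memory the pod is given, so a task that needs more heap than that can fail even when the pod itself is well under its limit.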

The log you posted suggests to me that it was an external event that caused your NXRM to restart - that’s why I’m asking you to monitor your memory usage, especially since you’re running in k8s.

I’m afraid the config you’ve posted is out of date; please check our current guidance at System Requirements. Remember that this is more of a guide, as the actual requirements depend on your usage pattern.


Indeed, it was the heap :( I have just adjusted it based on your guidelines (thanks for the link) and was able to run the cleanup task successfully, with no problem this time. Thanks for the help. Appreciated.

I’m glad to hear you solved it. Cheers!
