Seeking help with Nexus restarts caused by health check (Kubernetes probe) failures

tci00646 · January 19, 2024, 5:57am

Hello everyone,

Recently, I have encountered an issue while using Sonatype Nexus Repository and am hoping to get some assistance here. Nexus restarts because its health check fails, and I am looking for advice on how to resolve this.

Here is some information that may help in diagnosing the problem:

Nexus version: OSS 3.20.1-01

Environment: EKS 1.21

Health check configuration (Kubernetes probes):

    readinessProbe:
      failureThreshold: 6
      httpGet:
        path: /
        port: 8081
        scheme: HTTP
      initialDelaySeconds: 900
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 10

    livenessProbe:
      failureThreshold: 6
      httpGet:
        path: /
        port: 8081
        scheme: HTTP
      initialDelaySeconds: 900
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 10
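
As a rough sanity check on these settings, here is a small sketch of how long the liveness probe tolerates consecutive failures before the kubelet restarts the container. It assumes the usual Kubernetes behaviour (a restart after `failureThreshold` consecutive failed probes, one probe every `periodSeconds`) and ignores the probe timeout, so the result is only approximate. The function name is my own.

```python
def approx_failure_window_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Approximate duration of consecutive probe failures needed to hit the threshold.

    This is a simplification: it ignores timeoutSeconds and the exact moment
    within a period at which the first failure happens.
    """
    return failure_threshold * period_seconds


# Values copied from the livenessProbe above.
liveness_window = approx_failure_window_seconds(failure_threshold=6, period_seconds=30)
print(f"liveness: ~{liveness_window} s of consecutive failures before a restart")
```

With the values above this comes to roughly 180 seconds, i.e. about 3 minutes of consecutive failures on `/` before a restart is triggered.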

Before the service restarted, there were no fatal exceptions in the logs. However, users reported that the service was unavailable starting 20 minutes before the restart.
We have collected the warning and error logs from the period before the issue occurred, as shown in the following file.

Additionally, we noticed that users made multiple calls to the search API before the restart, which can be confirmed from the requests.log. The standard output log contains the following message, which I believe comes from Elasticsearch. Could it be the cause of the service restart? Also, can the queue size be increased by modifying the Elasticsearch configuration file, and would doing so have any other impact?

Unexpected exception: Failed to execute phase [fetch], [reduce] ; shardFailures {[ISndkEWeSyuBePZHbi7rag][06cb8dcc74b10e28188651531a0f3a8c67f138a7][0]: RemoteTransportException[[3AA6A985-F7EC0FC9-13997EF7-3040D930-F0DD43D8][local[1]][indices:data/read/search[phase/fetch/id]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@20ec7be3 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@2fe8c189[Running, pool size = 10, active threads = 10, queued tasks = 1000, completed tasks = 517937]]]; }
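
To make the numbers in that message easier to read, here is a small sketch that extracts the thread pool figures with a regular expression and checks whether the search pool was saturated. The regex and the function name are my own and were written only against the single message above, so other log formats may not match.

```python
import re
from typing import Dict, Optional

# Pattern written against the EsThreadPoolExecutor text in the message above.
POOL_STATS = re.compile(
    r"queue capacity = (?P<queue_capacity>\d+).*?"
    r"pool size = (?P<pool_size>\d+), "
    r"active threads = (?P<active_threads>\d+), "
    r"queued tasks = (?P<queued_tasks>\d+), "
    r"completed tasks = (?P<completed_tasks>\d+)"
)


def parse_pool_stats(line: str) -> Optional[Dict[str, int]]:
    """Return the thread pool counters found in a log line, or None if absent."""
    match = POOL_STATS.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# Fragment copied from the log message above.
sample = (
    "EsThreadPoolExecutor[search, queue capacity = 1000, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@2fe8c189"
    "[Running, pool size = 10, active threads = 10, "
    "queued tasks = 1000, completed tasks = 517937]]"
)

stats = parse_pool_stats(sample)

# Saturated here means: every thread is busy and the queue is at capacity.
saturated = (
    stats is not None
    and stats["active_threads"] == stats["pool_size"]
    and stats["queued_tasks"] >= stats["queue_capacity"]
)
print(stats, "saturated:", saturated)
```

For the message above, all 10 search threads were active and the queue held 1000 tasks against a capacity of 1000, so new search requests were being rejected at that moment.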

Thank you very much for your assistance and suggestions regarding this issue. If you have any insights or suggestions on how to resolve this problem, please feel free to reply to this post.

Thank you all!