Disaster Recovery Methodology with Nexus 3.x

We are currently upgrading our Nexus 2.x instance to the latest 3.x release. The methodology we are following is:

  • Build new infrastructure for Nexus 3.x
  • Migrate artifacts from Nexus 2.x to Nexus 3.x
  • Set up an automated backup/restoration process for Nexus 3.x
  • Set up a Disaster Recovery (DR) instance for Nexus 3.x in a remote DC. The idea is to have an RTO and RPO of 15 minutes each: at most the last 15 minutes of uploads before the failure may be lost (RPO), and service should be back within 15 minutes (RTO).

The main challenge we are facing is syncing the database (OrientDB) between the primary and DR sites. According to the Nexus documentation, there is no way to back up the database without putting Nexus into read-only mode. Since we want to maintain an RTO and RPO of 15 minutes, we can't afford to let the Nexus database be out of sync for more than 15 minutes. Is there any way to achieve this with Nexus 3.x?

Regards
Ankit

Hi Ankit,

It seems like you want High Availability with Nexus 3.x. This is possible using the Nexus Repository Pro feature HA-C (High Availability Clustering):

https://help.sonatype.com/repomanager3/high-availability

Regards,
Mahendra

High Availability in Nexus 3.x does not currently support syncing to a remote DC because of network latency.

@msurani We want to maintain a DR instance in a remote DC so that, if the primary instance is down and can't be restored in a reasonable time, we can switch to the DR instance without significantly impacting the business.

@ctolo When we were on Nexus 2.x, we were able to use rsync to keep the remote DC in sync. This approach breaks with Nexus 3.x because of the introduction of OrientDB, which can't be reliably backed up or synced at the filesystem level.

What approach would you suggest for maintaining a DR instance in a remote DC?

You can still use a filesystem backup in Nexus 3 to meet your 15-minute RPO. The concern is that OrientDB might be caught in an inconsistent state, so a regular clean backup using the backup task is recommended as a failsafe. The ‘Reconcile component database from blob store’ repair task will then fix any discrepancies between the components in the blob store and the restored OrientDB backup.

https://help.sonatype.com/repomanager3/backup-and-restore

https://help.sonatype.com/repomanager3/configuration/system-configuration#SystemConfiguration-TypesofTasksandWhentoUseThem
Repair - Reconcile component database from blob store
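Since the clean backup is the failsafe, it is worth alerting if it silently stops being produced. The bash sketch below reports how old the newest database export is. It assumes the backup task writes *.bak files into a single directory (whatever location you configured for the task) and relies on GNU find; the function name is hypothetical and not part of Nexus.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Nexus: prints the age in seconds of the
# newest *.bak file in a directory, e.g. the location configured for the
# Nexus database backup task (an assumption; adjust to your setup).
# Requires GNU find for -printf.

newest_backup_age() {
  local dir=$1
  local now=${2:-$(date +%s)}   # optional "current time", handy for testing
  local newest

  # %T@ = modification time as epoch seconds (with a fractional part).
  newest=$(find "$dir" -maxdepth 1 -type f -name '*.bak' -printf '%T@\n' \
    | LC_ALL=C sort -n | tail -n 1)

  # No backup files found: signal failure so a monitor can alert.
  [[ -n $newest ]] || return 1

  echo $(( now - ${newest%.*} ))  # strip the fraction before integer math
}

# Example monitoring check (BACKUP_DIR is a placeholder):
#   age=$(newest_backup_age "$BACKUP_DIR") || echo "no backups found" >&2
#   (( age > 86400 )) && echo "backup task may not be running" >&2
```

The 24-hour threshold in the example is arbitrary; it should match however often the backup task is scheduled.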

This works if blobs are stored on a file system. In our case, blob data is stored in S3, and during DR we want to switch to a DR S3 bucket. We can't do that, because the blob-store configuration is stored in OrientDB, which holds the primary S3 bucket name.
How can we update the blob-store configuration with the DR S3 bucket name?

In a DR scenario, OrientDB can be accessed via the console to change the configuration of the S3 blob store. I would check with the Support Team for the exact SQL command needed to do this.

There is also this article on relocating a blob-store that might be helpful.

I am also curious whether this is even needed. With S3's eleven nines (99.999999999%) of durability, I question whether you would need to switch to a DR copy of the whole bucket. Still, it is good to be prepared.

To execute the DB command we have to log in to the container, because we run Nexus in a container. However, the container will never come up if it can't connect to the S3 bucket where the blobs are stored. In a DR scenario the primary S3 bucket will not be available, so the Nexus container will never come up. How can we overcome this issue?

Running the DB command does not depend on the NXRM container running; it only needs access to the persistent storage that contains the sonatype-work directory (/nexus-data in the official Docker image). You should be able to connect to this storage while the NXRM container is not running and make the change.

Please share the steps for logging in to OrientDB when the NXRM container is not running. I am having trouble because “nexus-orient-console.jar” is not available on the server.

Hello Shekhar

Assuming you are on k8s or similar, you can always create a deployment that is exactly the same as the primary deployment, except that it overrides command: and args:, something like:

    command: ["/bin/bash", "-c"]
    args:
      - tail -f /dev/null

In Docker, do the same with --entrypoint set to something like /bin/bash. Keep the same image and volumes and you'll be able to perform this task: because the entrypoint is overridden, NXRM never starts in that container, so there is no service to shut down first.
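For the Docker case, a minimal bash sketch of the maintenance container is below. The image tag and volume name are placeholders, and the console path and connect string in the comments assume the official sonatype/nexus3 image layout (NXRM installed under /opt/sonatype/nexus, data under /nexus-data); the "admin admin" credentials are the ones commonly used for the embedded console in Sonatype's knowledge-base articles, so confirm them with Support. It also assumes no running NXRM has the same volume's databases open, since the embedded (plocal) OrientDB databases should not be opened by two processes at once. This also explains why nexus-orient-console.jar was not found on the server: it ships inside the image.

```shell
#!/usr/bin/env bash
# Sketch only: build the "docker run" command for a maintenance container that
# reuses the NXRM image and data volume but overrides the entrypoint, so Nexus
# itself never starts. The image tag and volume name are placeholders.

nxrm_maintenance_cmd() {
  local image=${1:?image required}
  local volume=${2:?data volume required}
  # One argument per line so callers can load it into an array with mapfile.
  printf '%s\n' docker run --rm -it \
    --entrypoint /bin/bash \
    -v "${volume}:/nexus-data" \
    "$image"
}

# Usage (bash 4+):
#   mapfile -t cmd < <(nxrm_maintenance_cmd sonatype/nexus3:<tag> <volume>)
#   "${cmd[@]}"
#
# Then, inside the container (paths assume the official sonatype/nexus3 image):
#   java -jar /opt/sonatype/nexus/lib/support/nexus-orient-console.jar
#   orientdb> connect plocal:/nexus-data/db/config admin admin
```

With a bind mount instead of a named volume, pass the host path as the second argument.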

We run Nexus 3 OSS on k8s. For DR across data centers we use rsync, driven by CronJobs that run every 5 minutes, and we sync the following:

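/etc/rsyncd/conf/filter.rsync (the minimal set sufficient to start a new instance; + includes a path, P protects /.snapshot/ on the receiver from --delete, and H hides everything else from the transfer):

    + /blobs/
    + /db/
    + /etc/
    + /keystores/
    P /.snapshot/
    H /*
    H /*/
    H /.*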

Then run something like:

    rsync -av --delete --filter '. /etc/rsyncd/conf/filter.rsync' rsync://user@server:port/nexus-data/ /nexus-data/

Nexus 3 has been good at rebuilding derived data such as its Elasticsearch indexes, which is why they are not part of the synced set. A database backup and recovery process can also be put in place, but I notice that Nexus performs database recovery at startup anyway.
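To make the RPO reasoning concrete: with a 5-minute schedule, the worst-case loss is roughly the interval plus the duration of one sync run (about 8 minutes if a run takes 3), which stays inside a 15-minute RPO only while runs stay short. The bash sketch below wraps the rsync command above in a lock so overlapping runs are skipped, records when the last successful sync finished, and checks that timestamp against the 15-minute target. The lock and marker paths are hypothetical, and util-linux flock is assumed to be available in the job's image.

```shell
#!/usr/bin/env bash
# Sketch of a script a 5-minute CronJob could run on the DR side.
# RSYNC_SRC, LOCK and MARKER are hypothetical; FILTER is the filter file above.
# Assumes util-linux flock is available in the job's image.

RSYNC_SRC=${RSYNC_SRC:-rsync://user@server:port/nexus-data/}
FILTER=${FILTER:-/etc/rsyncd/conf/filter.rsync}
LOCK=${LOCK:-/tmp/nexus-dr-sync.lock}
# Kept outside /nexus-data so rsync --delete cannot remove it.
MARKER=${MARKER:-/var/tmp/nexus-dr-sync.last}

# Run one sync, skipping it if the previous run still holds the lock.
# On success, record the completion time (epoch seconds) in MARKER.
dr_sync() {
  (
    flock -n 9 || { echo "previous sync still running; skipping" >&2; exit 0; }
    rsync -av --delete --filter ". $FILTER" "$RSYNC_SRC" /nexus-data/ &&
      date +%s > "$MARKER"
  ) 9>"$LOCK"
}

# Succeeds only if the last successful sync is at most max_age seconds old
# (default 900 s, i.e. the 15-minute RPO target).
rpo_ok() {
  local last=$1 now=$2 max_age=${3:-900}
  (( now - last <= max_age ))
}

# Example monitoring check:
#   rpo_ok "$(cat "$MARKER")" "$(date +%s)" || echo "DR copy is older than the RPO" >&2
```

On Kubernetes, setting concurrencyPolicy: Forbid on the CronJob prevents overlapping runs at the scheduler level as well; the flock guard just keeps the script safe when run outside k8s.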