Syncing two Nexus instances pointing to the same S3 blob storage

I have an interesting issue. I’m responsible for syncing two different Nexus Repository instances. Both are running Nexus 3 (3.20 and 3.30, if it’s relevant). These instances are hosted in two different Kubernetes clusters.

My goal is to make sure that every artifact in the older instance (3.20) exists in the newer instance (3.30). The hitch is that they are using the same S3 bucket (both pointed at the bucket root) for their blob storage. So far, I don’t believe there have been any collisions.

I found this blog post about moving artifacts between Nexus repositories: https://blog.sonatype.com/2010/04/nexus-tip-moving-artifacts-between-nexus-repositories/

It’s not quite the same thing, but I’m wondering if it’s possible to just rebuild the repository metadata and reindex the repository on the newer instance, since they’re effectively sharing the same storage already.

My main concern with this strategy is that several artifacts already exist in both repositories, and I’m wondering if this would cause problems during re-indexing.

This is a really bad idea and may result in one Nexus instance deleting blobs that the other still references.

I am aware. I didn’t set it up this way.

I just need to make sure everything from the old instance is in the new instance. After that I can shut down the old one.
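
One way I can verify that afterwards is to diff the component lists of the two instances through the Nexus 3 components REST API. A minimal sketch, assuming hypothetical hostnames, repository name, and credentials:

```python
import requests

# Hypothetical values -- substitute your real URLs, repository, and credentials.
OLD_URL = "https://nexus-old.example.com"
NEW_URL = "https://nexus-new.example.com"
REPO = "maven-releases"
AUTH = ("admin", "admin123")

def list_components(base_url, repository):
    """Page through /service/rest/v1/components and collect coordinates."""
    components, token = set(), None
    while True:
        params = {"repository": repository}
        if token:
            params["continuationToken"] = token
        resp = requests.get(f"{base_url}/service/rest/v1/components",
                            params=params, auth=AUTH)
        resp.raise_for_status()
        body = resp.json()
        for item in body["items"]:
            components.add((item.get("group"), item["name"], item.get("version")))
        token = body.get("continuationToken")
        if not token:
            return components

# Anything listed in the old instance but not the new one still needs to move.
missing = list_components(OLD_URL, REPO) - list_components(NEW_URL, REPO)
for coords in sorted(missing, key=str):
    print("missing from new instance:", coords)
```

Run per repository once the sync looks done; an empty result means the new instance has everything the old one had.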

You’re in uncharted territory here, but rebuilding metadata and/or reindexing won’t help you pick up the older instance’s components in the new instance. Maven/yum metadata and the search index are both rebuilt from the records in the embedded component database (OrientDB); neither looks in blob storage for stray/unattached blobs.

The feature you’re looking for is a scheduled task called “Repair - Reconcile component database from blob store.” This causes Nexus Repository to iterate over everything in S3 and, where possible, create component records for it.
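
Once that task exists (it has to be created in the UI under System → Tasks), you can trigger it remotely through the Tasks REST API. A rough sketch, assuming a hypothetical task name and credentials:

```python
import requests

# Hypothetical values -- substitute your real URL, credentials, and task name.
NEXUS_URL = "https://nexus-new.example.com"
AUTH = ("admin", "admin123")
TASK_NAME = "reconcile-from-blobstore"  # whatever you named the task in the UI

# Find the task by the name it was given at creation, then trigger a run.
resp = requests.get(f"{NEXUS_URL}/service/rest/v1/tasks", auth=AUTH)
resp.raise_for_status()
for task in resp.json()["items"]:
    if task["name"] == TASK_NAME:
        print(f"running task {task['name']} ({task['id']})")
        run = requests.post(f"{NEXUS_URL}/service/rest/v1/tasks/{task['id']}/run",
                            auth=AUTH)
        run.raise_for_status()
```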

This will probably increase your exposure to corruption, but it’s the “push through the pain” option.

If at all possible, the first thing you should do is shut down the “new” instance and point it at a copy of the shared S3 bucket. That way the two instances are untangled, and the new instance won’t suffer any additional corruption from the actions of the old instance.
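
If you go that route, `aws s3 sync s3://<shared-bucket> s3://<new-bucket>` will do the copy in one command. The same thing in boto3, as a sketch with hypothetical bucket names:

```python
import boto3

# Hypothetical bucket names -- run this while the new instance is shut down.
SHARED_BUCKET = "nexus-shared-blobs"
NEW_BUCKET = "nexus-new-blobs"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SHARED_BUCKET):
    for obj in page.get("Contents", []):
        # Server-side copy: the object bytes never leave S3.
        s3.copy({"Bucket": SHARED_BUCKET, "Key": obj["Key"]},
                NEW_BUCKET, obj["Key"])
```

Then reconfigure the new instance’s blob store to point at the copy before starting it back up.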