Relocating from File blob store to S3 blob store


#1

When setting up a new blob store I noticed there’s an option for an S3-based blob store! Super nifty!

I’m migrating over from nexus2 and I’m setting up my blob stores as file blob stores-- I know they work and they work well-- but I’d love to experiment some with S3 and if it works out, convert/relocate/migrate (whatever the correct verb for this operation is) existing file stores to S3. The knowledgebase is pretty light on the S3 subject though.

What’s the general wisdom on whether it is good to make the change, what the considerations are, and if it is indeed a good fit, how to make the change?


#2

@cgamache At this time, the easiest way to get up and running with S3 blob stores is to start from scratch. There are a couple scripts floating around that basically do a download and upload to move the data. We are currently working on in-product features that will allow you to move components between blob stores and group file blob stores together with s3 blob stores for a single repository.


#3

That sounds great!

A follow-up question:

Is the vol-chap scheme the same in the S3 bucket such that one could aws s3 sync the /opt/sonatype-work/nexus3/blobs/blah up to S3 or from a bucket back down to a filesystem? That would be really fantastic if it were.


#4

It is.

However, it would take quite a bit of gymnastics to convert things using ‘aws s3 sync’ as you describe. That would be a useful way to create a backup.
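For anyone who wants to try the backup route, here is a minimal sketch that only builds (and prints) an `aws s3 sync` command for a file blob store directory. The bucket name `my-nexus-backup` and the key prefix are hypothetical placeholders, the local path is the example from post #3, and `--dryrun` is left on by default so nothing is uploaded until you choose to run it. This is a backup of the files only; it does not convert a repository to an S3 blob store.

```python
import shlex


def build_sync_command(blob_dir, bucket, prefix="", dry_run=True):
    """Build an `aws s3 sync` argument list for copying a blob store directory.

    blob_dir, bucket and prefix are caller-supplied; nothing here is
    specific to Nexus beyond the directory you point it at.
    """
    # Normalise the destination so we never end with a stray slash.
    dest = f"s3://{bucket}/{prefix.strip('/')}".rstrip("/")
    cmd = ["aws", "s3", "sync", str(blob_dir), dest]
    if dry_run:
        # --dryrun makes the CLI list what it would copy without copying.
        cmd.append("--dryrun")
    return cmd


if __name__ == "__main__":
    # Hypothetical bucket and prefix; the local path is the example from post #3.
    cmd = build_sync_command(
        "/opt/sonatype-work/nexus3/blobs/blah",
        "my-nexus-backup",
        "blobs/blah",
    )
    # Print the command for review; run it yourself (for example with
    # subprocess.run(cmd, check=True)) once the dry-run output looks right.
    print(shlex.join(cmd))
```

Swapping the source and destination arguments gives the restore direction (bucket back down to a filesystem).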

This is the relevant enhancement request to watch and vote on:


#5

In NEXUS-12016 it looks like people want to split blobstores up by repo. Changing medium would be a bonus.

If all the metadata is indeed the same, and if I was just going to move the whole shebang up to S3, then sync might work?

I’m tempted to just give it a try, but I’m not so foolhardy that I would do it without at least checking with the more knowledgeable about the (in)advisability of such a maneuver. I don’t want to irreparably damage my nexus install.

It seems pretty harmless though-- standing up a new blobstore, fill it from my disk-based blobstore, then create all new repos that point to that blobstore… Is there any (re)indexing that I would have to perform on the new repos?
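If someone does try the "fill it from my disk-based blobstore" step, one cheap sanity check is to pull the copy back down to a scratch directory and compare it against the original tree. The sketch below is just that: it assumes both trees are available as local directories (the paths are whatever you supply), compares relative paths and file sizes only, and says nothing about whether Nexus can actually reference the copied blobs.

```python
import os


def tree_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = os.path.getsize(full)
    return manifest


def diff_trees(source_root, copy_root):
    """Return (missing, size_mismatch) for files in source_root versus copy_root.

    missing: relative paths present in the source but absent from the copy.
    size_mismatch: relative paths present in both but with different sizes.
    """
    src = tree_manifest(source_root)
    dst = tree_manifest(copy_root)
    missing = sorted(set(src) - set(dst))
    size_mismatch = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, size_mismatch
```

Two empty lists mean every file in the original has a same-sized counterpart in the copy; it is a size check, not a checksum comparison.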

I’ll still give the issue an upvote! I can definitely see how useful it would be to split a blobstore.


#6

That’s the part that’s not fully fleshed out at the moment. Like you said, it’d be easy to sync and point a blobstore at the bucket. The problem is that you need some sort of reindex, which doesn’t exist as of 3.13, so that the blobs (i.e. the object files) can be referenced.


#7

Understood! Godspeed to all the wonderful developers who will be working on this.