As a long-time Nexus user, I’ve recently ventured into AI and encountered a significant challenge that I believe Nexus could help solve. AI model files are notoriously large, ranging from hundreds of megabytes to tens of gigabytes. Each version is a separate file, leading to two major issues:
Storage Capacity: These large files quickly consume local machine storage.
Network Efficiency: If team members require the same model, they must individually download it, which is a slow and bandwidth-intensive process.
I propose utilizing a local network-based Nexus repository specifically for AI models. This approach offers several benefits:
Single Download: Models are downloaded from the internet once, saving time and bandwidth.
Local Network Access: Team members can access these models at LAN speeds.
Storage Optimization: Users can delete local copies without worrying about re-download times.
I’m keen to hear your thoughts on this approach. Have any of you implemented something similar, or do you have suggestions on optimizing this setup using Nexus? Looking forward to a fruitful discussion!
I don’t see why you couldn’t do this. Putting the files in a Raw (Hosted) repository would do the trick. There isn’t any “metadata” around them, but if you control the directory structure it should stay manageable:
Model > version_folder > model_filename_with_version
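Since a raw repo is just HTTP PUT/GET on a path, that convention is easy to script. A minimal sketch, assuming a hosted raw repo named "ai-models" and placeholder host, credentials, and file names:

```shell
# Assumptions: a Raw (Hosted) repo called "ai-models" on a hypothetical Nexus host,
# placeholder credentials, and an example model file name.
NEXUS="http://nexus.example.local:8081"
REPO="ai-models"
MODEL="llama-2-7b"; VERSION="v1.0"; FILE="llama-2-7b.Q4.gguf"

# Upload once: raw repos accept a plain HTTP PUT to any path you choose.
curl -u admin:secret --upload-file "./$FILE" \
  "$NEXUS/repository/$REPO/$MODEL/$VERSION/$FILE"

# Teammates then download at LAN speed with a plain GET.
curl -fsS -O "$NEXUS/repository/$REPO/$MODEL/$VERSION/$FILE"
```

Deleting a local copy then costs little, since re-fetching is a single LAN-speed GET.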
Take a look at JFrog Artifactory (requires a license). It provides versioning for models and acts as a package manager for them: it interfaces directly with Hugging Face and acts as a proxy for any requested models. Simply specify the Artifactory repo as an endpoint locally.
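On the client side, pointing Hugging Face tooling at such a proxy comes down to the `HF_ENDPOINT` environment variable that `huggingface_hub` honors; the repo URL below is purely illustrative, not a real instance:

```shell
# Assumption: an Artifactory Hugging Face remote repo; the URL is a placeholder.
export HF_ENDPOINT="https://myorg.jfrog.io/artifactory/api/huggingfaceml/hf-remote"

# With that set, huggingface_hub downloads resolve through the proxy,
# which caches each model the first time anyone on the team requests it, e.g.:
# python -c "from huggingface_hub import snapshot_download; snapshot_download('bert-base-uncased')"
```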
DVC is free and sits on top of Git, allowing you to version models. It supports several storage backends; a local shared volume would be ideal. Cloud storage is also supported, but it is slower.
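The DVC flow with a shared LAN volume is only a few commands; the mount point and file names below are assumptions:

```shell
# Assumptions: an existing Git repo, and a LAN-mounted share at /mnt/shared.
dvc init                                          # one-time, inside the Git repo
dvc remote add -d lanstore /mnt/shared/dvc-cache  # default remote on the share

dvc add models/llama-2-7b.Q4.gguf  # track the big file; Git tracks the tiny .dvc stub
dvc push                           # copy the blob to the shared volume

# Teammates: git pull, then
dvc pull                           # fetch the blob at LAN speed
```

Git only ever sees the small `.dvc` pointer file, so repo clones stay fast regardless of model size.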
09/14 Edit:
As a team supporting MLOps, we’ve settled on KitOps to address model versioning and management. What Docker is to applications, KitOps is to model management.
It allows you to package a model together with the code and data used to train it via a Kitfile (a YAML configuration).
Once packaged, you can push and pull from any OCI-compliant registry, including ECR, Docker Hub, Nexus, etc…
There is a command line tool:
kit login tlgevers/kitops-eval
kit package . -t tlgevers/kitops-eval:tagname
kit push tlgevers/kitops-eval:tagname
## retrieve the model; it is cached locally:
kit pull tlgevers/kitops-eval:tagname
## interact with the model locally:
kit dev start
It does not paint you into a corner; it supports most model formats.
Kitfile example:
manifestVersion: v1.0.0
package:
  authors:
    - Jozu
  description: Updated model to analyze flight trait and passenger satisfaction data
  license: Apache-2.0
  name: FlightSatML
code:
  - description: Jupyter notebook with model training code in Python
    path: ./notebooks
model:
  description: Flight satisfaction and trait analysis model using Scikit-learn
  framework: Scikit-learn
  license: Apache-2.0
  name: joblib Model
  path: ./models/scikit_class_model_v2.joblib
  version: 1.0.0
datasets:
  - description: Flight traits and traveller satisfaction training data (tabular)
    name: training data
    path: ./data/train.csv
  - description: validation data (tabular)
    name: validation data
    path: ./data/test.csv