As a long-time Nexus user, I’ve recently ventured into AI and encountered a significant challenge that I believe Nexus could help solve. AI model files are notoriously large, ranging from hundreds of megabytes to tens of gigabytes. Each version is a separate file, leading to two major issues:
Storage Capacity: These large files quickly consume local machine storage.
Network Efficiency: If team members require the same model, they must individually download it, which is a slow and bandwidth-intensive process.
I propose utilizing a local network-based Nexus repository specifically for AI models. This approach offers several benefits:
Single Download: Models are downloaded from the internet once, saving time and bandwidth.
Local Network Access: Team members can access these models at LAN speeds.
Storage Optimization: Users can delete local copies without worrying about re-download times.
I’m keen to hear your thoughts on this approach. Have any of you implemented something similar, or do you have suggestions on optimizing this setup using Nexus? Looking forward to a fruitful discussion!
I don’t see why you couldn’t do this. Putting the files in a Raw (Hosted) repository would do the trick. There isn’t any “metadata” around them, but if you control the directory structure it should stay manageable:
Model > version_folder > model_filename_with_version
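Since a raw repo is just HTTP PUT/GET on a path, that convention is easy to script. A minimal sketch, assuming a hosted raw repo named "ai-models" and placeholder host, credentials, and file names:

```shell
# Assumptions: a Raw (Hosted) repo called "ai-models" on a hypothetical Nexus host,
# placeholder credentials, and an example model file name.
NEXUS="http://nexus.example.local:8081"
REPO="ai-models"
MODEL="llama-2-7b"; VERSION="v1.0"; FILE="llama-2-7b.Q4.gguf"

# Upload once: raw repos accept a plain HTTP PUT to any path you choose.
curl -u admin:secret --upload-file "./$FILE" \
  "$NEXUS/repository/$REPO/$MODEL/$VERSION/$FILE"

# Teammates then download at LAN speed with a plain GET.
curl -fsS -O "$NEXUS/repository/$REPO/$MODEL/$VERSION/$FILE"
```

Deleting a local copy then costs little, since re-fetching is a single LAN-speed GET.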
Take a look at JFrog Artifactory (requires a license). It provides versioning for models and acts as a package manager for them: it interfaces directly with Hugging Face and acts as a proxy for any requested models. Simply specify the Artifactory repo as an endpoint locally.
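On the client side, pointing Hugging Face tooling at such a proxy comes down to the `HF_ENDPOINT` environment variable that `huggingface_hub` honors; the repo URL below is purely illustrative, not a real instance:

```shell
# Assumption: an Artifactory Hugging Face remote repo; the URL is a placeholder.
export HF_ENDPOINT="https://myorg.jfrog.io/artifactory/api/huggingfaceml/hf-remote"

# With that set, huggingface_hub downloads resolve through the proxy,
# which caches each model the first time anyone on the team requests it, e.g.:
# python -c "from huggingface_hub import snapshot_download; snapshot_download('bert-base-uncased')"
```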
DVC is free and sits on top of Git, allowing you to version models. It supports several storage backends; a local shared volume would be ideal. Cloud storage is also supported, but it is slower.
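The DVC flow with a shared LAN volume is only a few commands; the mount point and file names below are assumptions:

```shell
# Assumptions: an existing Git repo, and a LAN-mounted share at /mnt/shared.
dvc init                                          # one-time, inside the Git repo
dvc remote add -d lanstore /mnt/shared/dvc-cache  # default remote on the share

dvc add models/llama-2-7b.Q4.gguf  # track the big file; Git tracks the tiny .dvc stub
dvc push                           # copy the blob to the shared volume

# Teammates: git pull, then
dvc pull                           # fetch the blob at LAN speed
```

Git only ever sees the small `.dvc` pointer file, so repo clones stay fast regardless of model size.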
09/14 Edit:
As a team supporting MLOps, we’ve settled on KitOps to address model versioning and management. What Docker is to applications, KitOps is to model management.
It allows you to package a model together with the code and data used to train it via a Kitfile (a YAML configuration).
Once packaged, you can push and pull from any OCI-compliant registry, including ECR, Docker Hub, Nexus, etc…
There is a command line tool:
kit login tlgevers/kitops-eval
kit package . -t tlgevers/kitops-eval:tagname
kit push tlgevers/kitops-eval:tagname
## retrieve the model; it is cached locally:
kit pull tlgevers/kitops-eval:tagname
## interact with the model locally:
kit dev start
It does not paint you into a corner; it supports most model formats.
Kitfile example:
manifestVersion: v1.0.0
package:
  authors:
    - Jozu
  description: Updated model to analyze flight trait and passenger satisfaction data
  license: Apache-2.0
  name: FlightSatML
code:
  - description: Jupyter notebook with model training code in Python
    path: ./notebooks
model:
  description: Flight satisfaction and trait analysis model using Scikit-learn
  framework: Scikit-learn
  license: Apache-2.0
  name: joblib Model
  path: ./models/scikit_class_model_v2.joblib
  version: 1.0.0
datasets:
  - description: Flight traits and traveller satisfaction training data (tabular)
    name: training data
    path: ./data/train.csv
  - description: validation data (tabular)
    name: validation data
    path: ./data/test.csv