Read Nexus repository assets in a single API request

Hello,

We need to browse Nexus repository assets (files and folders) from a custom script for CI/CD purposes. We currently use the extDirect API (coreui_Browse.read) to fetch the repository content, but it only supports fetching a single node. It doesn’t support fetching a complete list of repository items in one HTTP request.
So for each folder in our repository we need to send a separate HTTP request to the Nexus server. This is very inefficient: when a repository has thousands of folder levels and assets, we send tens of thousands of requests in a short timeframe and overload the server. As a result, we often have server performance problems and even crashes.

I found out that this feature was supported in older versions of Nexus (< 3.17.0-01) through the coreui_Component.readAssets method. From 3.17.0-01 onward this method was removed, so this is no longer possible. The recommendation now is to use the paginated REST API call to get the assets, which is still very inefficient, as one request can return only 10 assets at a time. We would still need to send thousands of requests to fetch the full content tree of a very big repository.
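For reference, the paginated REST call mentioned above can be walked with a loop like the one below. This is only a sketch, assuming the `/service/rest/v1/assets` endpoint with its `repository` and `continuationToken` query parameters and the `items` and `path` fields in its JSON response. `fetch_page_http`, `iter_assets`, and `build_tree` are hypothetical helper names, and authentication and error handling are left out. It still sends one request per page, so it illustrates the cost rather than removing it.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, Optional


def fetch_page_http(base_url: str, repository: str, token: Optional[str]) -> dict:
    """Fetch one page of assets over HTTP (assumed endpoint, no auth shown)."""
    params = {"repository": repository}
    if token:
        params["continuationToken"] = token
    url = f"{base_url}/service/rest/v1/assets?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_assets(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every asset, following continuation tokens until none is returned."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("continuationToken")
        if not token:
            break


def build_tree(paths: Iterable[str]) -> Dict[str, dict]:
    """Turn flat asset paths into a nested dict that mirrors the folder tree."""
    tree: Dict[str, dict] = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return tree
```

With a placeholder server URL and repository name, the tree could be built with `build_tree(a["path"] for a in iter_assets(lambda t: fetch_page_http("https://nexus.example.com", "my-repo", t)))`.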

My question:
Is there a more efficient way to get the structure of a Nexus repository (file/folder tree) without sending a request for each node, or for every 10 assets?

Thank you.

Regards,
Ivan

Ivan, the mechanics of what you’re asking for are clear, thanks for writing this out.

Can I ask what problem you’re trying to solve that you need to export the entire component list on a regular basis?

Hello,

Thank you for replying.

We need to list the repository content on a regular basis as part of our CI/CD pipeline jobs. One example is a script that first lists the files in a repository and then downloads each of them. If this script is run from multiple locations by multiple users on a regular basis, a large number of requests reaches the Nexus server in a short timeframe. There are other use cases too, but they are all similar in that they need to list the content of the repository before taking some action (download, upload, delete, etc.).

Thank you.

Regards,
Ivan

Yes, that makes sense at a mechanical level, but I was hoping to learn something about the problems you’re solving with that automation. Are you downloading everything to mirror it to some other location for performance reasons? Are you comparing the list with a database of deployed applications for a cleanup process that removes old, non-deployed binaries? That sort of thing. What are the scripts doing for you?

Yes, we have a synchronization in place where one team uploads files to the Nexus repository and another team regularly scans the contents of the repository for new changes, then downloads the newly uploaded files. This scanning part is what lists the repository content regularly. We also have other use cases, but this is the one I could find.
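The scan-and-download step described here comes down to comparing the current listing with the paths already seen. A minimal sketch, assuming the listing is available as a collection of asset paths; the function names and the JSON state file are hypothetical:

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def find_new_assets(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """Return the paths present in the current listing but not in the previous one."""
    return sorted(set(current) - set(previous))


def load_seen(state_file: Path) -> Set[str]:
    """Read the paths recorded by the last scan (empty set on the first run)."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text()))


def save_seen(state_file: Path, paths: Iterable[str]) -> None:
    """Record the current listing so the next scan can diff against it."""
    state_file.write_text(json.dumps(sorted(set(paths))))
```

This does not reduce the number of listing requests on its own. It only limits the downloads to files that are actually new.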

Thank you.

For that use-case I wonder if the webhook functionality might be better. I believe you could configure a webhook that listens to the new component/asset events to get a list of things added to the repository but I haven’t used the webhooks so it’s possible I’m mistaken. See the docs here: Webhooks
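As a rough illustration of that idea, a receiver could filter incoming events down to newly created assets. This is a sketch only: the payload field names used here (`repositoryName`, `action` with the value `CREATED`, and `asset.name`) are assumptions based on the example payloads in the webhook docs and should be checked against your Nexus version, and `created_asset_name` is a hypothetical helper.

```python
import json
from typing import Optional


def created_asset_name(body: bytes, repository: str) -> Optional[str]:
    """Return the asset name if the event reports a new asset in `repository`."""
    event = json.loads(body)
    # Ignore events from other repositories (field name is an assumption).
    if event.get("repositoryName") != repository:
        return None
    # Only react to creation events, not updates or deletions.
    if event.get("action") != "CREATED":
        return None
    # Component events carry no "asset" object, so this yields None for them.
    return event.get("asset", {}).get("name")
```

The HTTP endpoint that receives the POST, and any signature check on the request, are left out here.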

Hello,

Thanks for the reply.

I checked the Webhooks documentation and didn’t find it really helpful in this case. We also have other important use cases where this approach wouldn’t work. For example, we need a custom GUI desktop client that fetches and lists the repository content in real time while the user is browsing the repository. All of these use cases rely on listing the repository assets in a safe and efficient manner, and currently I cannot find this functionality.

@mprescott @mmartz: Could this information be retrieved by directly querying the database (assuming NXRM3 is using the external PostgreSQL database)?

I understand that relying on application ‘internals’ is discouraged, but constraints breed creativity…

For something that lists all repository content you could certainly look at the appropriate database tables to handle that sort of thing.

Another alternative would be to develop a custom plugin to expose the list of repository content in a way that suits your use cases. That might allow you to tune things for improved performance; for example, you could do something similar to the webhooks and ship component/asset/blob events off to some other system to protect the PostgreSQL or internal database. See Bundle Development for more details about custom plugins.

I found this ticket in our Jira instance about allowing the page size to be specified. Consider voting for that ticket if you think it would solve your issue. If there is something else we can improve about the REST API that would help, feel free to open a new Jira ticket against the “Dev - Nexus Repo” project to help us track interest in a feature like this.

Hello,

OK, thanks for the hint regarding the database. We currently have version 3.24.x installed, and I think I saw that the new database model was introduced in 3.31. Is this correct? Is it possible to access the database to get the list of repository assets in an older version, e.g. 3.24.x?

As for the Jira ticket, I already voted for it. It does not solve the issue, but it could at least help reduce the number of requests sent to the server.

Correct, the new database options were released in 3.31. Older versions of Nexus Repository Manager use OrientDB 2.x for the embedded database. I don’t know exactly what you would do in order to access it, but I know it can be done. If you are a paid customer, I’d highly recommend upgrading to a newer version to get access to the external PostgreSQL database option, as that would be the easiest way to access the database.