Groovy script performance for StorageTx `findAssets` in Maven repo

rbender · September 26, 2022, 3:06am

We are using a custom Groovy script to retrieve and filter assets from a Maven repository. While trying to optimize the script, we found that the storage query itself is the bottleneck, regularly taking 1.5 to 2+ seconds to complete.

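import org.sonatype.nexus.repository.storage.Asset
import org.sonatype.nexus.repository.storage.Query
import org.sonatype.nexus.repository.storage.StorageFacet
import org.sonatype.nexus.repository.storage.StorageTx

// Look up the repository and open a storage transaction against it
def repo = repository.repositoryManager.get("my-repo-name")
StorageFacet storageFacet = repo.facet(StorageFacet)
StorageTx tx = storageFacet.txSupplier().get()

try {
    tx.begin()

    // Match on component coordinates, then keep only blobs created after startDate
    Query.Builder query = Query.builder()
        .where("component.group").eq(request.groupId)
        .and("component.name").eq(request.artifactId)
        .and("blob_created > ").param(request.startDate)

    // The two log lines bracket the query so its duration shows up in the log
    log.info("before query")
    Iterable<Asset> results = tx.findAssets(query.build(), [repo])
    log.info("after query")
} finally {
    // Always release the transaction, even if the query throws
    tx.close()
}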

We are wondering if performance is related to our usage of component.group and component.name; are these non-indexed fields?

We wrote the custom script mainly so we can retrieve only “recent” artifacts, based on a client-provided startDate value (we didn’t see a way to do this with the Search API, which appears to have no option to sort by date). If there is a different way to accomplish this, we would like to know!

We are using Nexus OSS with a fairly standard install config. Artifact purging is configured for the repository at ~6 months.

Thanks!

mmartz · September 26, 2022, 1:29pm

It’s likely the blob_created field that is causing the slowness; I don’t see an index on that field. You’ll probably get better performance out of a search query against Elasticsearch. It’s not documented in the Search API, but it looks like you might be able to pass lastBlobUpdated as the sort field. Since it’s not documented, I can’t guarantee it won’t break in a future release, but searching via <host>/service/rest/v1/search?sort=lastBlobUpdated&direction=desc or <host>/service/rest/v1/search/assets?sort=lastBlobUpdated&direction=desc seems to put the most recent components first.
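
As a rough illustration of the request described above, here is a minimal Groovy sketch that assembles the asset search URL. The helper name buildSearchUrl, the host and the Maven coordinates are placeholders, and sort/direction are the undocumented parameters mentioned in this post, so treat this as an assumption-laden sketch rather than a supported recipe.

```groovy
// Hypothetical helper: builds an asset search URL from a base URL and
// a map of query parameters, URL-encoding each key and value.
String buildSearchUrl(String baseUrl, Map<String, String> params) {
    def query = params.collect { k, v ->
        "${URLEncoder.encode(k, 'UTF-8')}=${URLEncoder.encode(v, 'UTF-8')}"
    }.join('&')
    return "${baseUrl}/service/rest/v1/search/assets?${query}"
}

// Placeholder host, repository name and coordinates.
// sort/direction are undocumented and may change in a future release.
def exampleUrl = buildSearchUrl('http://localhost:8081', [
    'repository'      : 'my-repo-name',
    'maven.groupId'   : 'com.example',
    'maven.artifactId': 'my-artifact',
    'sort'            : 'lastBlobUpdated',
    'direction'       : 'desc'
])
println exampleUrl
```

The resulting URL could then be fetched and parsed with something like new groovy.json.JsonSlurper().parse(new URL(exampleUrl)), assuming the endpoint is reachable without extra authentication.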

rbender · October 1, 2022, 3:29pm

Thanks for the response, @mmartz!

Is there a way to limit the results, either by count or to assets/components created or modified after a certain date?

I would try using only the first X results returned, but from what I can tell, a search request seems to execute the whole search on the first API call, even if the results are split across multiple pages. Subsequent requests for further pages seem to return immediately, whereas the first request takes noticeably longer.

Thanks!

mmartz · October 3, 2022, 2:55am

I don’t believe there is any way to pass in a limit on the query.
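
Without a server-side limit, one client-side workaround is to read the newest-first results page by page and stop at the first asset older than the cutoff. The sketch below assumes each page is parsed JSON with an items list and a continuationToken, and that each item carries an ISO-8601 lastModified timestamp with an offset; collectRecent and fetchPage are hypothetical names. It only avoids fetching later pages and does not reduce the cost of the initial search.

```groovy
import java.time.Instant
import java.time.OffsetDateTime

// Hypothetical helper: walks newest-first pages and stops at the first asset
// older than `cutoff`, or once `maxItems` assets have been collected.
// `fetchPage` takes a continuation token (null for the first page) and returns
// the parsed page as a Map with `items` and `continuationToken`.
List<Map> collectRecent(Closure<Map> fetchPage, Instant cutoff, int maxItems) {
    List<Map> recent = []
    String token = null
    while (true) {
        Map page = fetchPage(token)
        for (Map item in (page.items ?: [])) {
            // Assumed timestamp format, e.g. 2022-10-01T00:00:00.000+00:00
            Instant modified = OffsetDateTime.parse(item.lastModified as String).toInstant()
            if (modified.isBefore(cutoff) || recent.size() >= maxItems) {
                return recent
            }
            recent << item
        }
        token = page.continuationToken
        if (!token) {
            return recent   // no further pages
        }
    }
}
```

Because the sort key (lastBlobUpdated) and the lastModified field may not be exactly the same timestamp, the early stop should be read as an approximation.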