Working with Components in a Bundle

bmandel · September 18, 2018, 6:53pm

Hello,

I’ve got some questions about Assets and Components in the Nexus 3 API. The goal of my Bundle is to go through all the binaries uploaded to a user’s repository and scan them. I need to scan things like zip files, jar files, etc. In Nexus 2, these were all stored on the file system and could be easily accessed. In Nexus 3, it seems that everything is now stored in a DB. My questions are:

How can I retrieve these items? In my task, I plan on using the repository and accessing everything through the StorageTx. Is this correct or is there an easier way? From there, I will get all the Assets/Components associated with the repository. Should I get the Asset or Component if I want the binaries?

Once I get the Asset/Component, how do I go about converting these into files so that I can scan them? Do I want to get the Blob from the associated item and then work with the InputStream? Since this method of storage is new in Nexus 3, I’m not exactly sure where the actual Artifacts are being stored.

Once I’ve scanned items, I plan on saving new data to the item. I believe this is done much like retrieving items, except that I call the storageTx.saveAsset method.

Everything I’ve done so far is in QueryManager.java in the blackducksoftware/blackduck-nexus3 repository on GitHub (master branch). If any of this seems off or incorrect, let me know. Alternatively, if you have an easier class I can use to achieve everything I need, I’d love to know about it.

mprescott · September 18, 2018, 7:58pm

In NXRM 3, ‘Component’ is a generic concept - in maven, it maps to a single GAV (group+artifactId+version), in NuGet, to a package ID + version, and so on.

For maven in particular, a Component may have multiple files - a jar, a source jar, some hash files, the pom, and so on. We store information about those files as Asset records in the database, and we store the actual files themselves as Blobs.

So yes, to scan binaries, you will need to get a Blob input stream for the Assets you care about.
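For the "do I need to convert these into files" part of the question, here is a minimal sketch of scanning an archive directly from an InputStream, with no temporary file. It uses only the JDK; the class and method names (BlobStreamScanSketch, listArchiveEntries, sampleZip) are made up for illustration, and the in-memory zip is a stand-in for the stream a Blob would provide. How the stream is obtained from the Blob inside a transaction is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the Nexus API.
final class BlobStreamScanSketch {

  /** Lists the entry names of a zip/jar read from any InputStream. */
  static List<String> listArchiveEntries(InputStream in) {
    List<String> names = new ArrayList<>();
    try (ZipInputStream zip = new ZipInputStream(in)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        names.add(entry.getName());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return names;
  }

  /** Builds a tiny in-memory zip so the sketch runs without a repository. */
  static byte[] sampleZip() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      zip.putNextEntry(new ZipEntry("META-INF/MANIFEST.MF"));
      zip.write("Manifest-Version: 1.0\n".getBytes(StandardCharsets.UTF_8));
      zip.closeEntry();
      zip.putNextEntry(new ZipEntry("com/example/App.class"));
      zip.write(new byte[] {1, 2, 3});
      zip.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    // Stand-in for the stream that would come from a Blob.
    InputStream in = new ByteArrayInputStream(sampleZip());
    System.out.println(listArchiveEntries(in));
  }
}
```

If the scanner only accepts a java.io.File, copying the stream to a temp file with java.nio.file.Files.copy is the usual fallback, at the cost of extra disk I/O per asset.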

Some formats have supplementary files that don’t relate to a single component - maven-metadata.xml, npm package-level metadata, for instance. Those are also stored as Assets, but they’re not attached to Components.

bmandel · September 18, 2018, 8:31pm

Thanks a bunch! That helps ensure I’m following the right path.

With regard to getting and updating the Assets, is the best way to access the StorageTx class like in the example I sent and pull them from the DB that way? Or is there a better, easier class made by you guys I can use?

wwannemacher · September 18, 2018, 9:15pm

I took a look at your code and there are a couple more things to consider. For one, iterating all of the assets in a repository can be problematic. We tend to favor paging for a couple of reasons. First, if you issue a query that returns more than 10,000 results, the database will log a message suggesting that you should narrow your query. Second, it is best to keep your transactions short. Here is a snippet that will page through the assets in a repository, and allow further processing outside of the transaction:

Repository repo = ...
int limit = 256; // page size
Optional<String> lastName = Optional.empty(); // name of the last asset on the previous page

final List<Asset> needsProcessed = new ArrayList<>(limit);
int resultSize = 1;
while (resultSize > 0) {
  final boolean hasLastName = lastName.isPresent();
  final Map<String, Object> params =
      hasLastName ? ImmutableMap.of("lastName", lastName.get()) : Collections.emptyMap();
  log.debug("querying assets by page using name {}", lastName.orElse("NONE"));
  // Keep the transaction short: only fetch one page of assets inside it.
  Transactional.operation
      .withDb(repo.facet(StorageFacet.class).txSupplier())
      .call(() -> {
        StorageTx tx = UnitOfWork.currentTx();
        // The first page has no WHERE clause; later pages start after the last name seen.
        Iterables.addAll(needsProcessed,
            tx.findAssets(hasLastName ? "name > :lastName" : null, params,
                ImmutableList.of(repo), String.format("ORDER BY name LIMIT %d", limit)));
        return null;
      });

  // Process the page outside the transaction.
  for (Asset asset : needsProcessed) {
    // TODO - process your assets here!!
  }

  resultSize = needsProcessed.size();
  log.trace("page of assets contained {} results", resultSize);
  if (resultSize > 0) {
    // Remember where this page ended, then reset the buffer for the next page.
    Asset last = needsProcessed.get(needsProcessed.size() - 1);
    lastName = Optional.of(last.name());
    log.debug("clearing process buffer and setting name to {}", lastName.orElse("NONE"));
    needsProcessed.clear();
  }
}

In the above, I paged based on the “name” field because there is an index on that field in the table. This example is a bit verbose, but hopefully you can adapt it to your needs.
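Stripped of the Nexus classes, the paging pattern in that snippet looks roughly like the sketch below: each page is fetched with "name greater than the last name seen", ordered by name, with a limit. Everything here is a stand-in for illustration (KeysetPagingSketch, collectAll, queryPage, and the in-memory list that plays the role of the asset table). It also assumes names are unique within the data being paged; a duplicate name straddling a page boundary would be skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

// Hypothetical illustration of paging by an indexed, ordered key.
final class KeysetPagingSketch {

  /**
   * Walks all names page by page. fetchPage receives the last name of the
   * previous page (empty on the first call) and the page size.
   */
  static List<String> collectAll(
      BiFunction<Optional<String>, Integer, List<String>> fetchPage, int limit) {
    List<String> seen = new ArrayList<>();
    Optional<String> lastName = Optional.empty();
    while (true) {
      List<String> page = fetchPage.apply(lastName, limit);
      if (page.isEmpty()) {
        break; // an empty page means everything has been visited
      }
      seen.addAll(page); // real code would process the page here
      lastName = Optional.of(page.get(page.size() - 1));
    }
    return seen;
  }

  /** In-memory stand-in for "name > :lastName ORDER BY name LIMIT n". */
  static List<String> queryPage(List<String> allNames, Optional<String> lastName, int limit) {
    return allNames.stream()
        .filter(n -> !lastName.isPresent() || n.compareTo(lastName.get()) > 0)
        .sorted()
        .limit(limit)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("c.zip", "a.jar", "e.nupkg", "b.jar", "d.zip");
    System.out.println(collectAll((after, size) -> queryPage(names, after, size), 2));
  }
}
```

Compared with offset-based paging, this approach does not re-scan earlier rows on every page, which is why paging on an indexed field matters.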

bmandel · September 19, 2018, 4:26pm

Wow, thanks a lot for this. We definitely didn’t have it fully fleshed out yet but we’ll use this to base our code off of.

I really appreciate it guys.

bmandel · October 1, 2018, 8:30pm

I’ve started working on creating my query and am curious if you guys have any information about DB structure I can take a look at? I’m trying to filter out specific assets and wanted to do it when querying the DB instead of getting everything then filtering.

Some things we want to filter on are artifact extensions, last updated/created, and some of the attributes in the asset panel. I’ve been following the PurgeUnusedSnapshotsFacetImpl.java class as an example in your nexus-public repo but am having trouble creating my own WHERE clause.

Do you have any recommendations?

rbender · October 2, 2018, 12:25am

I had similar questions when first getting going. Maybe my findings on this thread might help point you in a direction: Google Groups

bmandel · October 5, 2018, 8:49pm

Thanks for that link. It has a lot of helpful information in there. I’ve built out my query but seem to be struggling with using the name field to filter. I keep getting an error from OrientDB when it tries to parse the query. This is the statement:

SELECT FROM asset WHERE component IS NOT NULL AND attributes.content.last_modified > 1451606400000 AND attributes.My.status IS NULL AND attributes.My.status = FAILURE AND attributes.My.status = SUCCESS AND (name LIKE %.zip OR name LIKE %.nupkg) ORDER BY name LIMIT 100

Have you come across similar issues? I’ve tried changing it to component.name like in your example (although I’m sure it’s not the value I’ll actually need) but it doesn’t seem to get parsed properly. Any thoughts?

mdodgson · October 9, 2018, 2:40am

Have you tried putting quotes around your zip and nupkg references?

SELECT FROM asset WHERE component IS NOT NULL AND attributes.content.last_modified > 1451606400000 AND attributes.My.status IS NULL AND attributes.My.status = FAILURE AND attributes.My.status = SUCCESS AND (name LIKE '%.zip' OR name LIKE '%.nupkg') ORDER BY name LIMIT 100
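An alternative to quoting literals by hand is to pass the patterns as named parameters, the same way the paging snippet earlier in the thread passes :lastName. The sketch below only builds the clause text and the parameter map; ExtensionFilterSketch and extensionClause are hypothetical names, and whether the database binds a named parameter inside LIKE the way it does for a comparison is an assumption that has not been checked against Nexus here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for building an extension filter with named parameters.
final class ExtensionFilterSketch {

  /**
   * Adds one named parameter per extension to params and returns a clause
   * such as "(name LIKE :ext0 OR name LIKE :ext1)".
   */
  static String extensionClause(List<String> extensions, Map<String, Object> params) {
    if (extensions.isEmpty()) {
      throw new IllegalArgumentException("at least one extension is required");
    }
    StringJoiner clause = new StringJoiner(" OR ", "(", ")");
    for (int i = 0; i < extensions.size(); i++) {
      String key = "ext" + i;
      params.put(key, "%." + extensions.get(i)); // the pattern travels as data, not as query text
      clause.add("name LIKE :" + key);
    }
    return clause.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> params = new LinkedHashMap<>();
    String where = "component IS NOT NULL AND "
        + extensionClause(Arrays.asList("zip", "nupkg"), params);
    System.out.println(where);
    System.out.println(params);
  }
}
```

Keeping the patterns out of the query string sidesteps the quoting problem entirely and avoids building queries from unescaped input.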

bmandel · October 10, 2018, 4:55pm

Worked! Thank you! I thought I had tried that before but it must’ve been a different scenario.