Cannot index record found duplicated key

Hi,

I’m getting a series of errors when repositories are indexing:
Cannot index record #23:536515: found duplicated key
The same error is reported for many other records as well.

I’ve tried to repair/rebuild the index using the OrientDB console tool, but I get the same error there as well.

I’m not sure whether I can just delete the duplicate.

Thanks in advance!

Hi,

Try dropping (deleting) the index and creating it again, then see if this works.

Thanks
DJM

Deleting the index didn’t help. When I created it again, I got exactly the same error. I had to truncate browse_node to be able to create the index again.

Upon restarting Nexus though, when it starts to run all its internal indexing, I’m seeing a lot of these:

2021-08-03 15:21:00,622+0100 WARN [quartz-3-thread-11] *SYSTEM org.sonatype.nexus.repository.browse.internal.BrowseNodeManager - Problem generating browse nodes for Asset{metadata=AttachedEntityMetadata{schema=asset, document=#20:857684{bucket:#16:1,format:maven2,last_updated:Wed Jul 31 10:19:20 BST 2019,attributes:[5],component:null,name:com/fasterxml/jackson/dataformat/jackson-dataformat-xml/maven-metadata.xml.md5,size:32,content_type:text/plain,created_by:anonymous,created_by_ip:10.4.6.41,blob_ref:default@FA7D0C3D-91F6966F-CF38671D-50407B63-61CD00C0:6f2a4ecc-2c5e-43e9-a315-4fdb6ec06be0,last_downloaded:null,blob_created:Tue Jul 30 10:35:57 BST 2019,blob_updated:Tue Jul 30 10:35:57 BST 2019} v2}, name=com/fasterxml/jackson/dataformat/jackson-dataformat-xml/maven-metadata.xml.md5}
com.orientechnologies.orient.core.storage.ORecordDuplicatedException: Cannot index record #24:867217: found duplicated key ‘#20:857684’ in index ‘browse_node_asset_id_idx’ previously assigned to the record #23:56359
DB name=“component”

(240 and counting)
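
With this many warnings, it may help to pull the record IDs out of the log before looking at them in the database. The following is a minimal sketch, not anything Nexus ships: it assumes the message format quoted above, and the names `DUPLICATE_RE` and `collect_duplicates` are made up for illustration. It accepts both straight and curly quotes, since forum software tends to rewrite the quotes that appear in the real log.

```python
import re
from collections import defaultdict

# Any quote character that may surround the key and index name.
QUOTE = "['\"\u2018\u2019\u201c\u201d]"
# OrientDB record ID, e.g. #24:867217
RID = r"#\d+:\d+"

# Assumed message format, copied from the ORecordDuplicatedException above.
DUPLICATE_RE = re.compile(
    rf"Cannot index record ({RID}): found duplicated key {QUOTE}({RID}){QUOTE} "
    rf"in index {QUOTE}([\w.]+){QUOTE} previously assigned to the record ({RID})"
)


def collect_duplicates(lines):
    """Map (index name, duplicated key) to the set of record IDs that claim it."""
    found = defaultdict(set)
    for line in lines:
        match = DUPLICATE_RE.search(line)
        if match:
            new_rid, key, index_name, old_rid = match.groups()
            found[(index_name, key)].update({new_rid, old_rid})
    return dict(found)
```

Passing the lines of the Nexus log to `collect_duplicates` gives one entry per duplicated key, each listing the records that point at it.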

I assume that Nexus is actually filling browse_node with duplicate data for some reason.

I would hazard a guess that the table(s) the browse nodes are built from are corrupted and have duplicate identifiers.

Do you know how I can identify and fix this?

I believe this suggests that there may be two entries in the Asset table with the #20:857684 ID

Do you have any idea how that might have happened? Is it safe to delete duplicates from that table?

It’s impossible to really know why it may have happened. Common causes tend to be killing Nexus rather than stopping the service, “backing up” the database by copying files while Nexus is running, etc. This is partly why we’ve been working on enabling some alternative database options.

In terms of safety, I’d suggest looking at the records to see whether they appear to be for the same asset (i.e. the same file, which would have the same bucket and path). If they’re for the same thing, then removing one should be fine; you may want to check which one is the newest and remove the older one. If they’re for different files, then you may want to remove one and re-insert it so you don’t lose an asset.
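
The rule above can be written down as a small sketch. It assumes each record has been copied by hand into a dict using the field names visible in the log (`bucket`, `name`, `last_updated`); the helper names and the choice of `last_updated` as the “newest” indicator are assumptions for illustration, not something Nexus defines.

```python
from datetime import datetime


def same_asset(a, b):
    """Two records describe the same file if bucket and path (name) match."""
    return a["bucket"] == b["bucket"] and a["name"] == b["name"]


def older_record(a, b):
    """Return the older of two records for the same asset.

    Returns None when the records are for different files, in which case
    neither should simply be deleted.
    """
    if not same_asset(a, b):
        return None
    return a if a["last_updated"] <= b["last_updated"] else b
```

A `None` result is the case where removing a record would lose an asset, so it needs the remove-and-re-insert treatment instead.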

I’ve run the query
SELECT FROM (select name,count(*) as conteggio from asset group by bucket, name) where conteggio > 1
Which didn’t produce any results.

The following:
SELECT FROM (select name,count(*) as conteggio from asset group by name) where conteggio > 1
Did produce many results - all in different buckets

Is the second one correct? You did mention buckets in your answer.

Bucket is roughly the analog for a repository in that table. It’s only relevant for trying to understand what the duplicated IDs are.
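
To illustrate why the two queries behave differently, here is a small sketch that mimics the grouping in plain Python. The sample rows and the `repeated_keys` helper are invented for illustration; only the field names `bucket` and `name` come from the asset records shown earlier.

```python
from collections import Counter


def repeated_keys(assets, fields):
    """Return the key tuples that occur more than once when grouping by `fields`."""
    counts = Counter(tuple(asset[f] for f in fields) for asset in assets)
    return {key: n for key, n in counts.items() if n > 1}


# The same path stored in two different repositories (buckets).
assets = [
    {"bucket": "#16:1", "name": "com/example/lib/maven-metadata.xml"},
    {"bucket": "#16:2", "name": "com/example/lib/maven-metadata.xml"},
]

by_name = repeated_keys(assets, ["name"])
by_bucket_and_name = repeated_keys(assets, ["bucket", "name"])
```

Here `by_name` reports the path as repeated while `by_bucket_and_name` is empty, which matches the results of the two queries: the same file name in several repositories is expected and is not a duplicate.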

If you can’t find duplicates, the only thing I can really suggest is rebuilding the browse node table, if you haven’t tried that already.

We’re having similar issues, but when we try to connect to OrientDB as root (Nexus runs as a container in a VM), it messes with the file permissions. How should we connect to it, and how can we delete the index and recreate it? And is that the best option?

How would I go about rebuilding the browse node table?

Hi Martin,
Go to Administration → Tasks → Create task, select Repair - Rebuild repository browse, and either schedule it or just run it manually.