11-21-2016 05:22 AM
The entity identity field returned via Navigator API calls appears to be an MD5 hash of key entity fields. E.g. from looking at the Navigator SDK source, an HDFS entity's identity should be the hash of the sourceId and the fileSystemPath (separated by ##):
But if I try to reproduce a sample entity's identity value using that approach, I can't get the same identity value that actually comes back from the API for that entity.
Any ideas why the identities might differ? Is a salt value used in the production version when hashing, or does the separator differ in some cases?
I'm testing on CDH 5.8.0.
11-21-2016 02:39 PM
The issue is that the Guava MD5 hasher, if not passed an encoding, hashes a string as a series of UTF-16 chars:
I was testing with a different MD5 implementation that treated the string as UTF-8 encoded. Once I changed my code to treat the string the same way Guava does, the IDs matched.
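For anyone hitting the same mismatch outside Java, here is a minimal sketch of the difference in Python. It assumes that Guava's unencoded string hashing feeds each char as two bytes, low byte first, with no byte-order mark, which corresponds to Python's "utf-16-le" codec. The source ID and path below are made-up placeholders, not values from a real cluster, and the function names are my own.

```python
import hashlib

def md5_utf8(text: str) -> str:
    """MD5 of the string's UTF-8 bytes (what many MD5 tools do by default)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def md5_utf16le(text: str) -> str:
    """MD5 of the string's UTF-16 code units, low byte first, no BOM.

    Assumption: this mirrors how Guava hashes a string when no charset
    is supplied.
    """
    return hashlib.md5(text.encode("utf-16-le")).hexdigest()

# Hypothetical inputs for illustration only.
source_id = "1"
file_system_path = "/user/example"
key = source_id + "##" + file_system_path

print("utf-8   :", md5_utf8(key))
print("utf-16le:", md5_utf16le(key))
```

The two digests differ for the same input string, which is the kind of mismatch described above. Only the byte encoding changes; the MD5 algorithm and the "##" separator stay the same.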
12-02-2016 03:52 PM
Although this might work right now, we're using a different algorithm to calculate the entity ID starting in CDH 5.10, which ships next month. We'll translate from the old entity ID to the new one transparently, but you're best off using our API to determine the entity ID -- specifically:
I hope this helps. Regards, Mark.
12-02-2016 04:48 PM
Thanks for the info.
I don't have a real need to generate IDs this way at the moment -- mostly I was just curious how it was done. I'd seen the parent link in Navigator from an HDFS entity to its parent and figured you must be generating that functionally from other available info, since the parent ID isn't part of the entity payload (later I realized the page could probably just be using a relation query to get it). Also, I was wondering whether I could create custom entities without using the Java plugin.
I've been using the API as you suggest, and for the most part I've found what I've needed between the docs, the SDK code, and a fair amount of experimentation.