
Cloudera Navigator pre-registered metadata is not being applied

Following the documentation, I am trying to pre-register metadata for a specific file so that tags are applied when the file is extracted from HDFS.

 

1. I extract the identity of the HDFS source

[root@cloudera ~]# curl http://cloudera.sorn:7187/api/v11/entities?query='(type:SOURCE)AND(sourceType:HDFS)' -u admin:admin
[ {
"originalName" : "hdfs",
"originalDescription" : null,
"sourceId" : null,
"firstClassParentId" : null,
"parentPath" : null,
"deleteTime" : null,
"extractorRunId" : null,
"customProperties" : null,
"name" : "HDFS",
"description" : null,
"tags" : null,
"properties" : null,
"technicalProperties" : null,
"clusterName" : "cluster",
"sourceUrl" : "hdfs://cloudera.sorn:8020",
"sourceType" : "HDFS",
"sourceExtractIteration" : 3,
"sourceTemplate" : null,
"hmsDbHost" : null,
"hmsDbName" : null,
"hmsDbPort" : null,
"hmsDbUser" : null,
"type" : "SOURCE",
"deleted" : null,
"userEntity" : false,
"metaClassName" : "source",
"packageName" : "nav",
"identity" : "8",
"internalType" : "source"
} ]
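
A minimal sketch of how the identity could be pulled out of that response in a script, rather than read by eye. The helper name extract_identity is hypothetical, and it assumes the response is pretty-printed with one field per line, as in the output above. The commented curl call passes the same query through curl's -G and --data-urlencode options so the shell does not need to quote the parentheses; whether Navigator treats the encoded query identically is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read a Navigator entities response on stdin and
# print the value of the first "identity" field it finds.
# Assumes one JSON field per line, as in the pretty-printed output above.
extract_identity() {
    sed -n 's/.*"identity"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example against the server from this post (commented out: needs network access):
# curl -s -G 'http://cloudera.sorn:7187/api/v11/entities' \
#      --data-urlencode 'query=(type:SOURCE)AND(sourceType:HDFS)' \
#      -u admin:admin | extract_identity
```

Fed the response shown in step 1, the helper prints 8, which is the value used as sourceId in the next step.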

2. I use the identity shown above to create a pre-registration for the file /user/mark/newfile with the tag fav and the property priority:medium

 

cat hdfspreregistration 
{
          "sourceId":"8",
          "parentPath":"/user/mark",
          "originalName":"newfile",
          "name":"newfile",
          "description":"This is going to be an awesome file.",
          "tags":["fav"],
          "properties":{"priority":"medium"}
}

[mark@cloudera ~]$ curl http://cloudera.sorn:7187/api/v11/entities -u admin:admin -X POST -H "Content-Type: application/json" -d "$(cat hdfspreregistration)"
{
  "originalName" : "newfile",
  "originalDescription" : null,
  "sourceId" : "8",
  "firstClassParentId" : null,
  "parentPath" : "/user/mark",
  "deleteTime" : null,
  "extractorRunId" : null,
  "customProperties" : null,
  "name" : "newfile",
  "description" : "This is going to be an awesome file.",
  "tags" : [ "fav" ],
  "properties" : {
    "priority" : "medium"
  },
  "technicalProperties" : null,
  "sourceType" : null,
  "type" : null,
  "deleted" : null,
  "userEntity" : false,
  "metaClassName" : "UNDEFINED",
  "packageName" : "nav",
  "identity" : "3247",
  "internalType" : "UNDEFINED"
}
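
A rough sketch of how the payload in step 2 could be generated from variables instead of a hand-edited file, so the sourceId always matches the identity found in step 1. The function name build_preregistration and its argument order are hypothetical; the field names are the ones used in the payload above (the description field is left out for brevity). The commented curl call relies on -d @-, which makes curl read the request body from standard input.

```shell
#!/bin/sh
# Hypothetical helper: print a pre-registration payload for one HDFS file.
# Arguments: source identity, parent path, file name, tag, priority.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
build_preregistration() {
    printf '{"sourceId":"%s","parentPath":"%s","originalName":"%s","name":"%s","tags":["%s"],"properties":{"priority":"%s"}}\n' \
        "$1" "$2" "$3" "$3" "$4" "$5"
}

# Posting it to the server from this post (commented out: needs network access):
# build_preregistration 8 /user/mark newfile fav medium |
#     curl -s 'http://cloudera.sorn:7187/api/v11/entities' -u admin:admin \
#          -X POST -H 'Content-Type: application/json' -d @-
```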

3. I verify that the entity is pre-registered

 

[mark@cloudera ~]$ curl http://cloudera.sorn:7187/api/v11/entities/?query=-internalType:*  -X GET  -u admin:admin
[ {
  "originalName" : "newfile",
  "originalDescription" : null,
  "sourceId" : "8",
  "firstClassParentId" : null,
  "parentPath" : "/user/mark",
  "deleteTime" : null,
  "extractorRunId" : null,
  "customProperties" : null,
  "name" : "newfile",
  "description" : "This is going to be an awesome file.",
  "tags" : [ "fav" ],
  "properties" : {
    "priority" : "medium"
  },
  "technicalProperties" : null,
  "sourceType" : null,
  "type" : null,
  "deleted" : null,
  "userEntity" : false,
  "metaClassName" : "UNDEFINED",
  "packageName" : "nav",
  "identity" : "3247",
  "internalType" : "UNDEFINED"
} ]

4. I copy the new file to HDFS

 

hdfs dfs -copyFromLocal newfile /user/mark/newfile

5. After the checkpoint and the extraction poll interval have passed, I can see in the logs that there was a problem processing this file.

 

2017-08-16 11:05:56,005 INFO com.cloudera.nav.hdfs.client.InotifyClient [CDHExecutor-0-CDHUrlClassLoader@574e34fe]: Processing inotify event, starting with tx id 4640
2017-08-16 11:05:59,911 ERROR com.cloudera.nav.hdfs.client.InotifyClient [CDHExecutor-0-CDHUrlClassLoader@574e34fe]: Error handling event (txid: 5050): Renamed /user/mark/newfile._COPYING_ to /user/mark/newfile at time 1502877858201
2017-08-16 11:05:59,913 ERROR com.cloudera.nav.hdfs.client.InotifyClient [CDHExecutor-0-CDHUrlClassLoader@574e34fe]: Error handling RENAME event
java.lang.ClassCastException: com.cloudera.nav.core.model.GenericEntity cannot be cast to com.cloudera.nav.hdfs.model.FSEntity
        at com.cloudera.nav.hdfs.extractor.HdfsOperationHandler.getEntry(HdfsOperationHandler.java:642)
        at com.cloudera.nav.hdfs.extractor.HdfsOperationHandler.rename(HdfsOperationHandler.java:276)
        at com.cloudera.nav.hdfs.client.InotifyClient.handleRenameEvent(InotifyClient.java:283)
        at com.cloudera.nav.hdfs.client.InotifyClient.handleEvent(InotifyClient.java:129)
        at com.cloudera.nav.hdfs.client.InotifyClient.doImport(InotifyClient.java:75)
        at com.cloudera.nav.hdfs.client.InotifyExtractor.doImport(InotifyExtractor.java:34)
        at com.cloudera.nav.hdfs.extractor.HdfsExtractorShim$1.run(HdfsExtractorShim.java:276)
        at com.cloudera.nav.hdfs.extractor.HdfsExtractorShim$1.run(HdfsExtractorShim.java:273)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at com.cloudera.cmf.cdh5client.security.UserGroupInformationImpl.doAs(UserGroupInformationImpl.java:44)
        at com.cloudera.nav.hdfs.extractor.HdfsExtractorShim.doImport(HdfsExtractorShim.java:273)
        at com.cloudera.nav.hdfs.extractor.HdfsExtractorShim.doExtraction(HdfsExtractorShim.java:235)
        at com.cloudera.nav.hdfs.extractor.HdfsExtractorShim.run(HdfsExtractorShim.java:141)
        at com.cloudera.cmf.cdhclient.CdhExecutor$RunnableWrapper.call(CdhExecutor.java:221)
        at com.cloudera.cmf.cdhclient.CdhExecutor$RunnableWrapper.call(CdhExecutor.java:211)
        at com.cloudera.cmf.cdhclient.CdhExecutor$CallableWrapper.doWork(CdhExecutor.java:236)
        at com.cloudera.cmf.cdhclient.CdhExecutor$1.call(CdhExecutor.java:125)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2017-08-16 11:06:00,365 INFO com.cloudera.nav.hdfs.client.InotifyClient [CDHExecutor-0-CDHUrlClassLoader@574e34fe]: Processing done, next start id = 5061.
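
To confirm which paths the extractor failed on, the log can be filtered for the "Error handling event" lines. This is only a diagnostic sketch: the helper name failed_renames is hypothetical, it matches the exact message format shown in the log above, and it reads the log from standard input because the log file location is not given in this post.

```shell
#!/bin/sh
# Hypothetical helper: read a Navigator Metadata Server log on stdin and
# print "<txid> <target path>" for every failed RENAME inotify event.
# Matches the message format seen in the log excerpt above.
failed_renames() {
    sed -n 's/.*Error handling event (txid: \([0-9]*\)): Renamed .* to \(.*\) at time .*/\1 \2/p'
}

# Usage (the log file name here is a placeholder, not a real path):
# failed_renames < navigator-metadata-server.log
```

For the excerpt above this prints the transaction id 5050 together with /user/mark/newfile, which is the same path that was pre-registered in step 2.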

After that, the file doesn't show up in Navigator search.

 

 

Does anybody have an idea what I am doing wrong and how to fix it?

 

Regards

 

 

 

 
