Created 12-20-2016 04:42 PM
I am getting files from HDFS using the GetHDFS processor and pushing them into SolrCloud using the PutSolrContentStream processor. I want to push the path of each file I retrieve into a new field in SolrCloud. If I check the attributes of the files retrieved by the GetHDFS processor, I can't see an attribute containing the full path of the file. If I use the GetFile processor, however, there is an attribute named "absolute.path" which contains the path of the file.
How can I get the path attribute of the files I am retrieving from HDFS using the GetHDFS processor?
Created 12-20-2016 04:47 PM
FlowFiles generated by the GetHDFS processor should have a "path" attribute set on them:
The path is set to the relative path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3".
Since this is only the relative path and not an absolute path, you would need to use an UpdateAttribute processor to prepend the configured Directory value to that relative path if you need the absolute path later in your flow.
Thanks,
Matt
Created 12-20-2016 04:45 PM
You can concatenate the configured Directory value with the path and filename attributes using an UpdateAttribute processor.
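To make the concatenation concrete, here is a minimal Python sketch of the joining logic only; it is an illustration, not NiFi code. The function name absolute_hdfs_path and the file name data.csv are hypothetical, and the sketch assumes the path attribute is always relative, as described in this thread. In NiFi itself, the equivalent would be an UpdateAttribute property whose value is something like /tmp/${path}/${filename}, with /tmp standing in for whatever Directory is configured on GetHDFS.

```python
import posixpath


def absolute_hdfs_path(directory: str, path: str, filename: str) -> str:
    """Rebuild a full HDFS path from the three pieces discussed above.

    directory: the value configured in the GetHDFS Directory property
    path:      the relative "path" attribute on the FlowFile ("./" or "abc/1/2/3")
    filename:  the "filename" attribute on the FlowFile
    """
    # posixpath always uses forward slashes, which matches HDFS paths
    # regardless of the operating system this sketch runs on.
    joined = posixpath.join(directory, path, filename)
    # normpath collapses the "./" segment and any doubled slashes.
    return posixpath.normpath(joined)


# File picked up directly from the configured directory.
print(absolute_hdfs_path("/tmp", "./", "data.csv"))
# File picked up from a subdirectory with Recurse Subdirectories enabled.
print(absolute_hdfs_path("/tmp", "abc/1/2/3", "data.csv"))
```

Note that a plain string concatenation such as /tmp/${path}/${filename} would yield "/tmp/.//data.csv" for files sitting directly in the configured directory, so some normalization may be worth adding if the value is stored as a field in Solr.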