GetHDFS Path Field

Expert Contributor

I am getting files from HDFS using the GetHDFS processor and pushing them into SolrCloud using the PutSolrContentStream processor. I want to push the path of each file I retrieve into a new field in SolrCloud. When I check the attributes of the files retrieved by the GetHDFS processor, I can't see an attribute containing the full path of the file. If I use the GetFile processor, however, there is an attribute named "absolute.path" which contains the path of the file.

How can I get the path attribute of the files I am retrieving from HDFS using the GetHDFS processor?

1 ACCEPTED SOLUTION

Super Mentor
@Ahmad Debbas

FlowFiles generated by the GetHDFS processor should have a "path" attribute set on them.

The path is set to the relative path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3".

Since it is only the relative path and not an absolute path, you would need to use an UpdateAttribute processor to prepend the configured directory path to that relative path if you need the absolute path for use later in your flow.
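To make the joining rule concrete, here is a minimal Python sketch of the same logic outside NiFi. The function name absolute_hdfs_path and the sample filename data.csv are hypothetical and only illustrate how the configured Directory, the relative "path" attribute, and the "filename" attribute combine. In NiFi itself the equivalent would be done with an UpdateAttribute property rather than with Python.

```python
import posixpath


def absolute_hdfs_path(directory: str, path: str, filename: str) -> str:
    """Combine the configured Directory, the relative "path" attribute,
    and the "filename" attribute into one absolute HDFS path.

    Hypothetical helper for illustration; it is not part of NiFi.
    """
    # posixpath always uses forward slashes, which matches HDFS paths
    # regardless of the operating system this runs on.
    joined = posixpath.join(directory, path, filename)
    # normpath collapses the "./" reported for files that sit directly
    # in the configured directory, and removes doubled slashes.
    # Note: this assumes "path" is relative, as described above; a value
    # starting with "/" would make join() discard the directory.
    return posixpath.normpath(joined)


if __name__ == "__main__":
    # File picked up directly from /tmp (path attribute is "./")
    print(absolute_hdfs_path("/tmp", "./", "data.csv"))
    # File picked up from a subdirectory with Recurse Subdirectories enabled
    print(absolute_hdfs_path("/tmp", "abc/1/2/3", "data.csv"))
```

The normalization step matters mainly for the "./" case, where a plain string concatenation would leave a stray "./" in the middle of the result.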

Thanks,

Matt


2 REPLIES


You can concatenate the value of the Directory property with the path and filename attributes using an UpdateAttribute processor.
