GetHDFS Path Field

Expert Contributor

I am getting files from HDFS using the GetHDFS processor and pushing them into SolrCloud using the PutSolrContentStream processor. I want to push the path of each file I retrieve into a new field in SolrCloud. When I check the attributes of the FlowFiles produced by GetHDFS, I can't see an attribute containing the full path of the file. The GetFile processor, however, sets an attribute named "absolute.path" that contains the file's path.

How can I get the path of the files I am retrieving from HDFS with the GetHDFS processor?

Accepted Solution

Re: GetHDFS Path Field

Master Guru
@Ahmad Debbas

FlowFiles generated by the GetHDFS processor should have a "path" attribute set on them:

The path is set to the relative path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3".
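To make the rule above concrete, here is a minimal Python sketch that mimics the documented behavior of the "path" attribute. It is not NiFi's implementation; the function name relative_path_attribute and the file name data.csv are hypothetical, used only for illustration.

```python
import posixpath

def relative_path_attribute(directory: str, file_path: str) -> str:
    """Hypothetical helper mimicking the documented GetHDFS 'path' attribute:
    the file's parent directory relative to the configured Directory
    property, with "./" when the file sits directly in Directory."""
    parent = posixpath.dirname(file_path)
    rel = posixpath.relpath(parent, directory)
    # relpath returns "." for the same directory; GetHDFS documents "./"
    return "./" if rel == "." else rel

print(relative_path_attribute("/tmp", "/tmp/data.csv"))            # ./
print(relative_path_attribute("/tmp", "/tmp/abc/1/2/3/data.csv"))  # abc/1/2/3
```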

Since it is only a relative path, not an absolute one, you would need to use an UpdateAttribute processor to prepend the configured Directory path to that relative path if you need the absolute path later in your flow.
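As a rough sketch, assuming the Directory property is /tmp, the UpdateAttribute processor could define a new attribute (the name hdfs.full.path is hypothetical) with a value along the lines of /tmp/${path}/${filename}. Because GetHDFS does not write the Directory value as an attribute, it has to be repeated in that expression. Literal concatenation can leave a redundant "./" segment and a doubled slash in the result. The Python sketch below shows the same reconstruction with that normalization applied; absolute_file_path and data.csv are illustrative names only.

```python
import posixpath

def absolute_file_path(directory: str, path_attr: str, filename: str) -> str:
    """Hypothetical helper: rebuild the full HDFS path of a file from the
    GetHDFS Directory property plus the 'path' and 'filename' attributes."""
    # normpath collapses the "./" case and any doubled slashes
    return posixpath.normpath(posixpath.join(directory, path_attr, filename))

print(absolute_file_path("/tmp", "./", "data.csv"))         # /tmp/data.csv
print(absolute_file_path("/tmp", "abc/1/2/3", "data.csv"))  # /tmp/abc/1/2/3/data.csv
```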

Thanks,

Matt

Re: GetHDFS Path Field

You can concatenate the value of the Directory property with the path and filename attributes using an UpdateAttribute processor.