Is it possible to use upstream attributes in GetHDFS?

New Contributor

I'm working on a NiFi process that will retrieve files from HDFS. I'm using the GetHDFS processor to pull all the files from a specific directory. Ideally, I'd like to use an attribute from an XML properties file as the HDFS directory, in case we need to change directories. This would allow us to do that without having to change NiFi.

The problem I am having is how to get the GetHDFS processor to recognize the attributes created by my upstream EvaluateXPath processor since GetHDFS does not accept upstream connections.

I'm still relatively new to NiFi, so I'm completely stumped as to how, if at all, I can get this to work. Any ideas?

1 ACCEPTED SOLUTION

New Contributor

To resolve my issue, I had to write a bash script that retrieves a listing of the file names from my HDFS folder. That gave me the list of ALL files currently in the directory. I am then able to use the FetchHDFS processor to retrieve each of the files by file name.
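The original bash script was not posted. As a rough illustration of the same listing step, here is a minimal Python sketch that turns the output of `hdfs dfs -ls` into a list of file paths. The function names `parse_ls_output` and `list_hdfs_files` are hypothetical, and the sketch assumes the `hdfs` command-line client is installed and on the PATH of the machine running it.

```python
import subprocess


def parse_ls_output(ls_output):
    """Return the file paths found in `hdfs dfs -ls` output, skipping directories."""
    paths = []
    for line in ls_output.splitlines():
        # Split into at most 8 columns so a path containing spaces stays intact.
        fields = line.split(None, 7)
        # File entries have 8 columns and a permission string starting with "-".
        # The "Found N items" header and directory entries ("d...") are skipped.
        if len(fields) == 8 and fields[0].startswith("-"):
            paths.append(fields[7])
    return paths


def list_hdfs_files(directory):
    """Run `hdfs dfs -ls <directory>` and return the file paths it reports.

    Assumption: the `hdfs` CLI is available locally; this is not NiFi code.
    """
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", directory],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ls_output(result.stdout)
```

Each path returned this way could then be handed to FetchHDFS as the file to retrieve. How the list gets into the flow is left open here, since the original post does not say.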

3 REPLIES

@Mike Bailey

Have you tried using ListHDFS/FetchHDFS to implement your logic?

EvaluateXPath parses the XML file and gives you the name of the directory. You use this to list all files in that directory with ListHDFS, and then FetchHDFS to get the data. The state you were referring to is there to avoid getting the same file several times, so that only new files are picked up from a directory. I expect this to be the desired behavior.
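To make the first step concrete, here is a small Python sketch of what the XPath lookup amounts to: reading a directory name out of an XML properties file. It runs outside NiFi and is only an illustration. The XML layout, the XPath `./hdfs/directory`, and the function name `read_hdfs_directory` are all assumptions, since the actual properties file was not shown in the thread.

```python
import xml.etree.ElementTree as ET


def read_hdfs_directory(xml_text, xpath="./hdfs/directory"):
    """Return the directory stored at `xpath` in an XML properties document.

    The default XPath matches a hypothetical layout such as:
        <properties><hdfs><directory>/data/incoming</directory></hdfs></properties>
    """
    root = ET.fromstring(xml_text)
    node = root.find(xpath)
    # Fail loudly if the element is missing or empty rather than listing "/".
    if node is None or not (node.text or "").strip():
        raise ValueError("no directory found at XPath: " + xpath)
    return node.text.strip()
```

In a flow, the value extracted this way would be the attribute that the listing step needs as its directory.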

New Contributor

Hi Abdelkrim,

Can you explain how EvaluateXPath will pass the directory attribute to ListHDFS? What is the best practice when the requirement is to provide the directory for ListHDFS dynamically?

Thanks,

Algis