Created 10-04-2017 12:36 PM
I'm working on a NiFi flow that will retrieve files from HDFS. I'm using the GetHDFS processor to pull all the files from a specific directory. Ideally, I'd like to take the HDFS directory from an attribute read out of an XML properties file, so that if we ever need to change directories we can do it without having to change the NiFi flow.
The problem I'm having is getting the GetHDFS processor to recognize the attributes created by my upstream EvaluateXPath processor, since GetHDFS does not accept upstream connections.
I'm still relatively new to NiFi, so I'm completely stumped as to how, if at all, I can get this to work. Any ideas?
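For context, the extraction step described in the question can be sketched in Python. This is only an illustration: the properties layout (`<property name="hdfs.directory">`) and the function name are hypothetical, since the real file isn't shown. In NiFi itself the equivalent XPath would go in a user-defined property on EvaluateXPath with Destination set to `flowfile-attribute`.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the actual properties file is not shown in the thread.
PROPERTIES_XML = """<properties>
  <property name="hdfs.directory">/data/incoming</property>
  <property name="batch.size">100</property>
</properties>"""


def hdfs_directory_from_properties(xml_text: str) -> str:
    """Return the value of the 'hdfs.directory' property.

    Roughly what the XPath /properties/property[@name='hdfs.directory']
    would select in EvaluateXPath.
    """
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including [@attr='value'].
    node = root.find("./property[@name='hdfs.directory']")
    if node is None or not (node.text or "").strip():
        raise ValueError("hdfs.directory property not found or empty")
    return node.text.strip()


if __name__ == "__main__":
    print(hdfs_directory_from_properties(PROPERTIES_XML))  # /data/incoming
```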
Created 10-05-2017 12:14 PM
To resolve my issue, I had to write a bash script to retrieve a listing of the file names in my HDFS folder. That gave me the list of ALL files currently in the directory. I am then able to use the FetchHDFS processor to retrieve each of the files by file name.
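A minimal sketch of the listing step, assuming the script parses the default output of `hdfs dfs -ls <dir>` (the actual script isn't posted). It is written in Python rather than bash so it can be tested; `files_from_hdfs_ls` and `list_hdfs_files` are hypothetical names. Sub-directories are skipped so that only files are handed on to FetchHDFS.

```python
import subprocess
from typing import List


def files_from_hdfs_ls(ls_output: str) -> List[str]:
    """Extract file paths from the text printed by `hdfs dfs -ls <dir>`."""
    paths: List[str] = []
    for raw in ls_output.splitlines():
        line = raw.strip()
        # Skip blank lines and the "Found N items" header.
        if not line or line.startswith("Found "):
            continue
        # Columns: permissions, replication, owner, group, size, date, time, path.
        # maxsplit=7 keeps a path containing spaces in one piece.
        fields = line.split(maxsplit=7)
        if len(fields) < 8:
            continue
        permissions, path = fields[0], fields[7]
        if permissions.startswith("d"):
            continue  # a sub-directory, not a file to fetch
        paths.append(path)
    return paths


def list_hdfs_files(directory: str) -> List[str]:
    """Run the hdfs CLI and return the files directly under `directory`."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", directory],
        capture_output=True, text=True, check=True,
    )
    return files_from_hdfs_ls(result.stdout)


if __name__ == "__main__":
    sample = """Found 3 items
-rw-r--r--   3 hdfs supergroup       1234 2017-10-05 12:00 /data/incoming/a.csv
-rw-r--r--   3 hdfs supergroup       5678 2017-10-05 12:01 /data/incoming/b.csv
drwxr-xr-x   - hdfs supergroup          0 2017-10-05 12:02 /data/incoming/archive
"""
    print(files_from_hdfs_ls(sample))  # ['/data/incoming/a.csv', '/data/incoming/b.csv']
```

Each returned path could then become its own FlowFile whose attribute is referenced from FetchHDFS's HDFS Filename property, which accepts Expression Language.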
Created 10-05-2017 06:03 AM
Have you tried implementing your logic with ListHDFS and FetchHDFS?
EvaluateXPath parses the XML file and gets you the name of the directory. You use this with ListHDFS to list all files in that directory, and then FetchHDFS to get the data. The state you were referring to is there to avoid getting the same file several times and to get only new files from the directory. I expect this to be the desired behavior.
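To illustrate what that state buys you, here is a simplified Python model of incremental listing: it remembers the newest modification time it has already emitted and returns only files modified after it. The class and field names are hypothetical, and NiFi's actual ListHDFS is more careful (for example around files that share the newest timestamp), so treat this as a sketch of the idea rather than the processor's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HdfsEntry:
    path: str
    modification_time: int  # e.g. milliseconds since the epoch


class IncrementalLister:
    """Remembers the newest modification time already emitted and, on each
    call, returns only entries modified after it (oldest first)."""

    def __init__(self) -> None:
        self.latest_emitted = -1  # nothing emitted yet

    def new_files(self, listing: Iterable[HdfsEntry]) -> List[HdfsEntry]:
        fresh = sorted(
            (e for e in listing if e.modification_time > self.latest_emitted),
            key=lambda e: e.modification_time,
        )
        if fresh:
            # In NiFi this value would live in processor state;
            # here it is just an in-memory attribute.
            self.latest_emitted = fresh[-1].modification_time
        return fresh


if __name__ == "__main__":
    lister = IncrementalLister()
    run1 = [HdfsEntry("/data/incoming/a.csv", 100), HdfsEntry("/data/incoming/b.csv", 200)]
    print([e.path for e in lister.new_files(run1)])  # both files
    run2 = run1 + [HdfsEntry("/data/incoming/c.csv", 300)]
    print([e.path for e in lister.new_files(run2)])  # only c.csv
```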
Created 02-26-2018 11:19 AM
Hi Abdelkrim,
Can you explain how EvaluateXPath will pass the directory attribute to ListHDFS? What is the best practice when the directory for ListHDFS has to be provided dynamically?
Thanks,
Algis