Created 10-04-2017 04:34 PM
My workflow is as follows:
ListenHTTP(i get a directory name here) --> SplitText --> ExtractText(directory name added as attribute)
After this, I need to use the directoryname attribute to pick up all the files in that local directory and put them into HDFS. I understand GetFile/ListFile could do this, but how do I provide a dynamic directory name to those processors?
Created on 10-04-2017 05:21 PM - edited 08-17-2019 09:36 PM
Using the ListFile processor as an example, this is how you would use the directoryname attribute to specify the directory to list files from.
Then you would pass the list of files to a FetchFile processor.
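As a rough sketch of the configuration being described (property names are from the stock NiFi ListFile and FetchFile processors; note one caveat: since ListFile takes no incoming connection, Expression Language in its Input Directory resolves against the variable registry rather than a flowfile attribute, which is what the follow-up question below runs into):

```
# ListFile (configuration sketch)
Input Directory : ${directoryname}

# FetchFile (downstream of ListFile)
File to Fetch   : ${absolute.path}/${filename}   # default value; these
                                                 # attributes are written by
                                                 # ListFile on each listed file
```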
Created 10-04-2017 05:29 PM
But how do I provide the directoryname parameter? As far as I understand, GetFile doesn't accept an incoming connection.
Created on 10-04-2017 05:36 PM - edited 08-17-2019 09:36 PM
You won't be using GetFile; FetchFile replaces it. Use ListFile to create the list of files, then pass that list to a FetchFile processor, which will pick up the files. If you look at the FetchFile processor, it uses the attributes from the listed files to pull each file.
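The list-then-fetch pattern described here can be sketched outside NiFi as plain Python (a hedged illustration, not NiFi code): one function emits a per-file attribute map the way ListFile emits one flowfile per file, and another reads a file from those attributes the way FetchFile's default `File to Fetch` value `${absolute.path}/${filename}` does.

```python
import os

def list_files(directory):
    """Role of ListFile: emit one attribute dict per regular file."""
    return [
        {"absolute.path": directory, "filename": name}
        for name in sorted(os.listdir(directory))
        if os.path.isfile(os.path.join(directory, name))
    ]

def fetch_file(attrs):
    """Role of FetchFile: read the content named by the attributes."""
    path = os.path.join(attrs["absolute.path"], attrs["filename"])
    with open(path, "rb") as f:
        return f.read()
```

The key design point is the same as in NiFi: the lister only produces metadata (cheap, can run on one node), and the fetcher does the actual I/O driven entirely by attributes, so the directory can vary per flowfile.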
If you are running a cluster, make sure to configure the ListFile processor to run on the primary node only.
Created 10-05-2017 04:37 PM
Did this resolve your issue?
Created 10-06-2017 02:02 AM
I did. I used an ExecuteScript processor to which I pass the directory name (obtained from ListenHTTP) as a dynamic parameter; after the ExecuteScript I used FetchFile to read a single file, then PutHDFS to load it into HDFS.
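For readers landing here later, that ExecuteScript step might look something like the following Jython sketch (it runs only inside NiFi, which injects `session` and `REL_SUCCESS`; the attribute names other than `absolute.path`/`filename`, which are FetchFile's defaults, are assumptions):

```python
# ExecuteScript body (Script Engine: python/Jython) - sketch only.
flowFile = session.get()
if flowFile is not None:
    # 'directoryname' is the attribute extracted earlier from the HTTP request.
    directory = flowFile.getAttribute('directoryname')
    if directory:
        # Point FetchFile's default File to Fetch
        # (${absolute.path}/${filename}) at the dynamic directory.
        flowFile = session.putAttribute(flowFile, 'absolute.path', directory)
    session.transfer(flowFile, REL_SUCCESS)
```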