Created 05-15-2023 12:49 AM
I am trying to use NiFi to automatically create external Impala tables whenever new subfolders appear under a certain directory. For this I have the directory structure
data/<timestamp>/<tables>/*.parquet
I am therefore currently trying to first get a list of folders under data. Then I want to query all subfolders in these folders again, so that I can use them as table names in a SQL query that I send to Impala.
However, the ListFiles processor only returns files. I am not interested in the file names; I only need the folders.
Is there a way to do this with NiFi or is my plan complete nonsense?
Created 05-15-2023 05:33 AM
@DrManu,
I do not think that you will find a processor in NiFi which will extract only the folder name out of your location 😞 You will either have to write your own processor or use a combination of several others, already part of NiFi.
Here is how I would approach this scenario:
- An ExecuteStreamCommand processor running a custom script that reads your folder structure and generates a JSON file in which each entry is the complete path to a specific folder.
- Afterwards, you could use a SplitJson processor to generate a single FlowFile for each folder and send it down your flow for further processing.
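A minimal sketch of such a script, assuming Python is available on the NiFi host and the `data/<timestamp>/<tables>` layout from the question. The function name `list_table_dirs` and the JSON keys `timestamp`, `table`, and `path` are my own choices for illustration, not anything NiFi requires. It prints a JSON array to stdout, which ExecuteStreamCommand would pick up as the FlowFile content:

```python
import json
import sys
from pathlib import Path


def list_table_dirs(root):
    """Return one entry per <timestamp>/<table> folder below root.

    Only directories are considered; plain files at either level are skipped.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing to list (assumption: treat a missing root as "no folders").
        return []
    entries = []
    for ts_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for table_dir in sorted(p for p in ts_dir.iterdir() if p.is_dir()):
            entries.append({
                "timestamp": ts_dir.name,
                "table": table_dir.name,
                "path": str(table_dir),
            })
    return entries


if __name__ == "__main__":
    # The root folder is taken from the first argument, defaulting to "data".
    root = sys.argv[1] if len(sys.argv) > 1 else "data"
    json.dump(list_table_dirs(root), sys.stdout)
```

Because the output is a single top-level array, SplitJson should be able to split it into one FlowFile per folder with a JsonPath expression along the lines of `$.*`. The `table` value could then be pulled into an attribute and used as the table name in the SQL sent to Impala.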
Created 05-15-2023 10:22 PM
Thank you very much!
This gives me confidence that my attempts are not going in the wrong direction. I already use ExecuteStreamCommand processors extensively, and it is a pleasure to use them here as well.