Created on 10-10-2017 10:19 AM - edited 08-17-2019 09:10 PM
I have a directory of zip archives on the local filesystem of the NiFi server, and I would like to create a flow that unzips these archives with a bash script and then puts them in HDFS.
The problem I have is that I cannot direct the output of the bash script correctly to the PutHDFS processor so that it picks up the unzipped files.
1) With the ExecuteStreamCommand processor I have two options for the outgoing flow: the original relationship, which contains the initial zipped archive, and the output stream relationship, which should be what I am looking for, but it transfers only an empty file with the same name as the original. How should this processor be configured when it runs a bash script/command so that the outgoing flowfiles correctly contain the files produced by that script/command?
2) With the ExecuteProcess processor there is only a success/failure relationship, so this also does not help to pass the outgoing flow as input to the PutHDFS processor in order to move the unzipped files to HDFS.
Any help would be greatly appreciated!
Created 10-11-2017 06:20 AM
Why not use UnpackContent or CompressContent instead of ExecuteProcess?
Does UnpackContent suit your needs?
Created 10-12-2017 09:05 AM
CompressContent works fine for gzip archives. Thanks a lot, I'm still exploring the possibilities of NiFi processors.
Created 10-11-2017 06:22 PM
@Foivos A The output.stream relationship from ExecuteStreamCommand contains the stdout of the command executed.
Unless you do cat <unzipped_file> at the end of your script you won't see anything on that relationship, and that would only work if you have exactly one unzipped file, of course.
The way I did this was to have the script echo the names of the local files at the end, one per line. This output goes to the output.stream relationship, and from there you can use SplitText to split the output by line, followed by FetchFile -> PutHDFS.
If you're still interested, I can share my flow and the scripts, but as Abdelkrim mentioned, UnpackContent should do the job, even for very large files: UnpackContent followed by PutHDFS is streamed, so it will not affect the NiFi heap.
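A minimal sketch of the kind of script described above (the directory names and the use of `unzip` are assumptions for illustration, not the poster's actual script): it extracts every archive into a target directory and then prints one extracted file path per line to stdout, which ExecuteStreamCommand captures as the flowfile content on the output.stream relationship.

```shell
#!/usr/bin/env bash
# Hypothetical unzip helper for ExecuteStreamCommand.
# SRC_DIR and DEST_DIR defaults are assumptions; pass real paths as arguments.
set -euo pipefail

SRC_DIR="${1:-./zips}"        # where the zip archives live
DEST_DIR="${2:-./unzipped}"   # where files get extracted
mkdir -p "$DEST_DIR"

for archive in "$SRC_DIR"/*.zip; do
  [ -e "$archive" ] || continue   # glob matched nothing; skip the loop
  unzip -oq "$archive" -d "$DEST_DIR"
done

# One extracted file path per line; this stdout becomes the flowfile
# content on output.stream, ready for SplitText -> FetchFile -> PutHDFS.
find "$DEST_DIR" -type f
```

The key design point is that ExecuteStreamCommand only sees stdout, so the script deliberately prints file paths rather than file contents; FetchFile later reads each file itself.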
Created 10-12-2017 02:25 PM
Hi @Alexandru Anghel, I've posted a new question with my whole use case and logic here.
Any help is really appreciated!