NiFi bash script execution and output

Expert Contributor

I have a directory of zip archives on the local filesystem of the NiFi server, and I would like to create a flow that unzips these archives with a bash script and then puts the extracted files in HDFS.

The problem is that I cannot route the output of the bash script to the PutHDFS processor in a way that lets it pick up the unzipped files.

1) With the ExecuteStreamCommand processor I have two options for the outgoing flow: the original relationship, which contains the initial zipped archive, and the output stream relationship, which should be what I am looking for, but it transfers only an empty file with the same name as the original. How should this processor be configured when it runs a bash script or command so that its output correctly contains the files produced by that script or command?

2) With the ExecuteProcess processor there is only a success/failure relationship, and this does not help either in passing the outgoing flow as input to the PutHDFS processor to move the unzipped files to HDFS.

Any help would be greatly appreciated!

39733-nifiq.png


Expert Contributor

CompressContent works fine for gzip archives. Thanks a lot, I am still exploring what the NiFi processors can do.

Super Collaborator

@Foivos A The output stream relationship of ExecuteStreamCommand contains the stdout of the executed command.

Unless you do cat <unzipped_file> at the end of your script, you won't see anything on that relationship. And of course this only works if there is exactly one unzipped file.
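As a rough illustration of the single-file case, the sketch below extracts an archive and then writes the one extracted file to stdout, which is what ends up as the flowfile content. The function name emit_single_file and the two script arguments (archive path and an empty work directory) are assumptions made for this example and are not taken from the original flow. It also assumes the unzip command is installed on the NiFi host.

```shell
#!/usr/bin/env bash
# Sketch: write the single extracted file to stdout so that it becomes the
# content of the flowfile produced from the command's output.

# Stream the only regular file under the given directory to stdout.
# Fails if the directory holds zero files or more than one.
emit_single_file() {
    local dir="$1" count
    count="$(find "$dir" -type f | wc -l | tr -d ' ')"
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one file in $dir, found $count" >&2
        return 1
    fi
    find "$dir" -type f -exec cat {} +
}

# Run only when an archive path and a work directory are supplied
# (both are hypothetical arguments configured in the processor).
if [ "$#" -eq 2 ]; then
    # unzip's own messages go to stderr so stdout carries only file content.
    unzip -q -o "$1" -d "$2" 1>&2 && emit_single_file "$2"
fi
```

The work directory needs to be empty before each run, otherwise files left over from an earlier archive would trip the single-file check.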

The way I did this was to have the script echo the names of the local files at the end, one per line. This output goes to the output stream relationship, and from there you can use SplitText to split the output by line, followed by FetchFile -> PutHDFS.
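A minimal sketch of such a script might look like the following. It is not the script from my flow: the function names, the staging directory argument, and the use of mktemp for a per-archive directory are assumptions made for illustration, and it assumes unzip is available on the NiFi host.

```shell
#!/usr/bin/env bash
# Sketch of a script for ExecuteStreamCommand: extract one archive and print
# the extracted file paths on stdout, one per line.
set -eu

# Print every regular file under the given directory, one absolute path per line.
list_extracted_files() {
    local dir
    dir="$(cd "$1" && pwd)"
    find "$dir" -type f | sort
}

# Extract the archive into a fresh directory under staging_root, then list the result.
unzip_and_list() {
    local archive="$1" staging_root="$2" dest
    mkdir -p "$staging_root"
    dest="$(mktemp -d "$staging_root/unzipped.XXXXXX")"
    # Send unzip's own messages to stderr so stdout carries only file paths.
    unzip -q -o "$archive" -d "$dest" 1>&2
    list_extracted_files "$dest"
}

# Run only when both arguments are supplied (hypothetical paths passed in by NiFi).
if [ "$#" -eq 2 ]; then
    unzip_and_list "$1" "$2"
fi
```

Keeping stdout free of anything but paths matters here, because every line the script prints becomes one flowfile after SplitText. File names that contain newlines would break the one-path-per-line assumption.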

If you're still interested, I can share my flow and the scripts, but as Abdelkrim mentioned, UnpackContent should do the job, even for very large files: UnpackContent followed by PutHDFS streams the content, so it will not affect the NiFi heap.


Expert Contributor

Hi @Alexandru Anghel, I've uploaded a new question with my whole use case and logic here.

Any help is really appreciated!