Created 11-07-2017 10:11 PM
Hi all,
thanks in advance!
My issue is regarding Apache NiFi:
What's the best way to decompress/extract different types of incoming files?
In my use case I receive a lot of files that are compressed in different ways (e.g. .tar.gz, .zip, .rar, .tar, or uncompressed .txt/.json), but I need all of them decompressed.
What I tried is to send every file through every possible CompressContent/UnpackContent processor, but it does not actually work and is probably not the best approach performance-wise:
GetFile -> (...) -> CompressContent (uncompressing gzip) -> UnpackContent (extracting .tar) -> UnpackContent (extracting .zip) -> (...) -> PutFile
For example: a "*.json" file should pass through those processors unchanged, while a ".tar.gz" file should be decompressed (its name changes to ".tar") and then extracted by an UnpackContent processor, so that I end up with uncompressed files.
I hope there is a good solution, thanks once again.
best regards
Created 11-08-2017 08:15 AM
You can set the "Compression Format" to "use mime.type attribute". This way, the processor looks for an attribute called mime.type and infers the format, and therefore the decompression algorithm, from it.
For this to work, you need to use an UpdateAttribute processor to add a mime.type attribute and set its value according to your logic. Keep in mind that UpdateAttribute has rules logic in its advanced configuration that can be useful for your use case: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-update-attribute-nar/1.4.0/or...
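As a rough illustration of the kind of rule logic meant here, the following Python sketch maps a file name to a mime.type value. It is not NiFi configuration: the table `MIME_BY_SUFFIX`, the function `mime_type_for`, and the specific MIME strings are assumptions for illustration and should be checked against the processor documentation.

```python
from pathlib import PurePath

# Hypothetical extension-to-MIME table. The MIME strings are assumptions,
# not values confirmed by this thread.
MIME_BY_SUFFIX = {
    ".gz": "application/gzip",
    ".tar": "application/x-tar",
    ".zip": "application/zip",
}


def mime_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Return the mime.type value a rule set might assign to a file name.

    Only the last suffix is considered, so "test.tar.gz" is treated as gzip
    first; the resulting "test.tar" would be classified again afterwards.
    """
    suffix = PurePath(filename).suffix.lower()
    return MIME_BY_SUFFIX.get(suffix, default)


if __name__ == "__main__":
    for name in ("test.tar.gz", "test.tar", "archive.zip", "data.json"):
        print(name, "->", mime_type_for(name))
```

Files with an unknown suffix fall through to the default value, which corresponds to the "nothing should happen" case for plain .txt/.json files.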
Created 11-08-2017 01:54 PM
@Salda Murrah I forgot to mention IdentifyMimeType, which can be used to automatically identify the type of your file: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache...
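To show what content-based identification means in principle, here is a minimal Python sketch that looks at well-known magic bytes instead of the file name. It is not how IdentifyMimeType is implemented; the function name `sniff_format` is hypothetical and only three formats are covered.

```python
def sniff_format(data: bytes) -> str:
    """Guess a container/compression format from leading magic bytes."""
    if data[:2] == b"\x1f\x8b":        # gzip member header
        return "gzip"
    if data[:4] == b"PK\x03\x04":      # zip local file header
        return "zip"
    if data[257:262] == b"ustar":      # tar header magic at offset 257
        return "tar"
    return "unknown"


if __name__ == "__main__":
    import gzip

    print(sniff_format(gzip.compress(b"hello")))
    print(sniff_format(b'{"plain": "json"}'))
```

Because the decision depends on the content, a renamed or extension-less file is still classified, which is the main advantage over a rule based on the file name.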
Created on 11-09-2017 08:32 AM - edited 08-18-2019 12:24 AM
I managed to solve it by using IdentifyMimeType processors.
But there is still a setting that causes my hard drive to fill up completely.
I guess UnpackContent keeps receiving new files to extract, but I need to keep the source files when the GetFile processor picks them up. Do you have any idea what I need to change?
Created on 11-08-2017 01:00 PM - edited 08-18-2019 12:24 AM
Thanks so far.
But do I still need both processors (UnpackContent and CompressContent) for my use case?
I am not sure how this is supposed to work: if I add a mime.type attribute, will the UnpackContent processor do what I want?
I tried setting the mime.type attribute to application/${filename:substringAfterLast('.')} and it extracted .zip and .tar files successfully, but I still ended up with compressed .gz files.
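The following Python sketch mimics that expression to show which mime.type values it produces for typical file names. The helper `substring_after_last` is a hypothetical stand-in for the Expression Language function, not NiFi code.

```python
def substring_after_last(value: str, sep: str) -> str:
    """Return the text after the last occurrence of sep, or value if sep is absent."""
    _head, found, tail = value.rpartition(sep)
    return tail if found else value


def mime_from_filename(filename: str) -> str:
    """Build the attribute value the same way as the expression in the post."""
    return "application/" + substring_after_last(filename, ".")


if __name__ == "__main__":
    for name in ("a.zip", "a.tar", "a.tar.gz", "a.json"):
        print(name, "->", mime_from_filename(name))
```

For "a.tar.gz" the result is "application/gz", whereas the registered MIME type for gzip is "application/gzip". That mismatch is a plausible reason the .gz files were not decompressed, although the thread does not confirm it.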
It looks like this:
Created 11-08-2017 01:51 PM
I am not sure I understand your use case. Why do you use UnpackContent after CompressContent? CompressContent can decompress your .gz file with the decompress option.
Created 11-09-2017 08:34 AM
But I didn't manage to extract .tar or .zip files with CompressContent.
I have many different files (.tar.gz, .tar, .zip ...) which should all be decompressed/extracted in the end.
My idea was to first identify all .gz files and decompress them (see the first two processors in my screenshot), and then to extract all other files (.tar and .zip), which happens in the following two processors.
For example: receiving a 'test.tar.gz', decompressing it to 'test.tar', and extracting it to 'test' afterwards.
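The two stages in this example can be sketched outside NiFi with the Python standard library: one step removes the gzip layer, a second step extracts the tar archive. The helper names and the sample member "test/data.json" are made up for illustration.

```python
import gzip
import io
import tarfile
from typing import Dict


def build_tar_gz(member_name: str, payload: bytes) -> bytes:
    """Create a small in-memory .tar.gz so the example needs no files on disk."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return gzip.compress(tar_buf.getvalue())


def gunzip(data: bytes) -> bytes:
    """Stage 1: remove the gzip layer (test.tar.gz -> test.tar)."""
    return gzip.decompress(data)


def untar(data: bytes) -> Dict[str, bytes]:
    """Stage 2: extract regular files from an uncompressed tar archive."""
    extracted = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                extracted[member.name] = tar.extractfile(member).read()
    return extracted


if __name__ == "__main__":
    archive = build_tar_gz("test/data.json", b'{"a": 1}')
    tar_bytes = gunzip(archive)   # still a tar archive at this point
    print(untar(tar_bytes))
```

The point of the sketch is that decompression and extraction are separate operations, which matches using one processor for the gzip layer and another for the tar or zip layer.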