Decompress .json.gz inside folder in NIFI

New Contributor

InvokeHTTP returns a response structured as folder->another_folder->file.json.gz. I want to process the JSON file with the SplitJson processor, so I first need to decompress file.json.gz.

I tried using UpdateAttribute to rename the parent folder to folder.zip and then UnpackContent, but UnpackContent does not support the .gz format.

Next, instead of UnpackContent, I tried ExecuteStreamCommand with the unzip command, planning to use PutFile and then ListFile to pick up file.json.gz and pass it to SplitJson. However, the ExecuteStreamCommand step failed with this error:

Failed to write flow file to stdin due to Broken pipe: java.io.IOException: Broken pipe

What is the solution for this? I set the Connection Timeout property to 120s but still get the same error. What is the best way to get file.json.gz, decompress it, and pass it to the SplitJson processor?

5 REPLIES

Super Guru

@nada,

Could you share a sample of the InvokeHTTP response and also the flow that you currently have?

Cheers,

André

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

New Contributor

The InvokeHTTP response, as shown in the queue list:

Screenshot from 2022-06-09 12-15-25.png

After I download it:

Screenshot from 2022-06-09 12-17-13.png

I now have a flowfile in the "nonzero status" queue from ExecuteStreamCommand, as follows:

Screenshot from 2022-06-09 12-20-51.png

The whole flow:

Screenshot from 2022-06-09 12-22-55.png

Master Mentor

@nada 

You can use the CompressContent processor to decompress gzip files.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.16.2/org.apach...

Set "Mode" to "decompress", "Compression Format" to "gzip", and "Update Filename" to "true".
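
For readers who want to see what those three settings amount to, here is a minimal Python sketch of the same idea: gunzip the content and drop the .gz suffix from the filename. It is only an illustration, not NiFi code; the function name gunzip_flowfile and the sample payload are hypothetical.

```python
import gzip
import json
from typing import Tuple


def gunzip_flowfile(content: bytes, filename: str) -> Tuple[bytes, str]:
    """Decompress gzip content and strip a trailing .gz from the filename.

    Hypothetical helper that mimics "decompress" + "gzip" + "Update Filename".
    """
    data = gzip.decompress(content)
    new_name = filename[:-3] if filename.endswith(".gz") else filename
    return data, new_name


if __name__ == "__main__":
    # Made-up payload standing in for the real file.json.gz content.
    payload = gzip.compress(json.dumps({"items": [1, 2, 3]}).encode("utf-8"))
    data, name = gunzip_flowfile(payload, "file.json.gz")
    print(name, json.loads(data))
```

After this step the content is plain JSON, which is the form SplitJson expects.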

If you found that this response helped with your query, please take a moment to log in and click on "Accept as Solution" below this post.

Thank you,

Matt

New Contributor

@MattWho 

Hi Matt,

My incoming flowfile is essentially a folder that contains another folder, which in turn contains a .json.gz file (folder->another_folder->file.json.gz). The CompressContent processor is meant for an incoming .gz file.

How can I decompress recursively so that the .json.gz inside the subfolder is decompressed? Or how can I unpack the first folder without touching its subfolders and files?

Super Guru

@nada,

Please check this solution: https://community.cloudera.com/t5/Community-Articles/Decompressing-nested-ZIP-files-in-NiFi/ta-p/346...
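
To make the nesting concrete, here is a rough Python sketch of the two-stage idea: open the outer archive, walk its entries, and gunzip only the .gz members. It assumes the outer "folder" is actually a ZIP archive, which this thread does not confirm, and the function name extract_gz_members is hypothetical. It is not taken from the linked article.

```python
import gzip
import io
import zipfile
from typing import Dict


def extract_gz_members(archive_bytes: bytes) -> Dict[str, bytes]:
    """Map each .gz entry in a ZIP archive to its decompressed bytes.

    Assumption: the outer container is a ZIP archive. Keys are the entry
    paths with the trailing .gz removed.
    """
    results: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not gzip-compressed.
            if info.is_dir() or not info.filename.endswith(".gz"):
                continue
            results[info.filename[:-3]] = gzip.decompress(archive.read(info))
    return results


if __name__ == "__main__":
    # Build a small in-memory archive that mirrors the layout in this thread.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("folder/another_folder/file.json.gz",
                    gzip.compress(b'{"a": 1}'))
    for path, data in extract_gz_members(buf.getvalue()).items():
        print(path, data)
```

In NiFi terms, the likely equivalent is UnpackContent for the outer archive followed by CompressContent in decompress mode for each .gz entry, but that again depends on what the outer container really is.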

Cheers,

André