Created 05-19-2025 04:48 AM
Hello,
I am attempting to unzip a series of simple .zip files in my NiFi flow. The whole flow executes with no issues except the unpack step. There is nothing special about these zip files (no passwords, for example). But no matter how I try to unpack them, I get an error saying the zip file does not contain any entries. I have verified that the zip files do contain entries: I can easily unzip them with WinZip, and also with our old custom C# application that we are replacing with NiFi.
I have tried adjusting the UnpackContent settings and have even added an UpdateAttribute processor to set the application/zip MIME type. I still keep getting the failure. I have also tried the CompressContent processor and still get these errors on simple WinZip files.
I am attaching my flow, my processor settings and error message.
Created 05-20-2025 10:16 AM
@BobKing
ListFile only creates 0-byte (no content) FlowFiles, which must be sent to a FetchFile processor to retrieve the content and add it to the FlowFile. The List<abc> and Fetch<abc> type processors should be used instead of Get<abc> type processors when working in a multi-node NiFi cluster setup. These processors allow you to run List<abc> on the primary node only, load-balance the listed FlowFiles across all nodes in the NiFi cluster, and then Fetch<abc> the content. This spreads the workload across the cluster for ingest types that don't support cluster setups (ListFile, ListSFTP, ListFTP, etc.).
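To illustrate why the unpack step reported no entries, here is a minimal Python sketch of the idea. It is not NiFi code: `list_files`, `fetch_content`, and `unpack` are hypothetical stand-ins for ListFile, FetchFile, and UnpackContent, and the "FlowFile" is just a dictionary. The point is only that a listing carries metadata, so anything downstream sees zero bytes until a fetch step reads the file.

```python
import io
import os
import tempfile
import zipfile


def list_files(directory):
    """Stand-in for ListFile: emit metadata only, with empty content."""
    return [
        {"path": os.path.join(directory, name), "content": b""}
        for name in sorted(os.listdir(directory))
    ]


def fetch_content(flowfile):
    """Stand-in for FetchFile: read the bytes the listing pointed at."""
    with open(flowfile["path"], "rb") as handle:
        return {**flowfile, "content": handle.read()}


def unpack(flowfile):
    """Stand-in for UnpackContent: return entry names, or [] if unreadable."""
    try:
        with zipfile.ZipFile(io.BytesIO(flowfile["content"])) as archive:
            return archive.namelist()
    except zipfile.BadZipFile:
        return []


def demo():
    with tempfile.TemporaryDirectory() as directory:
        # Build a small zip on disk to play the role of the incoming file.
        with zipfile.ZipFile(os.path.join(directory, "sample.zip"), "w") as archive:
            archive.writestr("hello.txt", "hello")

        listed = list_files(directory)[0]
        without_fetch = unpack(listed)              # listing only: no bytes to unpack
        with_fetch = unpack(fetch_content(listed))  # listing + fetch: real content
        return without_fetch, with_fetch
```

Calling `demo()` returns an empty entry list for the listing-only path and the real entry names once the content has been fetched, which mirrors the List<abc> then Fetch<abc> then unpack ordering described above.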
TIP:
Your attached image was so small I could not read the processor types. When you add an image to a new post, you can click on it and expand its size by dragging from the corner of the image before hitting "Reply".
Thanks,
Matt
Created 05-19-2025 05:22 AM
@BobKing
Welcome to the Cloudera Community.
It is going to be difficult to determine what is going on here without a sample failing zip file to reproduce with.
What can you tell me about these WinZip files?
Thank you,
Matt
Created 05-19-2025 06:23 AM
These are WinZip files we get from a government website that are set up for public use and reference. Some may contain a directory structure and some may not. I have verified that files do exist in the zip files.
I am unable to attach a zip file because the .zip extension is not allowed or supported on the Cloudera community site. So I am unable to upload a zip file.
"The file type (.zip) is not supported. Valid file types are: .docx, .xlsx, .pptx, .pdf, .txt, .csv, .png, jpg, .jpeg, .gif, docx, xlsx, pptx, pdf, txt, csv, png, jpeg, gif."
Accessing the link on the government site will require credentials so posting the link will not be helpful either.
Even if I create a WinZip file with a single file in it, I still get the same error.
I also tried CompressContent with the same MIME type I set in UpdateAttribute. While I do not get an error, the FlowFile output is a list of files with the same name as the zip file. It does not matter whether I set the Update Filename property to True or False.
Created 05-19-2025 10:22 AM
@BobKing
I have not been able to reproduce. I downloaded WinZip, created a simple text file and then zipped it. I then consumed that .zip file using CFM 2.1.7.1001 and was able to successfully use UnpackContent to unpack the text file.
I recommend, if you have a support contract with Cloudera, that you create a support case where you can share more detail about the problematic zip files and an example file if possible. I suspect the issue is specific to the zip files you are working with.
UnpackContent does not support multi-part zip files (NIFI-10654).
Thank you,
Matt
Created 05-20-2025 06:52 AM
Matt,
I figured it out. It was not so much the UnpackContent processor, but how I was bringing in the FlowFile itself. I was using ListFile and passing its output straight to the UnpackContent processor. I switched to a GetFile processor with a File Filter of .*\.zip and the unzip worked perfectly. So it looks like I will need to get the file with GetFile, unzip it, and archive the file with PutFile.
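For anyone checking what a File Filter of .*\.zip would pick up, here is a small Python sketch. NiFi evaluates the filter as a Java regular expression, so this is only an approximation; it assumes the pattern must match the whole filename and is case-sensitive. `matches_filter` and the sample filenames are made up for illustration.

```python
import re

# The pattern is the File Filter value from the post; everything else is illustrative.
FILE_FILTER = re.compile(r".*\.zip")


def matches_filter(filename):
    """Return True when the whole filename matches the filter."""
    return FILE_FILTER.fullmatch(filename) is not None


names = ["data.zip", "data.zip.tmp", "datazip", "DATA.ZIP", "archive.v2.zip"]
picked = [name for name in names if matches_filter(name)]
```

Under those assumptions, only names ending in a literal lowercase ".zip" are kept, so a partially written "data.zip.tmp" or an upper-case "DATA.ZIP" would be skipped.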
Created 05-26-2025 07:28 AM