Created on 04-09-2021 09:09 AM - edited 04-09-2021 09:15 AM
Hello,
I am trying out Apache NiFi for testing purposes. I created a flow that needs to unzip a file containing an .xml file. I chained these processors: ListFile => IdentifyMimeType => UnpackContent.
All the files I have tested fail, and I get a Java error in my logs:
2021-04-09 17:56:40,839 ERROR [Timer-Driven Process Thread-7] o.a.n.processors.standard.UnpackContent UnpackContent[id=b746ede2-0178-1000-dd20-22c2c827cbb7] Unable to unpack StandardFlowFileRecord[uuid=cfe7807c-d6ad-4127-b779-75b2f57c0ba6,claim=,offset=0,name=data.zip,size=0] due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b746ede2-0178-1000-dd20-22c2c827cbb7]: net.lingala.zip4j.exception.ZipException: Could not fill buffer; routing to failure: org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b746ede2-0178-1000-dd20-22c2c827cbb7]: net.lingala.zip4j.exception.ZipException: Could not fill buffer
org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b746ede2-0178-1000-dd20-22c2c827cbb7]: net.lingala.zip4j.exception.ZipException: Could not fill buffer
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2388)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2333)
at org.apache.nifi.processors.standard.UnpackContent$ZipUnpacker.unpack(UnpackContent.java:409)
at org.apache.nifi.processors.standard.UnpackContent.onTrigger(UnpackContent.java:292)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: net.lingala.zip4j.exception.ZipException: Could not fill buffer
at net.lingala.zip4j.util.RawIO.readFully(RawIO.java:155)
at net.lingala.zip4j.util.RawIO.readIntLittleEndian(RawIO.java:85)
at net.lingala.zip4j.headers.HeaderReader.readLocalFileHeader(HeaderReader.java:529)
at net.lingala.zip4j.io.inputstream.ZipInputStream.getNextEntry(ZipInputStream.java:85)
at net.lingala.zip4j.io.inputstream.ZipInputStream.getNextEntry(ZipInputStream.java:77)
at org.apache.nifi.processors.standard.UnpackContent$ZipUnpacker$1.process(UnpackContent.java:415)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2365)
... 14 common frames omitted
Maybe I am doing something wrong... Can you help me?
Thanks,
Kind regards,
Created 06-08-2021 07:27 AM
@Leopol
Welcome to NiFi!
The ListFile [1] processor is only designed to create a 0-byte NiFi FlowFile (no content is fetched). This FlowFile simply has a set of attributes written to it that can be used later to actually retrieve the content via the FetchFile [2] processor.
The combination of these two processors allows NiFi to spread the heavy work across multiple nodes in a cluster when the source of the data may not be cluster friendly (for example, a remote disk mounted to all nodes in a NiFi cluster). The ListFile processor would be configured to execute on the primary node only, and its success relationship would be routed via a connection to FetchFile. That connection would be configured to load balance the 0-byte FlowFiles produced by ListFile. The FetchFile processors executing on all nodes would then receive the now distributed FlowFiles and fetch the content. There are other similar list/fetch combinations; a sketch of the wiring follows below.
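For reference, here is a minimal sketch of how the corrected flow might be wired. The directory path and scheduling values are illustrative assumptions, not taken from your flow, so adapt them to your environment:

    ListFile
        Scheduling > Execution: Primary node only
        Input Directory: /data/incoming            (assumed path)
      success -> connection with Load Balance Strategy: Round robin
    FetchFile
        File to Fetch: ${absolute.path}/${filename}   (the default)
      success -> IdentifyMimeType
      success -> UnpackContent
        Packaging Format: zip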
Since you have left the FetchFile processor out of your dataflow, you are not passing any content to the UnpackContent processor, which results in the exception you are seeing. In that exception you will see details on the FlowFile being unpacked:
StandardFlowFileRecord[uuid=cfe7807c-d6ad-4127-b779-75b2f57c0ba6,claim=,offset=0,name=data.zip,size=0]
You'll notice the "size=0", which means the FlowFile is 0 bytes. That is expected, since you have not fetched the content for this file yet.
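If you want to see why the unpack fails on an empty FlowFile, here is a minimal, hypothetical Java sketch against the zip4j library (the same library that appears in your stack trace). It only illustrates the failure mode; the exact behavior can vary with the zip4j version bundled in your NiFi release:

    import java.io.ByteArrayInputStream;
    import net.lingala.zip4j.io.inputstream.ZipInputStream;

    public class EmptyZipDemo {
        public static void main(String[] args) throws Exception {
            // A 0-byte stream stands in for the 0-byte FlowFile produced by ListFile.
            try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(new byte[0]))) {
                // zip4j tries to read the 4-byte local file header signature here;
                // with no bytes available it throws ZipException: "Could not fill buffer",
                // matching the stack trace above.
                zipIn.getNextEntry();
            }
        }
    }

Once FetchFile has loaded the actual zip content onto the FlowFile, UnpackContent has real bytes to read and this error goes away.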
If you found this helped with your query, please take a moment to login and click "Accept" on this solution.
Thank you,
Matt
Created 06-13-2021 11:00 PM
@Leopol, did @MattWho's response resolve your issue? If so, can you please mark it as a solution? It will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur
Created 06-14-2021 11:52 PM
Hi Leopol,
I hope I understand your needs correctly.
You may try using the CompressContent processor instead of UnpackContent: set the mode to decompress and choose the appropriate compression format.