[Apache NiFi] Module UnpackContent

New Contributor

Hello,

 

I am trying out Apache NiFi for testing purposes. I created a flow that needs to unzip a file containing an .xml file. I chained these processors: ListFile => IdentifyMimeType => UnpackContent.

 

Every file I have tested fails, and I see a Java error in my logs:

 

2021-04-09 17:56:40,839 ERROR [Timer-Driven Process Thread-7] o.a.n.processors.standard.UnpackContent UnpackContent[id=b746ede2-0178-1000-dd20-22c2c827cbb7] Unable to unpack StandardFlowFileRecord[uuid=cfe7807c-d6ad-4127-b779-75b2f57c0ba6,claim=,offset=0,name=data.zip,size=0] due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b746ede2-0178-1000-dd20-22c2c827cbb7]: net.lingala.zip4j.exception.ZipException: Could not fill buffer; routing to failure: org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b746ede2-0178-1000-dd20-22c2c827cbb7]: net.lingala.zip4j.exception.ZipException: Could not fill buffer
org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b746ede2-0178-1000-dd20-22c2c827cbb7]: net.lingala.zip4j.exception.ZipException: Could not fill buffer
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2388)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2333)
at org.apache.nifi.processors.standard.UnpackContent$ZipUnpacker.unpack(UnpackContent.java:409)
at org.apache.nifi.processors.standard.UnpackContent.onTrigger(UnpackContent.java:292)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: net.lingala.zip4j.exception.ZipException: Could not fill buffer
at net.lingala.zip4j.util.RawIO.readFully(RawIO.java:155)
at net.lingala.zip4j.util.RawIO.readIntLittleEndian(RawIO.java:85)
at net.lingala.zip4j.headers.HeaderReader.readLocalFileHeader(HeaderReader.java:529)
at net.lingala.zip4j.io.inputstream.ZipInputStream.getNextEntry(ZipInputStream.java:85)
at net.lingala.zip4j.io.inputstream.ZipInputStream.getNextEntry(ZipInputStream.java:77)
at org.apache.nifi.processors.standard.UnpackContent$ZipUnpacker$1.process(UnpackContent.java:415)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2365)
... 14 common frames omitted

Maybe I am doing something wrong. Can you help me?

Thanks,

Kind regards,

 

1 ACCEPTED SOLUTION

Master Mentor

@Leopol 

Welcome to NiFi!

The ListFile [1] processor is designed only to create a 0 byte NiFi FlowFile (no content is fetched). The FlowFile it creates simply carries a set of attributes that can be used later to actually retrieve the content via the FetchFile [2] processor.
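
To make the list/fetch split concrete, here is a minimal Python sketch of the idea. It is not NiFi code: the FlowFile class and the list_files and fetch_file functions are hypothetical stand-ins, and only the attribute names (filename, absolute.path, file.size) are meant to mirror what ListFile writes.

```python
import os
from dataclasses import dataclass


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi FlowFile: attributes plus content bytes."""
    attributes: dict
    content: bytes = b""


def list_files(directory):
    """Like ListFile: emit one FlowFile per file, attributes only, no content."""
    flowfiles = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            flowfiles.append(FlowFile(attributes={
                "filename": name,
                "absolute.path": directory,
                "file.size": str(os.path.getsize(full_path)),
            }))
    return flowfiles


def fetch_file(flowfile):
    """Like FetchFile: use the listing attributes to read the real content."""
    path = os.path.join(flowfile.attributes["absolute.path"],
                        flowfile.attributes["filename"])
    with open(path, "rb") as handle:
        return FlowFile(attributes=dict(flowfile.attributes),
                        content=handle.read())
```

In this sketch, every FlowFile returned by list_files has empty content even though its file.size attribute reports the size on disk. Only the result of fetch_file holds the actual bytes.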

 

The combination of these two processors allows NiFi to spread the heavy work across multiple nodes in a cluster when the source of the data may not be cluster friendly (for example, a remote disk mounted on all nodes in a NiFi cluster). The ListFile processor would be configured to execute on "Primary Node" only, and its success relationship would be routed via a connection to FetchFile. That connection would be configured to load balance the 0 byte FlowFiles produced by ListFile. The FetchFile processors executing on all nodes would then receive the now-distributed FlowFiles and fetch the content. There are other similar list/fetch combinations (for example ListSFTP/FetchSFTP).

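Since you have left the FetchFile processor out of your dataflow, you are not passing any content to the UnpackContent processor, which results in the exception you are seeing. Adding FetchFile directly after ListFile (ListFile => FetchFile => IdentifyMimeType => UnpackContent) should give UnpackContent real content to work with. In that exception you will see details on the FlowFile it was trying to unpack: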

StandardFlowFileRecord[uuid=cfe7807c-d6ad-4127-b779-75b2f57c0ba6,claim=,offset=0,name=data.zip,size=0]

You'll notice "size=0", which means the FlowFile is 0 bytes. That is expected, since you have not yet fetched the content for this file.
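
As a rough analogy for why an empty FlowFile fails to unpack, the Python sketch below uses the standard zipfile module rather than zip4j (the library in the stack trace), so the exception type differs, but the cause is the same: there are no bytes to read a zip header from. The unpack and make_zip helpers are hypothetical names used only for this illustration.

```python
import io
import zipfile


def unpack(content: bytes) -> dict:
    """Return {entry name: entry bytes} for a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def make_zip(entries: dict) -> bytes:
    """Build a small in-memory zip archive to stand in for fetched content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def try_unpack(content: bytes) -> str:
    """Report whether the given bytes could be unpacked as a zip archive."""
    try:
        return "unpacked: " + ", ".join(sorted(unpack(content)))
    except zipfile.BadZipFile:
        return "failed: content is not a zip archive"
```

Calling try_unpack(b"") takes the failure branch, which corresponds to the size=0 FlowFile coming straight from ListFile. Calling it on the bytes returned by make_zip takes the success branch, which corresponds to a FlowFile whose content has been fetched first.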

 

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...

[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...

 

If this helped with your query, please take a moment to log in and click "Accept" on this solution.
Thank you,

Matt


3 REPLIES

Community Manager

@Leopol, did @MattWho's response resolve your issue? If so, can you please mark it as a solution? It will make it easier for others to find the answer in the future. 



Regards,

Vidya Sargur,
Community Manager



New Contributor

Hi Leopol,

I hope I understand your needs.

You could try the CompressContent processor instead of UnpackContent: set its mode to decompress and choose the compression format.