How to merge two PDF files from two FlowFiles into one PDF file in NiFi

Expert Contributor

Hi,

Could you please help me merge two PDF files, which arrive as two separate FlowFiles, into a single PDF file in NiFi?

I'm trying to merge two files (an invoice file and a QR file) from two FlowFiles using the MergeContent processor. However, the merged output shows only the first of the two input files rather than a merged document, even though its size is the combined size of both files.

 

Files flow:

PradNiFi1236_1-1671116150957.png

For the first FlowFile, an UpdateAttribute processor sets fragment.index=0. For the second FlowFile, an UpdateAttribute processor sets fragment.index=1. For the combined flow, an UpdateAttribute processor placed before the MergeContent processor sets fragment.count=2.

 

MergeContent config:

PradNiFi1236_0-1671115906902.png

 

 

Can anyone help on this?

 

Regards,

Pradeep

3 REPLIES

Expert Contributor

Hello,

 

The problem with your merge is that the "Merge Strategy" property needs to be set to "Defragment". After that it should work, provided both FlowFiles also have the same fragment.identifier set.
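
To make the role of the three fragment attributes concrete, here is a rough Java model of how a Defragment-style merge groups FlowFiles. It is only a sketch of the idea, not NiFi's actual implementation: the class name DefragmentSketch, the method completeBundles, and the use of plain attribute maps as stand-ins for FlowFiles are all assumptions made for illustration. Only the attribute names come from this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each "FlowFile" is represented only by its attribute map.
final class DefragmentSketch {

    /**
     * Groups fragments by fragment.identifier and returns only the groups that
     * contain exactly fragment.count members, each sorted by fragment.index.
     */
    static List<List<Map<String, String>>> completeBundles(List<Map<String, String>> flowFiles) {
        Map<String, List<Map<String, String>>> bins = new LinkedHashMap<>();
        for (Map<String, String> attrs : flowFiles) {
            String id = attrs.get("fragment.identifier");
            // In this model, a FlowFile missing any of the three attributes is skipped.
            if (id == null || attrs.get("fragment.index") == null || attrs.get("fragment.count") == null) {
                continue;
            }
            bins.computeIfAbsent(id, k -> new ArrayList<>()).add(attrs);
        }

        List<List<Map<String, String>>> complete = new ArrayList<>();
        for (List<Map<String, String>> bin : bins.values()) {
            int expected = Integer.parseInt(bin.get(0).get("fragment.count"));
            if (bin.size() == expected) {
                // Order the fragments so index 0 comes first in the merged output.
                bin.sort(Comparator.comparingInt(a -> Integer.parseInt(a.get("fragment.index"))));
                complete.add(bin);
            }
        }
        return complete;
    }
}
```

In this model, two FlowFiles carrying fragment.index 0 and 1 and fragment.count=2 only form a bundle when they also share the same fragment.identifier value. Without a shared identifier they never land in the same bin.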

 

As for actually merging the PDFs, I'm not 100% sure how it can be done easily, but one solution would be to also set the "Merge Format" property of your MergeContent processor to TAR or ZIP, and then later unpack the archive and merge the files with a script.

 

Greetings

 

Master Mentor

@PradNiFi1236 

NiFi is designed to be data agnostic. Content that NiFi ingests is preserved in binary format, wrapped in a NiFi FlowFile. It then becomes the responsibility of an individual processor that needs to operate against the content to understand the content type.

The MergeContent processor does not care about the content type. This processor offers numerous merge formats:
- Binary Concatenation simply writes the binary data from one FlowFile to the end of the binary data from another. There is no specific handling based on the content type. So combining two PDF files in this manner is going to leave you with unreadable data, which explains why, even with the larger content size of the merged FlowFile, you still only see the first PDF in the merged binary.
- TAR and ZIP combine multiple pieces of content into a single tar or zip file. You can then later untar or unzip this to extract the multiple separate pieces of content it contains. So this would preserve both independent PDF files.
- FlowFile Stream is unique to NiFi and merges multiple NiFi FlowFiles (a FlowFile consists of content and FlowFile metadata). This format is only used to preserve that NiFi metadata with the content for future access by another NiFi.
- Avro expects that the content being merged is already in Avro format. It will properly merge Avro data into a single new piece of Avro content.
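
The difference between binary concatenation and an archive format can be sketched outside NiFi with plain Java. This is only an illustration of the two ideas, not MergeContent's code: the class MergeFormatSketch, its method names, and the file names used as ZIP entry names are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper contrasting the two approaches on in-memory byte arrays.
final class MergeFormatSketch {

    /** Appends b to a: one blob of combined size, with no record of where a ends. */
    static byte[] concatenate(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    /** Packs each named piece of content into its own ZIP entry. */
    static byte[] zip(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue());
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Recovers every entry from the archive, keyed by entry name. */
    static Map<String, byte[]> unzip(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                files.put(entry.getName(), zis.readAllBytes()); // reads to the end of this entry
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

The concatenated array has the combined length of both inputs, which matches the file size seen in the question, but nothing in it marks it as two documents. The ZIP round trip returns both inputs byte for byte, which is why TAR or ZIP preserves the two PDFs even though it does not produce a single PDF.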

So the first question here is how you would accomplish the merging of two PDFs outside of NiFi. Then investigate how to accomplish the same within NiFi, if possible.
TAR and ZIP will work to get you one output file; however, if your use case is to produce one new PDF from two original PDFs, MergeContent is not going to do that.

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

Thank you,

Matt

Super Collaborator

Apache PDFBox should allow you to merge PDF content. Since it is a Java library, you can create a scripted processor in Groovy to merge the files for you.

https://pdfbox.apache.org

https://javadoc.io/doc/org.apache.pdfbox/pdfbox/2.0.27/index.html
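
As a starting point, a merge with PDFBox might look roughly like the sketch below. It assumes the PDFBox 2.0.x API from the Javadoc linked above (PDFMergerUtility changed in 3.x) and that the PDFBox jar is available on the classpath. The class name PdfMergeSketch and the method mergePdfs are made up for illustration, and wiring this into a scripted processor (reading both FlowFile contents and writing the result) is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

// Hypothetical helper; requires the PDFBox 2.0.x library on the classpath.
final class PdfMergeSketch {

    /** Returns a new PDF containing the pages of the first document followed by the second. */
    static byte[] mergePdfs(byte[] first, byte[] second) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();

        // Sources are appended in the order they are added.
        merger.addSource(new ByteArrayInputStream(first));
        merger.addSource(new ByteArrayInputStream(second));

        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        merger.setDestinationStream(merged);

        // Keep everything in memory; suitable for small files such as an invoice and a QR page.
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
        return merged.toByteArray();
    }
}
```

Unlike binary concatenation, this parses both documents and writes a single new PDF, so the result should open as one file containing the pages of both inputs.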