I am not clear on the benefit of merging all these XML files into a single huge XML file, which seems to be on the order of tens of GB. NiFi has a default limit of 1 GB per flowfile, and although that can be changed, tens of GB is still a huge single file. What happens to this file eventually? What efficient method is used to ingest such a file instead of multiple files? Every tool I know ingests multiple properly sized files better, because they let it parallelize the work. XML is also not an optimal format for ingesting a large file. I'd love to hear more about the reasoning behind this one big file, and why it still has to be XML rather than a more efficient format. NiFi could have converted the XML to something else. An alternative to NiFi for this task would be Spark with an XML processing framework.
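To illustrate what I mean by converting XML into properly sized pieces, here is a minimal Python sketch using only the standard library. It is not your flow and not a NiFi or Spark API: the `record` element name, the flat record layout, and the `xml_to_jsonl_chunks` helper are all assumptions made up for the example.

```python
import io
import json
import xml.etree.ElementTree as ET


def xml_to_jsonl_chunks(source, record_tag, records_per_chunk):
    """Stream-parse XML and yield lists of JSON lines, one list per chunk."""
    chunk = []
    # iterparse reads the input incrementally instead of loading it all at once
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Assumption: each record is flat, so its direct children map to fields
        row = {child.tag: child.text for child in elem}
        chunk.append(json.dumps(row))
        elem.clear()  # drop the record's children and text once it is converted
        if len(chunk) == records_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk


# Tiny in-memory stand-in for a large XML file (hypothetical structure)
sample = io.BytesIO(
    b"<records>"
    b"<record><id>1</id><name>a</name></record>"
    b"<record><id>2</id><name>b</name></record>"
    b"<record><id>3</id><name>c</name></record>"
    b"</records>"
)
chunks = list(xml_to_jsonl_chunks(sample, "record", 2))
```

Each chunk could then be written out as its own file, so whatever ingests the data downstream can process the chunks in parallel instead of reading one huge XML document.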