Split a 10 GB XML/CSV file into multiple files; the extracted files should be valid

New Contributor

I have a scenario where I can't find the right processors to meet the requirement.
Scenario: I have an XML/CSV file that is 10 GB in size. I have to split it into multiple files, each at most 50 MB.
My system configuration: 16 GB RAM, 160 GB HDD, Apache NiFi 1.5.0, Java 8, Linux on a dedicated server.

1 REPLY

Re: Split a 10 GB XML/CSV file into multiple files; the extracted files should be valid

Super Guru
@Raju Chigicherla

Instead of splitting the whole file in a single SplitText processor, try a series of SplitText/SplitContent processors: split the 10 GB file into large chunks first, then split those chunks down to the target size.
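
To make the "series of splits" idea concrete, here is a minimal Python sketch of a two-stage split. It is not NiFi code; `chunked` and `two_stage_split` are hypothetical helper names and the sizes are tiny on purpose. The reason usually given for cascading is that no single step has to emit every final piece at once.

```python
from itertools import islice

def chunked(items, size):
    """Yield lists of at most `size` items, reading lazily from `items`."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def two_stage_split(lines, coarse, fine):
    """Split into coarse chunks first, then split each coarse chunk again."""
    for big in chunked(lines, coarse):      # stage 1: few, large pieces
        for small in chunked(big, fine):    # stage 2: final-size pieces
            yield small

# Illustrative data only: 25 rows, coarse chunks of 10, final pieces of 5.
lines = [f"row{i}" for i in range(25)]
pieces = list(two_stage_split(lines, coarse=10, fine=5))
```

Each coarse chunk is split on its own, so if the first-stage size is not a multiple of the second-stage size you get a short leftover piece at the end of every coarse chunk. Picking a multiple avoids that.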

(or)

Use a record-oriented processor such as SplitRecord and set its records-per-split value so that each split comes out at about 50 MB. If you still have issues with a single SplitRecord processor, use a series of SplitRecord processors to get down to 50 MB files.
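
SplitRecord is configured with a record count rather than a byte size, so the count for 50 MB has to be estimated. A rough sketch in Python, assuming you can read a representative sample of records from the file first (`records_per_split` is a hypothetical helper, not part of NiFi):

```python
def records_per_split(sample_records, target_bytes=50 * 1024 * 1024):
    """Estimate how many records fit in `target_bytes`, from a sample."""
    if not sample_records:
        raise ValueError("need at least one sample record")
    # Measure encoded size, since file size is in bytes, not characters.
    total = sum(len(r.encode("utf-8")) for r in sample_records)
    avg = total / len(sample_records)
    return max(1, int(target_bytes // avg))

# Illustrative sample rows only; use real lines from your own file.
sample = ["1,alice,2018-01-01\n", "2,bob,2018-01-02\n"]
n = records_per_split(sample)
```

This is only an estimate. Record sizes vary and the writer may format records differently from the input, so it is probably safer to configure a somewhat smaller count than the one computed here if 50 MB is a hard limit.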

For splitting XML files, NiFi 1.7 introduced XML reader and writer controller services (XMLReader and XMLRecordSetWriter); with them you can split XML data in the SplitRecord processor. Since you are on NiFi 1.5.0, this option requires an upgrade.
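
For comparison, this is roughly what a record-aware XML split has to do so that every output file is still valid XML. It is a standalone Python sketch, not how NiFi implements it, and it assumes the file has one root element whose direct children are the records. `split_xml` and the output file pattern are hypothetical names.

```python
import xml.etree.ElementTree as ET

def split_xml(src, records_per_file, out_pattern="part_{:05d}.xml"):
    """Stream `src` and write files with at most `records_per_file` records.

    Assumes one root element whose direct children are the records.
    Returns the list of file names written.
    """
    written, batch = [], []
    root, depth = None, 0

    def flush():
        # Wrap the batch in a copy of the original root so the file is valid.
        out_root = ET.Element(root.tag, dict(root.attrib))
        out_root.extend(batch)
        name = out_pattern.format(len(written))
        ET.ElementTree(out_root).write(name, encoding="utf-8",
                                       xml_declaration=True)
        written.append(name)
        batch.clear()

    for event, elem in ET.iterparse(src, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem              # first element seen is the root
            depth += 1
        else:
            depth -= 1
            if depth == 1:               # a direct child of the root closed
                batch.append(elem)
                root.remove(elem)        # keep the in-memory tree small
                if len(batch) >= records_per_file:
                    flush()
    if batch:
        flush()                          # write the final, partial batch
    return written
```

A call might look like `split_xml("big.xml", 100000)`, where the file name and count are placeholders. Only one batch of records is held in memory at a time. With namespaced documents ElementTree may rewrite prefixes on output, so check the result against your schema.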

Refer to this and this link for examples of splitting a big file with a series of Split processors.