
Split 10GB size of XML/CSV file into multiple files. Extracted files should be valid ones


New Contributor

I have a scenario for which I can't find the right processors.
Scenario: I have an XML/CSV file of 10 GB. I need to split it into multiple files of at most 50 MB each, and each extracted file must be valid on its own.
My system: a dedicated Linux server with 16 GB RAM and a 160 GB HDD, running Apache NiFi 1.5.0 on Java 8.


Re: Split 10GB size of XML/CSV file into multiple files. Extracted files should be valid ones

Super Guru
@Raju Chigicherla

Instead of splitting the file in a single SplitText processor, try a series of SplitText/SplitContent processors so that the 10 GB file is split in stages.

(or)

Use record-oriented processors such as SplitRecord, and set Records Per Split to a value that yields files of about 50 MB. If you still have issues with a single SplitRecord processor, use a series of SplitRecord processors to get down to 50 MB files.
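To pick a Records Per Split value, one option is to estimate it from the average record size in a sample of the file. The snippet below is only a rough sketch in Python, not part of NiFi: the function name `records_per_split` and the `safety` margin are hypothetical, and because record sizes vary, the resulting files are only approximately capped at 50 MB.

```python
def records_per_split(sample_records, max_split_bytes=50 * 1024 * 1024, safety=0.9):
    """Estimate how many records fit in one split of at most max_split_bytes.

    sample_records: list of bytes objects, one per record (e.g. CSV lines).
    safety: hypothetical margin (assumption) that leaves headroom for
            larger-than-average records and per-file headers.
    """
    if not sample_records:
        raise ValueError("need at least one sample record")
    # Average size of one record in the sample, in bytes.
    avg_bytes = sum(len(r) for r in sample_records) / len(sample_records)
    # Round down so the estimate stays under the size cap; never return 0.
    return max(1, int(max_split_bytes * safety // avg_bytes))


if __name__ == "__main__":
    # Example: a sample of records of about 200 bytes each.
    sample = [b"x" * 200 for _ in range(1000)]
    print(records_per_split(sample))
```

If the estimate produces files that exceed the limit in practice, lowering the margin or sampling more records would be the first things to try.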

In addition, for splitting XML files, NiFi 1.7 introduced the XMLReader/XMLRecordSetWriter controller services; with them you can split XML data in the SplitRecord processor. Since you are on NiFi 1.5.0, this option requires an upgrade.

Refer to this and this link for examples of splitting a big file with a series of Split processors.
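As a rough illustration of sizing a two-stage split, the sketch below picks an intermediate line count for the first SplitText so that neither stage emits a large number of flow files at once. The square-root heuristic and the function name `two_stage_line_counts` are my own assumptions, not something NiFi prescribes; the two returned values would go into the Line Split Count property of the first and second SplitText processors.

```python
import math


def two_stage_line_counts(total_lines, final_lines):
    """Return (stage1_lines, stage2_lines) for two chained split processors.

    total_lines: approximate number of lines in the input file.
    final_lines: lines wanted in each final output file.
    Heuristic (assumption): balance the fan-out of both stages by using
    roughly the square root of the number of final files.
    """
    if total_lines <= 0 or final_lines <= 0:
        raise ValueError("line counts must be positive")
    final_files = math.ceil(total_lines / final_lines)
    fanout = math.ceil(math.sqrt(final_files))
    # Each first-stage chunk holds a whole number of final files.
    stage1_lines = final_lines * fanout
    return stage1_lines, final_lines


if __name__ == "__main__":
    # Example: 100 million lines, 500,000 lines per final file.
    print(two_stage_line_counts(100_000_000, 500_000))
```

For CSV input, each stage would also need Header Line Count set so that every output file keeps the header and remains valid.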
