Converting Large CSV into JSON

I have a relatively large CSV (~80GB) that I need to transform into multiple JSON documents/records. I'm using a ConvertRecord processor with a CSVReader and an AvroRecordSetWriter, and that's where my CSV gets stuck. What's the best approach: break up the CSV before converting it, or try to get more horsepower on the server?

  • Server Mem: 16GB
  • Cores: 4
  • Maximum Timer Driven Thread Count: 16
  • Java Min/Max Heap: 2GB/10GB
1 ACCEPTED SOLUTION

Re: Converting Large CSV into JSON

@Bill Miller

Try a series of SplitRecord processors to break the file into smaller chunks before converting it.

Follow an approach similar to the one mentioned in this thread and see whether performance improves.
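
To illustrate the idea outside NiFi, here is a minimal Python sketch of the same chunking pattern: stream the CSV, group rows into fixed-size chunks, and emit each chunk as its own JSON-lines document, so the full 80GB never has to sit in memory at once. This is not NiFi code or the SplitRecord implementation; the function names `iter_chunks` and `csv_to_json_chunks` and the chunk size are assumptions made up for the example.

```python
import csv
import io
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_chunks(rows: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `size` rows without reading the whole input first."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def csv_to_json_chunks(csv_file, records_per_chunk: int) -> Iterator[str]:
    """Stream an open CSV file and yield one JSON-lines string per chunk.

    Hypothetical helper: the header row supplies the field names,
    and every value is kept as a string.
    """
    reader = csv.DictReader(csv_file)
    for chunk in iter_chunks(reader, records_per_chunk):
        yield "\n".join(json.dumps(row) for row in chunk)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the large CSV file.
    sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
    for i, doc in enumerate(csv_to_json_chunks(sample, records_per_chunk=2)):
        print(f"--- chunk {i} ---")
        print(doc)
```

Only one chunk is held in memory at a time, which is roughly what chaining splits buys you in a flow: each downstream conversion works on a small, bounded piece instead of the whole file.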

