Spark SQL Dataset 2.3: should a single nested JSON parent object be passed around the whole application? Is it best practice to use a single typed Dataset and pass it between different classes, or should we split the nested JSON elements into child POJO Datasets?



Hi all,

We are using Java 8 and the Spark 2.3 typed Dataset API for a single parent nested JSON object of around 150 elements, some of which are themselves nested within the same JSON. We created a single Java POJO for the parent JSON, and we perform a lot of transformations and actions on this single Dataset<jsonobject>. It is passed around at least 15 Java classes, where it is filtered and aggregated. I am wondering about memory issues in Spark when we pass around such a beefy Java object. Please advise.
