Spark reading JSON into a Dataset takes a long time

Hi,

I have a set of JSON files (single-line JSON) in one HDFS folder. When I read them into a Dataset with sparkContext.read.json("/x/y/z/*") and run a count, it takes around 50 minutes for 3 million records. Please let me know how I can optimize this.

Regards

Mamta Chawla
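
One thing that may be worth checking: when no schema is given, Spark's JSON reader scans the input to infer one before the count even starts, so the data is effectively read twice. Below is a minimal PySpark sketch that passes an explicit schema instead. It assumes `spark` is an existing SparkSession and that the reader accepts a DDL-formatted schema string (Spark 2.3 or later). The helper names and the column names are hypothetical, since the post does not show the actual record layout. Whether schema inference is really the bottleneck here is also an assumption; a very large number of small files in the folder could be another cause.

```python
def build_ddl(fields):
    """Join (name, type) pairs into a DDL schema string such as 'id BIGINT, name STRING'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in fields)


def read_json_with_schema(spark, path, ddl):
    """Read JSON with an explicit schema so Spark does not need a separate inference pass."""
    return spark.read.schema(ddl).json(path)


# Hypothetical columns: replace these with the real fields of your JSON records.
EXAMPLE_FIELDS = [("id", "BIGINT"), ("name", "STRING"), ("created_at", "TIMESTAMP")]


def count_records(spark, path, fields=EXAMPLE_FIELDS):
    """Count the records under `path` using the explicit schema built from `fields`."""
    return read_json_with_schema(spark, path, build_ddl(fields)).count()
```

With a live session this would be called as `count_records(spark, "/x/y/z/*")`. Fields that are present in the files but missing from the schema are simply not loaded, so the schema only needs the columns you actually use.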