Spark submit Tuning

Hi All,

Please help me with the following scenario.

I have 1 TB of data stored as a sequence file in 10 partitions. I read it as a sequence file, convert it to a DataFrame, and finally write it into a Hive ORC table with an INSERT OVERWRITE statement.

My cluster uses the YARN Capacity Scheduler.

In this case, what would be the best settings for the number of executors, driver memory, and executor memory to load the data into the Hive table faster?
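
To make the question concrete, here is a minimal sketch of how those settings map onto `spark-submit` flags. The helper name `spark_submit_command`, the script name `load_orc.py`, the queue name `default`, and every numeric value are placeholders I made up for illustration, not recommendations; the right numbers depend on the cluster's node sizes and queue limits, which I have not listed here.

```python
import shlex


def spark_submit_command(app, num_executors, executor_cores,
                         executor_memory, driver_memory, queue=None):
    """Build a spark-submit argument list for a YARN cluster.

    All values passed in are the caller's assumptions; this helper only
    arranges them into the standard spark-submit flags.
    """
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        "--driver-memory", driver_memory,
    ]
    # With the Capacity Scheduler, the job runs in a named YARN queue.
    if queue is not None:
        cmd += ["--queue", queue]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Placeholder values only, to show the shape of the command.
    cmd = spark_submit_command("load_orc.py", num_executors=10,
                               executor_cores=4, executor_memory="8g",
                               driver_memory="4g", queue="default")
    print(shlex.join(cmd))
```

So what I am really asking is which values to put in place of these placeholders for a 1 TB load.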

Thanks and regards,

MJ