
Spark submit Tuning


Hi All,

Please help me with the following scenario.

I have 1 TB of data in 10 partitions, stored as a sequence file. I read it as a sequence file, convert it to a DataFrame, and finally write it into a Hive ORC table with an INSERT OVERWRITE statement.

My cluster uses the Capacity Scheduler.

In this case, what would be the best configuration for the number of executors, driver memory, and executor memory to load the data into the Hive table faster?

Thanks and regards
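
For anyone sizing a similar job, here is a rough sketch of a commonly used rule of thumb, not a definitive answer. The post does not give the cluster size, so the node count, cores per node, and memory per node below are hypothetical inputs, and under the Capacity Scheduler they should reflect what the queue is actually allowed to use. The helper names `size_executors` and `spark_submit_flags`, the 5 cores per executor, the 1 core and 1 GB held back per node, and the 4g driver default are all assumptions for illustration. The 10% overhead factor mirrors Spark's default executor memory overhead.

```python
import math


def size_executors(nodes, cores_per_node, mem_per_node_gb,
                   cores_per_executor=5, overhead_factor=0.10):
    """Rule-of-thumb executor sizing. All inputs are assumptions about the cluster."""
    # Assumption: hold back 1 core and 1 GB per node for the OS and node daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_per_node_gb - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Assumption: give up one executor slot cluster-wide for the application master.
    num_executors = nodes * executors_per_node - 1

    # Split node memory across executors, then leave room for the
    # off-heap overhead (heap + overhead must fit in the container).
    container_gb = usable_mem_gb / executors_per_node
    executor_memory_gb = math.floor(container_gb / (1 + overhead_factor))

    return {
        "num_executors": num_executors,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": executor_memory_gb,
    }


def spark_submit_flags(sizing, driver_memory="4g"):
    """Format the sizing as spark-submit flags. The driver default is an assumption."""
    return (
        f"--num-executors {sizing['num_executors']} "
        f"--executor-cores {sizing['executor_cores']} "
        f"--executor-memory {sizing['executor_memory_gb']}g "
        f"--driver-memory {driver_memory}"
    )


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, 16 cores and 64 GB each.
    sizing = size_executors(nodes=10, cores_per_node=16, mem_per_node_gb=64)
    print(spark_submit_flags(sizing))
```

The driver can usually stay modest for this kind of job because the data is written out by the executors rather than collected on the driver. Treat the numbers as a starting point and adjust after checking the Spark UI for the actual task count and any memory pressure.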

