
Spark job Parallelism Parameter


Our dataset size is 205 GB.

Is it OK to set the parallelism parameter to 205000/128 == 1601, going by the guidance that the default HDFS block size is 128 MB?
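As a sanity check on the arithmetic, here is a minimal sketch that derives the partition count from the figures in the question (205 GB dataset, 128 MB HDFS block size); the decimal GB-to-MB conversion is an assumption:

```python
# Estimate a Spark parallelism value from dataset size and HDFS block size,
# using the numbers from the question above.
dataset_size_mb = 205 * 1000   # 205 GB in MB, assuming decimal units
block_size_mb = 128            # default HDFS block size in MB

# Integer division gives the rough number of 128 MB blocks (partitions).
partitions = dataset_size_mb // block_size_mb
print(partitions)  # 1601
```

Starting from one partition per HDFS block is a reasonable baseline; whether 1601 is the best value also depends on the number of executor cores available, since Spark tuning guidance generally suggests a few tasks per core.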
