Spark job Parallelism Parameter


Our dataset size is 205 GB.

Is it OK to set the parallelism parameter to 205000 / 128 ≈ 1601, going by the guideline that the default HDFS block size is 128 MB?
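The arithmetic in the question can be sketched as below. This is a minimal illustration, assuming the data sits in HDFS as splittable files with 128 MB blocks; the helper name `estimate_partitions` is hypothetical. It uses ceiling division, so a trailing partial block still counts as its own split. It also shows that the result depends on whether 205 GB is read as 205,000 MB (about 1602 blocks) or as 205 × 1024 MB (1640 blocks). It only estimates the number of input blocks and does not by itself determine the best parallelism setting.

```python
MB = 1024 ** 2  # bytes in a binary megabyte (MiB)
GB = 1024 ** 3  # bytes in a binary gigabyte (GiB)


def estimate_partitions(dataset_bytes: int, block_bytes: int = 128 * MB) -> int:
    """Hypothetical helper: count the block-sized chunks needed to cover a dataset.

    Integer ceiling division means a partial final block still gets a chunk.
    """
    if dataset_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    return (dataset_bytes + block_bytes - 1) // block_bytes


# 205 GB read as 205 * 1024 MB, split into 128 MB blocks
print(estimate_partitions(205 * GB))     # 1640
# 205 GB approximated as 205000 MB, as in the question
print(estimate_partitions(205000 * MB))  # 1602
```

Using integer arithmetic avoids floating-point rounding on large byte counts.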