Spark2 performance tuning running on YARN


We are using Spark 2.3 (Spark SQL) with Java 8, running on YARN. In the Spark UI we can see straggling tasks, data skew, and a lot of GC. We currently use the default Java serialization; hopefully there will be some performance improvement when we move to Kryo.

For GC, please suggest best practices for tuning. We are planning to use G1GC.
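To see which collector a JVM is actually running and how much time it has spent collecting, a small check like the sketch below can help when comparing settings. It uses only the standard java.lang.management API; the class name GcReport and the output format are made up for illustration. When the JVM is started with -XX:+UseG1GC (on Spark this flag is usually passed through spark.executor.extraJavaOptions), the reported collector names should be the G1 ones.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the garbage collectors active in this JVM.
class GcReport {

    // One line per collector: name, number of collections so far,
    // and accumulated collection time in milliseconds.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

This only reports on the JVM it runs in, so on a cluster it would have to run inside the executors (or be replaced by the GC time column in the Spark UI) to say anything about executor GC.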

For the data skew and task stragglers we are planning to use salting; any salting examples would be helpful. Can we also use bucketing? We are not using Hive here, just file-based sources and Spark SQL typed Datasets. Will Spark bucketing work for file-based data? Any bucketing example would be great, please.
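For reference, the idea behind salting as we understand it can be sketched in plain Java, without any Spark dependency: a hot key is split into several salted keys, each salted key is aggregated separately, and the partial results are combined after the salt is stripped. This is only an illustration of the technique under stated assumptions, not Spark code. The class name SaltedAggregation, the "#" separator, and the round-robin salt (used instead of a random one so the result is deterministic) are all made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of two-stage aggregation with salted keys.
// Maps stand in for what would be shuffle partitions in Spark.
class SaltedAggregation {

    // Stage 1: append a salt in [0, saltBuckets) to every key and count per
    // salted key, so one hot key is spread over up to saltBuckets groups.
    static Map<String, Long> partialCounts(List<String> keys, int saltBuckets) {
        Map<String, Long> partial = new TreeMap<>();
        int i = 0;
        for (String key : keys) {
            String salted = key + "#" + (i++ % saltBuckets);
            partial.merge(salted, 1L, Long::sum);
        }
        return partial;
    }

    // Stage 2: strip the salt and sum the partial counts per original key.
    static Map<String, Long> finalCounts(Map<String, Long> partial) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            String salted = e.getKey();
            String key = salted.substring(0, salted.lastIndexOf('#'));
            result.merge(key, e.getValue(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // "hot" is the skewed key in this toy input.
        List<String> keys = Arrays.asList(
                "hot", "hot", "hot", "hot", "hot", "hot", "cold", "warm");
        Map<String, Long> partial = partialCounts(keys, 3);
        System.out.println(partial);
        System.out.println(finalCounts(partial));
    }
}
```

In Spark the two stages would presumably become two groupBy aggregations, with the salt added as an extra column before the first one. That mapping is an assumption here and has not been tested against our job. It also only works directly for aggregations that can be combined from partial results, such as counts and sums.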

We tried increasing parallelism by changing the number of partitions and spark.sql.shuffle.partitions, with no luck.

The input is only 160 MB, but it grows into a massive amount of data once it is loaded and aggregated in Spark. We are not sure why it grows so dramatically. Thank you.
