Support Questions

What are the common mistakes developers make while using Apache Spark?


1) Management of DAGs - Developers often make mistakes when controlling the DAG. Prefer `reduceByKey` over `groupByKey`: the two can produce similar results, but `groupByKey` shuffles every value across the network, while `reduceByKey` combines values on each partition first, so far less data moves during the shuffle. Keep map-side output as small as possible, avoid unnecessary shuffles, don't over-invest in partitioning tweaks, and watch out for data skew across partitions.
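The `reduceByKey` vs. `groupByKey` difference can be illustrated without Spark at all. The sketch below is plain Python, not the Spark API: it models two hypothetical partitions of `(word, 1)` pairs and counts how many records would have to cross the shuffle boundary under each strategy.

```python
from collections import defaultdict

# Hypothetical two-partition dataset of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]

def group_by_key_style(parts):
    """groupByKey-style: every record crosses the shuffle boundary."""
    shuffled = [pair for part in parts for pair in part]
    totals = defaultdict(int)
    for k, v in shuffled:
        totals[k] += v
    return dict(totals), len(shuffled)

def reduce_by_key_style(parts):
    """reduceByKey-style: combine locally first, then shuffle
    at most one record per key per partition."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for k, v in part:
            local[k] += v  # map-side combine before the shuffle
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for k, v in shuffled:
        totals[k] += v
    return dict(totals), len(shuffled)

g_totals, g_count = group_by_key_style(partitions)   # shuffles 7 records
r_totals, r_count = reduce_by_key_style(partitions)  # shuffles only 4
assert g_totals == r_totals == {"a": 4, "b": 3}
```

Both strategies produce identical totals, but the map-side combine shuffles 4 records instead of 7 here; on real datasets with many repeated keys the gap is far larger.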

2) Maintain the required size of shuffle blocks. A single shuffle block cannot exceed 2 GB, so when a stage shuffles a large amount of data, increase the number of partitions so each block stays well below that limit.
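A quick back-of-envelope sizing shows how partition count keeps shuffle blocks under control. The figures below (a 500 GB shuffle stage, a ~128 MB target block size) are hypothetical, chosen only to illustrate the arithmetic:

```python
# Spark's per-shuffle-block hard limit is Integer.MAX_VALUE bytes (~2 GB).
TWO_GB = 2 * 1024**3

# Hypothetical stage that shuffles 500 GB in total.
shuffle_bytes = 500 * 1024**3

# Aim well below the limit, e.g. ~128 MB per partition.
target_block = 128 * 1024**2

# Ceiling division: minimum partitions needed to hit the target block size.
min_partitions = -(-shuffle_bytes // target_block)

assert target_block < TWO_GB
print(min_partitions)  # 4000 partitions for this example
```

In practice this translates to raising `spark.sql.shuffle.partitions` (or passing an explicit partition count to the shuffle-producing operation) so no single block approaches the 2 GB ceiling.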