What are common mistakes developers make when using Apache Spark?

Re: What are common mistakes developers make when using Apache Spark?

1) Managing the DAG: developers often make mistakes in how they shape the DAG. Prefer reduceByKey over groupByKey. The two can compute the same results, but groupByKey sends every value for a key across the network, while reduceByKey first combines values within each partition, so far less data is shuffled. Keep the data produced on the map side as small as possible, avoid unnecessary repartitioning, and minimize the number of shuffles. Also watch out for data skew and for partitions that are too large or too small.
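To make the reduceByKey versus groupByKey point concrete, here is a minimal plain-Python model (not Spark code; the function names group_by_key_then_sum and reduce_by_key are hypothetical) that counts how many records would cross the shuffle in each case. It assumes a sum aggregation and treats each inner list as one partition. In PySpark the two approaches would be written as rdd.groupByKey().mapValues(sum) and rdd.reduceByKey(lambda a, b: a + b).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Record = Tuple[Hashable, int]
Partitions = List[List[Record]]


def group_by_key_then_sum(partitions: Partitions) -> Tuple[Dict[Hashable, int], int]:
    """Model of groupByKey().mapValues(sum): every record crosses the shuffle."""
    shuffled = [rec for part in partitions for rec in part]
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)  # all values for a key are buffered together
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, len(shuffled)


def reduce_by_key(partitions: Partitions,
                  func: Callable[[int, int], int]) -> Tuple[Dict[Hashable, int], int]:
    """Model of reduceByKey(func): combine inside each partition, then shuffle."""
    shuffled: List[Record] = []
    for part in partitions:
        local: Dict[Hashable, int] = {}
        for key, value in part:
            # map-side combine: fold values for the same key before shuffling
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())  # at most one record per key per partition
    totals: Dict[Hashable, int] = {}
    for key, value in shuffled:
        totals[key] = func(totals[key], value) if key in totals else value
    return totals, len(shuffled)


if __name__ == "__main__":
    data = [
        [("a", 1), ("b", 1), ("a", 1), ("a", 1)],  # partition 0
        [("a", 1), ("b", 1), ("b", 1), ("c", 1)],  # partition 1
    ]
    print(group_by_key_then_sum(data))              # ({'a': 4, 'b': 3, 'c': 1}, 8)
    print(reduce_by_key(data, lambda x, y: x + y))  # ({'a': 4, 'b': 3, 'c': 1}, 5)
```

Both paths produce the same totals, but the per-partition combine step cuts the shuffled records from 8 to 5 on this toy input; with many repeated keys per partition the gap grows much larger.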

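2) Keep shuffle blocks at a manageable size. If blocks grow too large, increase the number of partitions so each one carries less data; older Spark releases could fail outright when a single shuffle block exceeded 2 GB.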
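As a rough sizing aid, the sketch below computes a reduce-side partition count from an estimated total shuffle size. The function name partitions_for_shuffle and the 128 MiB target are assumptions for illustration (a commonly cited rule of thumb, not a Spark default), and the calculation assumes the data is spread evenly across keys; skewed keys will still produce oversized partitions.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def partitions_for_shuffle(total_shuffle_bytes: int,
                           target_partition_bytes: int = 128 * MIB,
                           max_block_bytes: int = 2 * GIB) -> int:
    """Hypothetical helper: choose a reduce-side partition count so that, if the
    shuffled data is spread evenly, each partition holds about
    target_partition_bytes. A shuffle block is one map task's slice of a single
    reduce partition, so keeping partitions under max_block_bytes also keeps
    every block under it."""
    if not 0 < target_partition_bytes <= max_block_bytes:
        raise ValueError("target_partition_bytes must be in (0, max_block_bytes]")
    if total_shuffle_bytes <= 0:
        return 1
    # integer ceiling division avoids float rounding on very large byte counts
    return (total_shuffle_bytes + target_partition_bytes - 1) // target_partition_bytes


if __name__ == "__main__":
    # e.g. 300 GiB of shuffle output at ~128 MiB per partition
    print(partitions_for_shuffle(300 * GIB))  # 2400
```

The result could be passed as the numPartitions argument of reduceByKey, or used to set spark.sql.shuffle.partitions for DataFrame jobs.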