Spark Performance Tuning

What are the main tuning parameters I can set to make my Spark Streaming application faster? I have a 200-data-node cluster with 400 GB of RAM on each data node. I have set executor-cores to 5, executor memory to 20 GB, and concurrent.tasks to 10. My writes to Hive are slow, which is slowing down the whole processing.



Can you describe your process of writing to Hive in more detail? Are you using the Hive Streaming API?

Tuning the number of transactions and batches per write can help you scale writes to Hive. Fewer, larger transactions generally mean fewer commits for the same volume of data.
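To make the batching idea a bit more concrete, here is a rough sketch in plain Python. It does not call Hive or Spark at all; it only does the arithmetic for how many commits and transaction-batch fetches a given volume of records would need. The function name `plan_writes` and the parameters `records_per_txn` and `txns_per_batch` are made up for illustration, and the record counts are arbitrary, not measurements from any cluster.

```python
def plan_writes(num_records, records_per_txn, txns_per_batch):
    """Return (commits, batch_fetches) needed to write num_records.

    Hypothetical helper for illustration only; it does not talk to Hive.
    """
    if num_records < 0:
        raise ValueError("num_records must not be negative")
    if records_per_txn <= 0 or txns_per_batch <= 0:
        raise ValueError("records_per_txn and txns_per_batch must be positive")

    # Integer ceiling division: one commit per (possibly partial) transaction.
    commits = -(-num_records // records_per_txn)
    # One fetch per (possibly partial) batch of transactions.
    batch_fetches = -(-commits // txns_per_batch)
    return commits, batch_fetches


if __name__ == "__main__":
    total = 1_000_000  # arbitrary example volume
    for records_per_txn, txns_per_batch in [(100, 1), (10_000, 10)]:
        commits, fetches = plan_writes(total, records_per_txn, txns_per_batch)
        print(
            f"{records_per_txn} records/txn, {txns_per_batch} txns/batch: "
            f"{commits} commits, {fetches} batch fetches"
        )
```

If the per-commit cost dominates your write time, comparing a few settings this way can help you pick a starting point, but the right values still depend on your data and need to be measured on your own cluster.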

Expert Contributor has some tuning info for Spark Streaming. It is not specific to Hive, but the general info may be helpful. The following link is for Spark 2.0.1: