What are the main tuning parameters I can set to make my Spark Streaming application faster? I have a 200-node cluster with 400 GB of RAM on each data node. I have set executor-cores to 5, executor memory to 20 GB, and concurrent.tasks to 10. My writes to Hive are slow, and that is slowing down the whole job.
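As a rough worked example of the memory side of these settings, here is a small sketch. It assumes the cluster runs Spark on YARN with the Spark 2.0 default executor memory overhead (10% of executor memory, minimum 384 MB, controlled by spark.yarn.executor.memoryOverhead). It ignores memory reserved for the OS, the DataNode and NodeManager daemons, and YARN container rounding, and cores per node aren't stated, so the result is only an upper bound from memory. The helper names are made up for illustration.

```python
# Rough upper bound on executors per node from memory alone.
# ASSUMPTIONS: Spark on YARN, default overhead = max(10% of heap, 384 MB);
# node RAM is treated as fully available to YARN, which in practice is
# capped by yarn.nodemanager.resource.memory-mb. Cores are not considered.

def executor_container_mb(executor_memory_mb, overhead_pct=10, min_overhead_mb=384):
    """Heap plus off-heap overhead requested per executor container."""
    overhead_mb = max(executor_memory_mb * overhead_pct // 100, min_overhead_mb)
    return executor_memory_mb + overhead_mb

def executors_per_node_by_memory(node_memory_gb, executor_memory_gb):
    """How many executor containers fit in one node's memory (integer MB math)."""
    container_mb = executor_container_mb(executor_memory_gb * 1024)
    return (node_memory_gb * 1024) // container_mb

NODES = 200
per_node = executors_per_node_by_memory(400, 20)
print("container MB per executor:", executor_container_mb(20 * 1024))  # 22528
print("executors per node (memory bound):", per_node)                 # 18
print("executors in cluster (memory bound):", per_node * NODES)       # 3600
```

In practice the core count per node will usually be the tighter limit, so it's worth running the same arithmetic for cores once that number is known.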
Can you describe your process of writing to Hive in more detail? Are you using the Hive Streaming API? https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
Tuning the number of transactions and batches per write can help you scale writes to Hive.
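To make those two knobs concrete, here is a minimal, hypothetical sketch of the loop shape: records are grouped into transactions (one commit each), and transactions are grouped into transaction batches, similar in spirit to the fetch batch / begin transaction / write / commit pattern described on the Hive Streaming wiki page linked above. Everything here (`stream_records`, `CountingSink`, `records_per_txn`, `txns_per_batch`, the `sink` methods) is made up for illustration and is not the Hive API; a real client would call the Hive Streaming classes instead.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def stream_records(records, records_per_txn, txns_per_batch, sink):
    """Group records into transactions, and transactions into batches.

    `sink` is a HYPOTHETICAL stand-in for a Hive streaming client with
    open_batch(), write(record), commit() and close_batch() methods.
    """
    if records_per_txn <= 0 or txns_per_batch <= 0:
        raise ValueError("batch sizes must be positive")
    transactions = chunks(records, records_per_txn)
    for batch in chunks(transactions, txns_per_batch):
        sink.open_batch()            # one transaction batch
        for txn in batch:
            for record in txn:
                sink.write(record)
            sink.commit()            # one commit per transaction
        sink.close_batch()

class CountingSink:
    """In-memory stand-in that only counts calls."""
    def __init__(self):
        self.batches = 0
        self.commits = 0
        self.records = 0

    def open_batch(self):
        self.batches += 1

    def write(self, record):
        self.records += 1

    def commit(self):
        self.commits += 1

    def close_batch(self):
        pass

sink = CountingSink()
stream_records(range(100_000), records_per_txn=1_000, txns_per_batch=10, sink=sink)
print(sink.batches, sink.commits, sink.records)  # 10 100 100000
```

The trade-off this exposes: larger transactions mean fewer commits, and each commit is a round trip to the metastore, while very small batches can leave many small delta files for compaction to clean up. Very large transactions, on the other hand, mean more data to rewrite if one aborts, so it's worth measuring a few settings on your own workload.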
spark.apache.org has some tuning information for Spark Streaming. It isn't specific to Hive, but the general guidance may still help. The following link is for Spark 2.0.1: