Support Questions
Find answers, ask questions, and share your expertise

Running Spark on Beeline

Hello, I have the following two questions:

1. I know that in Spark client mode, the driver program runs on the local machine where the shell is opened. When I run a query from Beeline, where does the driver run? I have read that the Spark Thrift Server (STS) runs in client mode only. So when I connect from M1 to the STS running on M2, does the driver program run on M1 or on M2?
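For context, this is roughly how I connect from M1 to the Thrift Server on M2 (the host name, port, and user are placeholders for my setup, not necessarily your defaults):

```shell
# Connect from M1 to the Spark Thrift Server on M2 over JDBC.
# "M2", port 10016, and "myuser" are example values from my environment;
# check hive.server2.thrift.port in your STS config for the actual port.
beeline -u "jdbc:hive2://M2:10016/default" -n myuser
```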

2. I have read that we don't need to set the number of executors in Spark when dynamic allocation is enabled. Is that correct? How do I verify it? Are there any other parameters I have to set? I enabled dynamic allocation and started the Spark shell as:

spark-shell --master yarn --deploy-mode client

When I do this, I see in Ambari that the Spark shell starts with only 3 containers. When I then run a query, the number of running containers stays at 3 throughout, and query execution is very slow. I was expecting the number of containers to grow during the query and the execution time to improve accordingly.
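For reference, these are the dynamic-allocation properties I have set, based on my understanding of what is required (the external shuffle service entries and the min/max values are my own assumptions, not something I confirmed from the docs):

```
# spark-defaults.conf (values are examples from my setup)
spark.dynamicAllocation.enabled          true
spark.shuffle.service.enabled            true
spark.dynamicAllocation.minExecutors     1
spark.dynamicAllocation.maxExecutors     50
spark.dynamicAllocation.initialExecutors 3
```

Please correct me if any of these are unnecessary or if something else is missing.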

Instead, when I specify the number of executors and the executor memory explicitly at start time, I get the best query execution times. The command that gives me the best performance is:

spark-shell --master yarn --num-executors 46 --executor-memory 6g --executor-cores 2 --driver-memory 4g

What am I missing here?
