Running Spark on Beeline

Hello, I have the following two questions:

1. I know that in Spark client mode, the driver program runs on the local machine from which the shell is opened. Now, when I run queries from Beeline, where does the driver run? I read that the Spark Thrift Server (STS) runs in client mode only. So when I connect from M1 to an STS instance running on M2, does the driver program run on M1 or M2?
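For context, this is roughly how I connect (a sketch: the port assumes Spark's default `hive.server2.thrift.port` of 10000, and M1/M2 are the machines from my setup; adjust both for your environment):

```shell
# Run from M1, connecting to the Spark Thrift Server on M2.
# 10000 is Spark's default Thrift port; distributions often use a different one.
beeline -u "jdbc:hive2://M2:10000/default"
```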

2. I read that we don't have to set the number of executors in Spark when dynamic allocation is enabled. Is that correct? How do I test it? Are there any other parameters I have to set? I enabled dynamic allocation and started the Spark shell as:

spark-shell --master yarn --deploy-mode client
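For comparison, spelling out the dynamic-allocation properties explicitly on the command line (a sketch using the standard Spark property names; on YARN, the external shuffle service must also be enabled for dynamic allocation to work, and the min/max bounds below are illustrative, not tuned values):

```shell
# Spark shell with dynamic allocation made explicit.
# spark.shuffle.service.enabled is required for dynamic allocation on YARN.
spark-shell --master yarn --deploy-mode client \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=46
```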

When I do this, I see in Ambari that the Spark shell starts with only 3 containers. When I then run a query, the number of running containers stays at 3 throughout and the query execution is very slow. I was hoping the container count would scale up during the query and the execution time would improve.

Instead, when I specify the number of executors and the executor memory at start time, I get the best query execution times. The command I use for best performance is:

spark-shell --master yarn --num-executors 46 --executor-memory 6g --executor-cores 2 --driver-memory 4g
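For scale, a quick back-of-the-envelope check of what that explicit request asks YARN for (assuming Spark's default per-executor memory overhead of max(384 MiB, 10% of executor memory); actual container sizes depend on your configuration):

```python
# Rough YARN footprint of the explicit spark-shell request above.
# Assumes the default overhead rule: max(384 MiB, 10% of executor memory).
num_executors = 46
executor_mem_gib = 6
executor_cores = 2

overhead_gib = max(384 / 1024, 0.10 * executor_mem_gib)  # 0.6 GiB per executor
per_container_gib = executor_mem_gib + overhead_gib      # ~6.6 GiB per container
total_mem_gib = num_executors * per_container_gib        # ~303.6 GiB across the cluster
total_cores = num_executors * executor_cores             # 92 cores

print(f"{total_mem_gib:.1f} GiB memory, {total_cores} cores")
```

So the fast configuration is reserving roughly 300 GiB and 92 cores up front, which the dynamic-allocation run never grew into.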

What am I missing here?
