Cannot get Apache Spark to start

Contributor

Hello again friends -

I am working the tutorial "A Lap Around Apache Spark" and running into an issue. I am executing the following command:

./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m

It seems to start OK, but then it gets stuck and repeats the following message every second:

[Screenshot of the repeating console message: 3615-virtualbox-hortonworks-sandbox-with-hdp-232-22-04.png]

And it will keep repeating the same message over and over again until I hit CTRL+C.

I am wondering about the $JAVA_HOME variable. I have tried remedying the above condition with different values, none of which seems to have any effect. The current value of $JAVA_HOME is as follows:

[Screenshot of the current $JAVA_HOME value: 3616-gowbd.png]

Any thoughts? I hate the thought of giving up on this module until I understand what is creating this error.

Thanking all of you in advance - your responses to my inquiries in the past have been spectacular, and I am very grateful.

Thanks,

Mike

1 ACCEPTED SOLUTION

Master Guru

Check whether you have enough YARN memory and what your yarn.scheduler.minimum-allocation-mb is set to. Even with driver and executor memory set to 512m, another 384m is needed for overhead, meaning 896m for the driver and for each executor. Also try using only one executor: "--num-executors 1".
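To make the arithmetic above concrete, here is a rough sketch in Python. It takes the 384m overhead figure from this answer and assumes that YARN rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb. The 1024 default used here is an assumption (the stock Hadoop default), so check the actual value in your Sandbox. The function names are made up for illustration.

```python
import math


def yarn_container_mb(requested_mb, overhead_mb=384, min_allocation_mb=1024):
    """Estimate the container size YARN would grant for one Spark process.

    overhead_mb is the 384m figure quoted in the answer above.
    min_allocation_mb stands in for yarn.scheduler.minimum-allocation-mb;
    1024 is an assumed default, not a value read from the Sandbox.
    """
    needed_mb = requested_mb + overhead_mb
    # Assumption: the scheduler rounds requests up to a multiple of the minimum.
    return math.ceil(needed_mb / min_allocation_mb) * min_allocation_mb


def total_request_mb(driver_mb, executor_mb, num_executors, **kwargs):
    """Sum the estimated containers for the driver plus all executors."""
    driver = yarn_container_mb(driver_mb, **kwargs)
    executors = num_executors * yarn_container_mb(executor_mb, **kwargs)
    return driver + executors


if __name__ == "__main__":
    # 512m + 384m overhead, before any rounding
    print(yarn_container_mb(512, min_allocation_mb=1))  # 896
    # The same request with a 1024 MB minimum allocation
    print(yarn_container_mb(512))  # 1024
    # Driver plus a single executor, as with --num-executors 1
    print(total_request_mb(512, 512, 1))  # 2048
```

If the estimated total is larger than the memory YARN has available, the application would be expected to wait for containers rather than start.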

10 REPLIES


This is normal behaviour. Spark waits until YARN lets the Application Master container start. Is there any further information in the log file?

Contributor

Hello Bernhard - thank you for responding. I must apologize for the delay in my response - I was on one of those vacations where the wife asks you "you are leaving that laptop at home, right?"

There are some HUGE log files generated from these attempts. Is there anything in particular I should be looking for?

Many thanks,

Mike Vogt


Can you check in YARN whether there are any pending containers or applications?
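One way to check this outside the dashboard is the ResourceManager's cluster metrics REST endpoint. The sketch below is only an illustration: the address http://localhost:8088 is an assumption about where the ResourceManager web service is reachable from your shell, the 896 MB threshold is the per-process figure from the accepted solution, and the helper names are made up.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url="http://localhost:8088"):
    """Read cluster metrics from the ResourceManager REST API.

    rm_url is an assumed address; adjust it for your Sandbox.
    """
    with urlopen(rm_url + "/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def describe_backlog(metrics, needed_mb=896):
    """Pick out the fields that show whether YARN is starved for memory."""
    available_mb = metrics.get("availableMB", 0)
    return {
        "apps_pending": metrics.get("appsPending", 0),
        "containers_pending": metrics.get("containersPending", 0),
        "available_mb": available_mb,
        # needed_mb is the per-process estimate: 512m plus 384m overhead.
        "enough_memory": available_mb >= needed_mb,
    }


if __name__ == "__main__":
    print(describe_backlog(fetch_cluster_metrics()))
```

A non-zero apps_pending together with enough_memory being False would be consistent with the shell waiting for its Application Master container.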

Contributor

Hello again - I am not seeing any pending containers or applications. I am looking in the Ambari Dashboard. Is there a better source?

Thanks again,

Mike


Which Sandbox version are you running? How much memory does the host running the Sandbox have?

Expert Contributor

To build on what Bernhard said, this is normal YARN behavior when it can't assign new containers. You said you are running VirtualBox. The Sandbox defaults to 8 GB of RAM, so if you are running it on a laptop with the same amount, that could be the issue. If that is the case, reduce the VirtualBox VM to 4 GB of RAM and restart the "node".

Contributor

Hi Chris - this is a pretty substantial machine - it has 16 GB of RAM.

I must apologize for the delay in my response - as I shared with Bernhard, I was on one of those vacations where the wife asks you "you are leaving that laptop at home, right?"

Contributor

Hi Predrag - you are a true guru!

Changed the command line to 1024m and set the number of executors - and it worked!

I have been stuck at this point for a week - thank you very much for your assistance.