Support Questions

Find answers, ask questions, and share your expertise

Unable to complete Spark Pi example tutorial

Explorer

Hi,

I am new to HDP and have downloaded the HDP sandbox 2.4 and trying out the tutorial

http://hortonworks.com/hadoop-tutorial/a-lap-around-apache-spark/

But I am not able to complete the Spark Pi example: the output keeps looping on the message "INFO Client: Application report for application_1457875462527_0006 (state: ACCEPTED)" and never shows the expected result.

I have tried following the tutorial both with and without installing spark_2_3_4_1_10-master and spark_2_3_4_1_10-python, as I see that HDP 2.4 already comes with Spark 2.4.0.0-169.

Attached is the output from the command (output.txt). I hope someone can help.

Thanks,

1 ACCEPTED SOLUTION

Explorer

Hi guys,

Thanks a lot for your help. To summarize the issue for anyone else who runs into it:

If you are using HDP sandbox 2.4 for the tutorial "A Lap Around Apache Spark", there is no need to install Spark; everything is already included.

My problem starting Spark was that another Spark instance was already running, started when I ran the "Spark on Zeppelin" tutorial.

To stop that session, use the "yarn application -list" and "yarn application -kill" commands to kill off the other Spark-on-YARN sessions.

All is well now. Thanks again for your help.
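The two YARN commands above can be sketched as follows. The application ID is the one from the original post; the other columns in the sample line are illustrative, not actual "yarn application -list" output:

```shell
# Sample line, in the shape of 'yarn application -list' output
# (only the application ID here comes from this thread; the rest is illustrative):
sample='application_1457875462527_0006   Zeppelin   SPARK   zeppelin   default   RUNNING'

# Pull out the application ID from the listing:
app_id=$(echo "$sample" | grep -o 'application_[0-9]*_[0-9]*')
echo "$app_id"
# -> application_1457875462527_0006

# On the sandbox, that ID is what you pass to the kill command:
#   yarn application -kill "$app_id"
```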


11 REPLIES

Mentor

It says you already have a Spark session. You don't need to set anything up; just start the tutorial at the SparkPi example once your sandbox boots up.

Explorer

Hi Artem,

Thanks for your reply. Yes, I did see that Spark is already available and started in sandbox 2.4. But how do I start the Spark shell using the already-running instance? I just followed the instructions in the tutorial to try to get to the Scala prompt, but encountered the same issue.

Hope you can point me in the correct direction. Thanks a lot.

@Teng Geok Keh

You are getting the following exception: a BindException on port 4040. You already have a Spark context running on your local machine. A simple restart of Spark should solve it for now; probably you have the spark-shell running under another user.

16/03/13 14:07:53 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/13 14:07:53 INFO Server: jetty-8.y.z-SNAPSHOT
16/03/13 14:07:54 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:463)
        at sun.nio.ch.Net.bind(Net.java:455)
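To find out which process is holding port 4040, a netstat check like the one used later in this thread works; the snippet below extracts the PID from a sample netstat line (the PID and layout are illustrative, not live sandbox output):

```shell
# Sample line in the shape of 'netstat -lnptu | grep :4040' output
# (PID 3636 is illustrative):
line='tcp        0      0 0.0.0.0:4040    0.0.0.0:*    LISTEN    3636/java'

# The PID/Program column is last; strip everything up to the final space,
# then everything from the slash on, leaving just the PID to inspect or kill:
pid=${line##* }
pid=${pid%%/*}
echo "$pid"
# -> 3636
```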


Explorer

Hi @Shivaji,

Thanks. A new instance of the Spark shell actually started on port 4041 since 4040 was already in use. I have also accessed the URL on port 4041 and can see that the job is created.

The problem is that the job never completes; it just keeps going. Even starting a Spark shell (./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m) goes into the same "loop", and I never get to the Scala prompt to try out the commands.

Or could the problem I am facing be with YARN rather than Spark?

Thanks again for your help.

Rising Star

Hello @Teng Geok Keh,

It's strange that there is already something running on port 4040. The Spark component started on the sandbox is the Spark history server, which runs on port 18080:

[root@sandbox spark-client]# for i in $(ps ax | grep spark | grep -v grep | grep -v zeppelin| awk '{ print $1}'); do netstat -lnptu | grep $i; done

tcp 0 0 0.0.0.0:18080 0.0.0.0:* LISTEN 3636/java

Perhaps you started something and left it running as part of another job or tutorial?

On a freshly imported and started sandbox I ran the commands listed in the tutorial and it worked just fine. Notice the first command is a netstat to check whether anything is running on port 4040 (it returns no results), followed by the actual Spark commands: https://gist.github.com/paul-lupu/9d2536511e996f2e57dd

My suggestion is to try a fresh import and see if it works.

If not, please run "ps ax | grep spark" and post the output here.
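One note on that command: "ps ax | grep spark" also matches the grep process itself, which is why the netstat one-liner above filters it out with "grep -v grep". A small demonstration on sample ps output (the PIDs and process lines are illustrative):

```shell
# Sample ps output with illustrative PIDs: a Spark history server plus
# the grep process that 'ps ax | grep spark' would match against itself.
ps_sample='3636 ?     Sl   1:23 /usr/lib/jvm/java/bin/java org.apache.spark.deploy.history.HistoryServer
4242 pts/0 S+   0:00 grep spark'

# 'grep -v grep' drops the self-match, leaving only the real Spark process:
echo "$ps_sample" | grep spark | grep -v grep
# prints only the HistoryServer line
```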

Thanks,

Explorer

Hi @glupu,

I only saw your answer after I posted. As mentioned, you are right: the Spark session got started because I was running the Spark on Zeppelin tutorial.

Attached is the "ps ax | grep spark" output from before and after I ran the Zeppelin notebook.

Four processes got started. Killing them does not seem to help, as two of them just get restarted, and after killing them even Zeppelin stopped working properly.

Would you have any idea how I can gracefully stop the spark session started by zeppelin?

Thanks a lot.

Explorer

So sorry, I forgot to attach the ps ax output. Please find attached: ps-axoutput.txt

Explorer

Hi Guys,

Thanks for helping me with this. I have confirmed the issue: there is another Spark session running, and somehow that caused the problem. The session was started when I ran the Spark tutorial on Zeppelin.

Would you guys know how I can stop the spark session without having to restart the sandbox?

Mentor

You can list the apps with the YARN CLI and then kill that particular job: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnCommands.html
