Spark 2.1 with Zeppelin 0.7.0 on HDP 2.6

New Contributor

I am trying to get Spark 2.1 working with Zeppelin 0.7.0 on HDP 2.6.0.3. The Spark interpreter seems to either time out while starting or throw an exception that I have not yet been able to observe. I found guidance for a pre-release HDP suggesting that I comment out SPARK_HOME in the zeppelin-env.sh file. If I do this, YARN fails with the message that SPARK_HOME is not defined.

So I also tried configuring Zeppelin to use HDP's Spark2 installation. If I set SPARK_HOME to /usr/hdp/current/spark2-client/, I get the following exception:

INFO [2017-04-14 12:19:41,497] ({pool-2-thread-2} RemoteInterpreterManagedProcess.java[start]:126) - Run interpreter process [/usr/hdp/current/zeppelin-server/bin/interpreter.sh, -d, /usr/hdp/current/zeppelin-server/interpreter/spark, -p, 41150, -l, /usr/hdp/current/zeppelin-server/local-repo/2C3P5E7QX, -g, spark]
INFO [2017-04-14 12:19:42,389] ({Exec Default Executor} RemoteInterpreterManagedProcess.java[onProcessComplete]:170) - Interpreter process exited 0
ERROR [2017-04-14 12:20:11,602] ({pool-2-thread-2} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:213)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:377)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
    at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:90)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:211)
    ... 12 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
    ... 19 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    ... 20 more

What is the correct setup to make Spark 2.1 work with Zeppelin 0.7? Also, is there anything else I can turn on to get more useful diagnostics?
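
One low-level check that may help narrow this down: "Connection refused" means nothing was listening on the interpreter port when the Zeppelin server tried to connect (41150 in the log above, where the interpreter process is also reported as having exited). The sketch below is a generic TCP probe in Python, assuming the interpreter runs on the same host as the server. The function name port_is_open is hypothetical and not part of Zeppelin.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 41150 is the port from the log above; substitute the port from
    # the most recent "Run interpreter process" line in your own log.
    print(port_is_open("localhost", 41150))
```

If the probe fails right after the launch line appears in the log, the interpreter process most likely died before opening its port, so the interpreter's own log file is the next place to look.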

4 REPLIES

Re: Spark 2.1 with Zeppelin 0.7.0 on HDP 2.6

Guru

@Steve Severance, can you please explain how you installed Spark 2.1 and Zeppelin? Did you use Ambari for the install?

You can also try Hortonworks Data Cloud to start Spark 2.1 and Zeppelin, as described below.

https://hortonworks.com/blog/try-apache-spark-2-1-zeppelin-hortonworks-data-cloud/

Re: Spark 2.1 with Zeppelin 0.7.0 on HDP 2.6

Rising Star

Spark 2.1 and Zeppelin 0.7 can be run standalone as follows:

Spark Installation

Do the following:

1. Go to http://spark.apache.org/downloads.html and download the latest file.

2. Unzip the file to an appropriate location.

3. Read https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spar... and follow the instructions.

4. After the installation, go to Spark's bin directory in a command window and run spark-shell to check that you get a Scala prompt. You can then close the command window.

Summary of step 3 above:

- Download the winutils.exe binary from the https://github.com/steveloughran/winutils repository. (Select the version that matches the Hadoop version your Spark distribution was built for.)

- Save winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin.

- Set HADOOP_HOME to reflect the directory with winutils.exe (without bin). e.g. set HADOOP_HOME=c:\hadoop

- Set PATH environment variable to include %HADOOP_HOME%\bin as follows: set PATH=%HADOOP_HOME%\bin;%PATH%

- Create c:\tmp\hive directory.

- Execute the winutils.exe chmod -R 777 \tmp\hive command, then check the result with the winutils.exe ls \tmp\hive command.
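
The environment setup in the list above can be sketched in Python as follows. This is an illustration only: the helper names hadoop_env and winutils_commands are hypothetical, c:\hadoop is the example directory from the steps, and the sketch only builds the values rather than running anything.

```python
import ntpath  # Windows path rules, whatever OS this script runs on


def hadoop_env(base_env: dict, hadoop_home: str) -> dict:
    """Copy base_env, set HADOOP_HOME, and put its bin directory first on PATH."""
    env = dict(base_env)
    env["HADOOP_HOME"] = hadoop_home  # the directory WITHOUT the trailing bin
    bin_dir = ntpath.join(hadoop_home, "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + (";" + old_path if old_path else "")
    return env


def winutils_commands(hive_dir: str = r"\tmp\hive") -> list:
    """The two commands from the steps: open up permissions, then list to verify."""
    return [
        ["winutils.exe", "chmod", "-R", "777", hive_dir],
        ["winutils.exe", "ls", hive_dir],
    ]


env = hadoop_env({"PATH": r"C:\Windows"}, r"c:\hadoop")
print(env["HADOOP_HOME"])  # c:\hadoop
print(env["PATH"])         # c:\hadoop\bin;C:\Windows
for cmd in winutils_commands():
    print(" ".join(cmd))
```

On a real Windows machine you would pass os.environ as base_env and run the printed commands from a command prompt after creating c:\tmp\hive.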

Zeppelin Installation

Do the following:

1. Go to http://zeppelin.apache.org/download.html and download the latest file.

2. Unzip the file to an appropriate location.

3. Go to https://github.com/elodina/zeppelin-notebooks/blob/master/conf/interpreter.json.

4. Copy the content of interpreter.json and save it into the conf/interpreter.json file. If the file does not exist in the conf directory, create it.

5. Learn how to start and stop Zeppelin at http://zeppelin.apache.org/docs/0.7.1/install/install.html.

6. Go to http://localhost:8080, click the anonymous user at the top right, and click Interpreter. Find the Spark section and click the edit button on the right.

7. Update the master value to local[*], save, and restart the Spark interpreter. The restart button is next to the edit button.

8. Don't use the Zeppelin tutorial; it does not work. Instead, use Spark's latest tutorial: http://spark.apache.org/docs/latest/sql-programming-guide.html

9. When you write Scala code, you don't need to specify an interpreter such as %xyz, but use %sql when you use Spark SQL.
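
Step 7 can also be applied to the JSON directly instead of through the UI. The Python sketch below assumes the layout commonly seen in Zeppelin 0.7-era interpreter.json files: a top-level interpreterSettings map whose entries carry a group name and a flat properties map. That layout is an assumption, so check it against your own file. The function name set_spark_master and the sample document are hypothetical.

```python
import json


def set_spark_master(config: dict, master: str = "local[*]") -> int:
    """Set properties.master on every setting in the spark group.

    Returns the number of settings changed. Assumes the layout
    {"interpreterSettings": {id: {"group": ..., "properties": {...}}}}.
    """
    changed = 0
    for setting in config.get("interpreterSettings", {}).values():
        if setting.get("group") == "spark":
            setting.setdefault("properties", {})["master"] = master
            changed += 1
    return changed


# Toy document in the assumed layout (not a real Zeppelin file).
sample = json.loads("""
{"interpreterSettings": {
  "spark-id": {"group": "spark", "properties": {"master": "yarn-client"}},
  "md-id": {"group": "md", "properties": {}}
}}
""")
print(set_spark_master(sample))  # 1
print(sample["interpreterSettings"]["spark-id"]["properties"]["master"])  # local[*]
```

To use it on a real file, load conf/interpreter.json with json.load, call the function, and write the result back with json.dump while Zeppelin is stopped, since a running server may overwrite the file.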

Re: Spark 2.1 with Zeppelin 0.7.0 on HDP 2.6

@Steve Severance

I tested the same setup at my site with HDP 2.6 and Ambari 2.5.0.3. I did not change any configs; everything is at its defaults. This works fine on my end. See the screenshot below:

[Screenshot: 14697-spark2.png]

It would be nice to know:

a) how you installed HDP 2.6: is it a fresh install or an upgrade?

b) what you changed in the Spark2 service, the Zeppelin service, and the spark2 interpreter

Re: Spark 2.1 with Zeppelin 0.7.0 on HDP 2.6

New Contributor

I had done an upgrade from 2.5. I removed zeppelin entirely and reinstalled it on a different host and now it works fine.
