
HDP 2.5 Zeppelin and Spark

Contributor

Hi all,

I have a Kerberized cluster running HDP 2.5. I would like to use Zeppelin 0.6 with Spark 2, but that combination appears to have many restrictions and problems, so I would at least like to get Zeppelin 0.6 working with Spark 1.6.

I followed the instructions and configured Zeppelin authentication against my AD. I would also like to use impersonation; I think it is essential that jobs run as the logged-in user rather than as a shared zeppelin user (especially for reading from and writing to HDFS). Once Livy works I plan to verify this with the small paragraph below.
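A minimal check paragraph (assuming the Livy interpreter is configured; sc.sparkUser() is standard PySpark and should return my AD user, not the shared zeppelin/livy service account):

%livy.pyspark
# Should print the logged-in Zeppelin user when Livy impersonation is enabled,
# not the zeppelin or livy service account.
print(sc.sparkUser())
print(sc.version)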

I have followed many other threads, but it is still not working and nothing is clear.

Has anyone here gotten Zeppelin on HDP 2.5 working with Spark and Livy (for impersonation)?

In my case, when I run the following in Zeppelin:

%livy.pyspark
sc.version

I obtain:

Interpreter died:
Traceback (most recent call last):
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/tmp/7818688309791970952", line 469, in <module>
    sys.exit(main())
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/tmp/7818688309791970952", line 394, in main
    exec 'from pyspark.shell import sc' in global_dict
  File "<string>", line 1, in <module>
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/pyspark.zip/pyspark/shell.py", line 43, in <module>
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/pyspark.zip/pyspark/context.py", line 115, in __init__
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/pyspark.zip/pyspark/context.py", line 172, in _do_init
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:/usr/hdp/current/spark-client/conf/hive-site.xml does not exist.
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1388)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
	at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
	at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)

traceback: 
{}

I could upgrade from HDP 2.5 to HDP 2.6, but most probably that will not help and the problem will only get worse (and Zeppelin may still not work).

Thanks in advance

1 REPLY

Super Collaborator

@Jose Molero

Please install the Spark client on all NodeManager hosts; the error you see with livy.pyspark is caused by the Spark client being missing on the NodeManager node where the driver container was launched. Make sure to refresh the client configurations after installation so that files such as /usr/hdp/current/spark-client/conf/hive-site.xml are copied to those hosts. A quick way to verify is sketched below.
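If it helps, a quick sanity check (assuming the standard HDP layout, with the path taken from the FileNotFoundException above) is to confirm on each NodeManager host that the file the driver tries to add is actually present, for example:

import os

# Path reported in the FileNotFoundException; it should exist only after the
# Spark client and its configs have been installed/refreshed on this host.
path = "/usr/hdp/current/spark-client/conf/hive-site.xml"
print(path, "exists" if os.path.exists(path) else "MISSING")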