Created 04-20-2017 05:57 PM
Hi all,
I have a Kerberized cluster with HDP 2.5. I would like to use Zeppelin 0.6 with Spark2, but I have seen that there are many restrictions and problems there, so for now I would at least like to get Zeppelin 0.6 working with Spark 1.6.
I followed the instructions and configured Zeppelin against my AD. I would also like to use impersonation; I think it is mandatory to execute a job as the actual end user rather than as a common zeppelin user (especially in order to read from and write to HDFS).
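For reference, these are the impersonation-related settings I have been working from. Property names are as I understand them from the HDP/Livy documentation; the values are just placeholders for my environment, so treat this as a sketch of the intended setup, not a verified config:

# livy.conf on the Livy server: let Livy impersonate end users
livy.impersonation.enabled = true
livy.superusers = zeppelin-mycluster

# core-site.xml (HDFS): allow the livy user to proxy other users
hadoop.proxyuser.livy.groups = *
hadoop.proxyuser.livy.hosts = *

# Zeppelin livy interpreter: point Zeppelin at the Livy server
zeppelin.livy.url = http://livy-host.example.com:8998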
I have followed many other threads, but it is still not working and nothing is clear.
Is anyone here running HDP 2.5 with Zeppelin working against Spark and Livy (for impersonation)?
In my case, when I try the following in Zeppelin:
%livy.pyspark
sc.version
I obtain:
Interpreter died:
Traceback (most recent call last):
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/tmp/7818688309791970952", line 469, in <module>
    sys.exit(main())
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/tmp/7818688309791970952", line 394, in main
    exec 'from pyspark.shell import sc' in global_dict
  File "<string>", line 1, in <module>
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/pyspark.zip/pyspark/shell.py", line 43, in <module>
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/pyspark.zip/pyspark/context.py", line 115, in __init__
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/pyspark.zip/pyspark/context.py", line 172, in _do_init
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
  File "/grid/4/hadoop/yarn/local/usercache/jmolero/appcache/application_1486188076080_0234/container_e14_1486188076080_0234_01_000001/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:/usr/hdp/current/spark-client/conf/hive-site.xml does not exist.
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1388)
	at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
	at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
	at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
traceback: {}
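The part of the trace that stands out to me is the FileNotFoundException for /usr/hdp/current/spark-client/conf/hive-site.xml. As a quick sanity check (just a sketch I run by hand on each NodeManager host), the file can be tested for like this:

import os.path

# Path taken from the FileNotFoundException above; it should only exist on
# hosts where the HDP spark-client package and its configs are installed.
CONF = "/usr/hdp/current/spark-client/conf/hive-site.xml"
print("%s: %s" % (CONF, "exists" if os.path.isfile(CONF) else "MISSING"))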
I could upgrade from HDP 2.5 to HDP 2.6, but I suspect it is most probably not going to work either and the problem will only get worse (and Zeppelin may well still not work).
Thanks in advance
Created 04-20-2017 06:18 PM
Please install the spark-client on all NodeManager hosts; the error you see with livy.pyspark is due to the missing spark-client on the NodeManagers. Make sure to refresh the client configurations after installation so that the configs are copied to the hosts.
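Once the clients are installed and the configs refreshed, a simple way to confirm the fix from Zeppelin is to rerun the original paragraph; it should now print the Spark version string instead of "Interpreter died":

%livy.pyspark
# With spark-client present on every NodeManager, the SparkContext can be
# created and this returns the Spark version (e.g. u'1.6.2' on HDP 2.5).
sc.version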