Created on 03-14-2017 01:54 PM
Environment
- HDP 2.5.3
- Kerberos disabled
Problem
I have a problem using HiveContext with Zeppelin. For example, the following code does not work:
%pyspark
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sample07 = sqlContext.table("default.sample_07")
sample07.show()
Here is the error displayed:
You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:225)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:215)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:480)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:479)
	at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
	at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
	... 21 more
Caused by: java.io.IOException: Permission denied
	at java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.io.File.createTempFile(File.java:2001)
	at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
	... 21 more
(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o125), <traceback object at 0x17682d8>)
Solution
Even though you are logged in to the Zeppelin UI as an AD/LDAP/local user, the query is executed as the zeppelin user. Hence, the zeppelin user needs write permission on the directory pointed to by the hive.exec.local.scratchdir parameter. By default, it is set to /tmp/<userName>, so the following needs to exist on the Zeppelin node:
[root@dan2 ~]# ls -lrt /tmp
drwxr-xr-x. 20 zeppelin zeppelin 4096 Mar 10 16:46 zeppelin
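If that directory is missing or is owned by another user, a minimal fix is to create it and make the zeppelin user its owner. This sketch assumes the default hive.exec.local.scratchdir location of /tmp/<userName> and the default zeppelin service account name; adjust the path and owner if your cluster differs:

[root@dan2 ~]# mkdir -p /tmp/zeppelin                      # create the local scratch dir if absent
[root@dan2 ~]# chown zeppelin:zeppelin /tmp/zeppelin       # give ownership to the zeppelin user
[root@dan2 ~]# chmod 755 /tmp/zeppelin                     # zeppelin can write, others can read/traverse

Alternatively, hive.exec.local.scratchdir can be pointed at another directory the zeppelin user can already write to. After changing the permissions or the property, restart the Spark interpreter in Zeppelin and re-run the paragraph.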