Created on 03-14-2017 01:54 PM
Environment
- HDP 2.5.3
- Kerberos disabled
Problem
I have a problem using HiveContext with Zeppelin. For example, the following code does not work:
%pyspark
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sample07 = sqlContext.table("default.sample_07")
sample07.show()
Here is the error displayed:
You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:225)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:215)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:480)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:479)
	at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
	at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
	... 21 more
Caused by: java.io.IOException: Permission denied
	at java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.io.File.createTempFile(File.java:2001)
	at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
	... 21 more
(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o125), <traceback object at 0x17682d8>)
Solution
Even though you are logged in to the Zeppelin UI as an AD/LDAP/local user, the query is executed as the zeppelin user. Hence, the zeppelin user needs write permission on the directory pointed to by the hive.exec.local.scratchdir parameter. By default, it is set to /tmp/<userName>, so the following needs to exist on the Zeppelin node:
[root@dan2 ~]# ls -lrt /tmp
drwxr-xr-x. 20 zeppelin zeppelin 4096 Mar 10 16:46 zeppelin
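If that directory is missing or is owned by another user, a minimal fix is to create it and make the zeppelin user its owner. This sketch assumes the default hive.exec.local.scratchdir location of /tmp/<userName> and the default zeppelin service account name; adjust the path and owner if your cluster differs:

[root@dan2 ~]# mkdir -p /tmp/zeppelin                      # create the local scratch dir if absent
[root@dan2 ~]# chown zeppelin:zeppelin /tmp/zeppelin       # give ownership to the zeppelin user
[root@dan2 ~]# chmod 755 /tmp/zeppelin                     # zeppelin can write, others can read/traverse

Alternatively, hive.exec.local.scratchdir can be pointed at another directory the zeppelin user can already write to. After changing the permissions or the property, restart the Spark interpreter in Zeppelin and re-run the paragraph.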