Zeppelin JDBC (Hive) interpreter fails to execute as the user when used as the default interpreter in a notebook with user impersonation enabled

I am currently working on a client project where we have installed HDP 2.6, and there is an issue with Zeppelin user impersonation for the Hive interpreter.

If the interpreter name is specified explicitly in the paragraph, the query executes correctly. However, when the paragraph runs on the notebook's default interpreter, impersonation does not seem to work.

Hive is also configured to accept impersonation, and the user registry is managed by Ranger.

I have attached the interpreter settings as a screenshot.

Zeppelin properties (zeppelin-env.sh):
export JAVA_HOME=
export JAVA_HOME=/usr/lib/java/jdk1.8.0_121
# export MASTER=                              # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
export MASTER=yarn-client
export SPARK_YARN_JAR=/apps/zeppelin/zeppelin-spark-0.5.5-SNAPSHOT.jar
# export ZEPPELIN_JAVA_OPTS                   # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
export ZEPPELIN_JAVA_OPTS="-Dorg.xerial.snappy.tempdir=/webserver/tmp/zeppelin_tmp -Dhdp.version=2.6.0.3-8 -Dspark.driver.memory=512m -Dspark.executor.memory=1024m -Dspark.executor.instances=2 -Dspark.cores.max=8 -Dspark.dynamicAllocation.enabled=true -Dspark.dynamicAllocation.initialExecutors=1 -Dspark.dynamicAllocation.minExecutors=2  -Dspark.dynamicAllocation.maxExecutors=5"
# export ZEPPELIN_MEM                         # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
export ZEPPELIN_MEM="-Xms512m -Xmx2G -XX:MaxPermSize=512m -XX:MaxMetaspaceSize=512m"
# export ZEPPELIN_INTP_MEM                    # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
export ZEPPELIN_INTP_MEM="-Xms512m -Xmx2G -XX:MaxPermSize=512m -XX:MaxMetaspaceSize=512m"
# export ZEPPELIN_INTP_JAVA_OPTS              # zeppelin interpreter process jvm options.
export ZEPPELIN_INTP_JAVA_OPTS="-Dorg.xerial.snappy.tempdir=/webserver/tmp/zeppelin_tmp -Xms512m -Xmx2G -XX:MaxPermSize=512m -XX:MaxMetaspaceSize=512m"
# export ZEPPELIN_SSL_PORT                    # ssl port (used when ssl environment variable is set to true)


# export ZEPPELIN_LOG_DIR                     # Where log files are stored.  PWD by default.
export ZEPPELIN_LOG_DIR=/webserver/logs/var/log/zeppelin
# export ZEPPELIN_PID_DIR                     # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
export ZEPPELIN_PID_DIR=/var/run/zeppelin
# export ZEPPELIN_WAR_TEMPDIR                 # The location of jetty temporary directory.
# export ZEPPELIN_NOTEBOOK_DIR                # Where notebook saved
# export ZEPPELIN_NOTEBOOK_HOMESCREEN         # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
# export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE    # hide homescreen notebook from list when this value set to "true". default "false"
# export ZEPPELIN_NOTEBOOK_S3_BUCKET          # Bucket where notebook saved
# export ZEPPELIN_NOTEBOOK_S3_ENDPOINT        # Endpoint of the bucket
# export ZEPPELIN_NOTEBOOK_S3_USER            # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_IDENT_STRING                # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS                    # The scheduling priority for daemons. Defaults to 0.
# export ZEPPELIN_INTERPRETER_LOCALREPO       # Local repository for interpreter's additional dependency loading
# export ZEPPELIN_NOTEBOOK_STORAGE            # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
# export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC       # If there are multiple notebook storages, should we treat the first one as the only source of truth?
# export ZEPPELIN_NOTEBOOK_PUBLIC             # Make notebook public by default when created, private otherwise
export ZEPPELIN_INTP_CLASSPATH_OVERRIDES="/etc/zeppelin/conf/external-dependency-conf"


#### Spark interpreter configuration ####


## Use provided spark installation ##
## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
##
# export SPARK_HOME                           # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
export SPARK_HOME=/usr/hdp/current/spark-client
# export SPARK_SUBMIT_OPTIONS                 # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
# export SPARK_APP_NAME                       # (optional) The name of spark application.


## Use embedded spark binaries ##
## without SPARK_HOME defined, Zeppelin is still able to run the spark interpreter process using embedded spark binaries.
## however, this is not encouraged when you can define SPARK_HOME
##
# Options read in YARN client mode
# export HADOOP_CONF_DIR                      # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
export HADOOP_CONF_DIR=/etc/hadoop/conf
# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
# export PYSPARK_PYTHON                       # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYTHONPATH
export PYSPARK_PYTHON=/usr/bin/python2.6
#export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.9-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"


## Spark interpreter options ##
##
# export ZEPPELIN_SPARK_USEHIVECONTEXT        # Use HiveContext instead of SQLContext if set true. true by default.
# export ZEPPELIN_SPARK_CONCURRENTSQL         # Execute multiple SQL concurrently if set true. false by default.
# export ZEPPELIN_SPARK_IMPORTIMPLICIT        # Import implicits, UDF collection, and sql if set true. true by default.
# export ZEPPELIN_SPARK_MAXRESULT             # Max number of Spark SQL result to display. 1000 by default.
# export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE       # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000




#### HBase interpreter configuration ####


## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set


# export HBASE_HOME=                          # (required) Under which HBase scripts and configuration should be
# export HBASE_CONF_DIR=                      # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml


#export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '


#export SPARK_HOME=/usr/hdp/current/spark-client
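A note on the commented-out ZEPPELIN_IMPERSONATE_CMD line: in a shell assignment there must be no space before `=`, and the single quotes matter because they keep `${ZEPPELIN_IMPERSONATE_USER}` unexpanded when zeppelin-env.sh is sourced. The sketch below only illustrates that deferred expansion; the helper `expand_impersonate_cmd` and the user `alice` are hypothetical, and Zeppelin's own launcher may expand the template differently.

```shell
#!/usr/bin/env bash
# No spaces around '='; single quotes keep ${ZEPPELIN_IMPERSONATE_USER} literal,
# so it is not expanded when zeppelin-env.sh is sourced.
export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '

# Hypothetical helper: expand the stored template for one user at run time.
expand_impersonate_cmd() {
  local ZEPPELIN_IMPERSONATE_USER="$1"
  # eval re-parses the template, substituting the (local) user name.
  eval "echo ${ZEPPELIN_IMPERSONATE_CMD}"
}

echo "template: ${ZEPPELIN_IMPERSONATE_CMD}"
echo "expanded: $(expand_impersonate_cmd alice)"
```

Because `eval` re-parses the string, this pattern is only safe when user names come from a trusted source.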

43797-zepp-2.png

43796-impersonation.png


@Abhijit Nayak

This guide should help with Zeppelin user impersonation issues: https://github.com/sudheer0553/zeppelin-user-impersonation

@Venkata Sudheer Kumar M

Unfortunately, the error occurs even with the admin account. Any other suggestions?

java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
	at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
	at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
	at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:90)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:211)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:377)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:105)
	at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
	at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

@Abhijit Nayak

This is not the complete error. The actual cause should be logged before this error; check the following log files under /var/log/zeppelin/:

zeppelin-zeppelin-<server-name>.log

zeppelin-interpreter-<interpreter-name>-<user-name>-spark-zeppelin-<server-name>.log

Also, have you followed all of the steps given here?
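To look for the earlier failure mentioned above, something like the following can pull warning, error and exception lines out of the server and interpreter logs. It is a sketch: the function name `show_zeppelin_errors` is hypothetical, the file patterns follow the log names listed above, and the default directory is /var/log/zeppelin (the configuration posted in the question sets ZEPPELIN_LOG_DIR to /webserver/logs/var/log/zeppelin instead).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print WARN/ERROR lines, exception lines and "Caused by"
# lines (with line numbers) from the Zeppelin server and interpreter logs.
show_zeppelin_errors() {
  local log_dir="${1:-/var/log/zeppelin}"
  local f
  for f in "$log_dir"/zeppelin-zeppelin-*.log "$log_dir"/zeppelin-interpreter-*.log; do
    [ -e "$f" ] || continue   # skip a glob pattern that matched no file
    echo "== $f"
    # grep exits 1 when nothing matches; that is not an error here.
    grep -n -E 'WARN|ERROR|Exception|Caused by' "$f" || true
  done
}
```

For the posted configuration this would be called as `show_zeppelin_errors /webserver/logs/var/log/zeppelin`; the first ERROR or exception above the ConnectException is usually the one worth reading.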

@Venkata Sudheer Kumar M - I have performed all the steps mentioned there. I know where the log files are; they don't show any other warnings or useful information. This is the only error message.

@Abhijit Nayak

Could you please provide the logs?

@Abhijit Nayak,

I see a semicolon missing in default.url before socketTimeout. Can you add it so that the URL reads as follows, and run again?

zookeeperNamespace=hiveserver2;socketTimeout=....
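A missing separator like this is easy to overlook in a long URL. As a rough heuristic, the sketch below splits a Hive JDBC URL on `;` and flags any segment containing more than one `=`, which is what two glued `key=value` pairs look like. The function names, the ZooKeeper host `zk1:2181` and the `socketTimeout` value in the example are hypothetical, and URLs that carry Hive settings after `?` can produce false positives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each ';'-separated segment of a JDBC URL on its own line.
split_jdbc_params() {
  local IFS=';'
  local -a parts
  read -r -a parts <<< "$1"
  printf '%s\n' "${parts[@]}"
}

# Flag segments with more than one '=': a likely sign of a missing ';'.
find_glued_params() {
  split_jdbc_params "$1" | awk -F'=' 'NF > 2 { print "suspicious: " $0 }'
}
```

For example, `find_glued_params 'jdbc:hive2://zk1:2181/;serviceDiscoveryMode=zooKeeper;zookeeperNamespace=hiveserver2socketTimeout=60'` prints `suspicious: zookeeperNamespace=hiveserver2socketTimeout=60`, while the same URL with the `;` restored prints nothing.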

@Aditya Sirna - It works fine without Hive impersonation.

@Venkata Sudheer Kumar M - Here you go: I have attached the .out and .log files, captured in debug mode.

zeppelin-zeppelin-1.zip (log file split into two parts)

zeppelin-zeppelin-sss.txt (.out file)

@Venkata Sudheer Kumar M

This looks like a bug to me.

@Abhijit Nayak

Not sure about that; I haven't had a chance to test it yet. I will test it and let you know whether I can reproduce the issue or find a workaround.
