
Using Hive from Spark 2.0.0 in JupyterHub, CDH 5.10.0.1

Hi All,

I am having trouble using Hive from Spark 2.0 on JupyterHub.

It works in pyspark2 or spark2-submit but not in JupyterHub.

It works in JupyterHub with the older Spark 1.6 (and I do not remember having to do anything special to make it work).

Apparently some environment variable is missing from the JupyterHub Spark 2.0 kernel:

=========

$ cat /usr/local/share/jupyter/kernels/pyspark2/kernel.json
{
  "display_name": "pySpark (Spark 2.0.0)",
  "language": "python",
  "argv": [
    "/usr/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "PYSPARK_PYTHON": "/usr/bin/python",
    "SPARK_HOME": "/opt/cloudera/parcels/SPARK2/lib/spark2",
    "HADOOP_CONF_DIR": "/etc/hadoop/conf",
    "PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.3-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
    "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
  }
}

=========

Is there a way to set the path to the metastore manually from inside the shell? Which parameter controls it? I think it is trying to look in $HOME.
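
For reference, the sketch below is the kind of thing I mean by setting it manually: building the session myself with Hive support and passing the metastore settings explicitly. The thrift URI and warehouse path are placeholders, not values from this cluster; the real ones would come from hive-site.xml. As far as I understand, these settings only take effect if no SparkSession exists yet, and shell.py already creates one at kernel startup.

=========

from pyspark.sql import SparkSession

# Sketch only: the metastore URI and warehouse path below are placeholders.
spark = (SparkSession.builder
         .appName("jupyterhub-hive-test")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")   # take from hive-site.xml
         .config("spark.sql.warehouse.dir", "/user/hive/warehouse")       # take from hive-site.xml
         .enableHiveSupport()
         .getOrCreate())

spark.sql("show tables").show()

=========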

The command that I am trying to execute is:

=========

sqlCtx.sql("show tables").show()

=========

It returns the expected list of tables when run in the pyspark2 shell, but returns an empty list in JupyterHub.
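
To narrow it down, I am running checks like the sketch below from a notebook cell (spark, sc and sqlCtx are the objects that shell.py creates):

=========

import os

# Which Hadoop conf dir did the kernel export, and does it contain hive-site.xml?
conf_dir = os.environ.get("HADOOP_CONF_DIR", "")
print(conf_dir)
print(os.path.exists(os.path.join(conf_dir, "hive-site.xml")))

# 'hive' means the session should use the real metastore; 'in-memory' means it
# fell back to a local catalog, which would explain the empty table list.
print(sc.getConf().get("spark.sql.catalogImplementation", "in-memory"))

# A metastore_db / derby.log in $HOME would mean Spark created a private Derby
# metastore there instead of talking to the cluster one.
print([f for f in os.listdir(os.path.expanduser("~")) if f in ("metastore_db", "derby.log")])

=========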

Thank you,

Igor


Re: Using Hive from Spark 2.0.0 in JupyterHub, CDH 5.10.0.1

I solved this problem. I looked at the environment set in 

/etc/spark2/conf/yarn-conf/hive-env.sh

and set the corresponding variables in the JupyterHub kernel. In particular:

 "HADOOP_CONF_DIR":"/etc/spark2/conf/yarn-conf",
 "HIVE_AUX_JARS_PATH":"/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar",
 "HADOOP_CLIENT_OPTS":"-Xmx2147483648 -XX:MaxPermSize=512M -Djava.net.preferIPv4Stack=true",

I think HADOOP_CONF_DIR is the most important one, because previously I had it set to a different directory that does not contain hive-site.xml.
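
After updating kernel.json and restarting the kernel, a quick sanity check along these lines (a sketch, same names as above) confirms that the right configuration is picked up:

=========

import os

conf_dir = os.environ.get("HADOOP_CONF_DIR")
print(conf_dir)                                                 # expect /etc/spark2/conf/yarn-conf
print(os.path.exists(os.path.join(conf_dir, "hive-site.xml")))  # expect True

sqlCtx.sql("show tables").show()                                # now lists the Hive tables

=========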
