
How to connect CML to Hive using Python

New Contributor

Hello everyone,


We set up a Cloudera environment that contains a DataHub of type "7.1.0 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie". We managed to connect to Hive via a JDBC connection from our local machines.

However, so far we have not been able to connect from CML to Hive via JDBC.

I use JayDeBeApi as follows:




import jaydebeapi

conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://dataengineering-master0.......:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "user_name", 'PWD': "password"}, '/home/cdsw/drivers/HiveJDBC41.jar')




The error message is:




TypeError: Class org.apache.hive.jdbc.HiveDriver is not found




I set the environment variable CLASSPATH to


which is where the jar actually resides. Hence I wanted to check whether JAVA_HOME is set correctly, and yes, that variable is set to


However, when I run the command !java --version, I get an error:

Unrecognized option: --version
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Is this normal, and can I expect Java to still work as expected, or could this be the source of my problem?
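The "Unrecognized option: --version" error is most likely harmless: the GNU-style --version flag only exists from JDK 9 onward, so a Java 8 runtime (an assumption; the thread never shows which JDK the engine uses) rejects it, while the single-dash java -version works on every JDK. A minimal sketch for checking this from a CML Python session, with java_version_text as a hypothetical helper name:

```python
import shutil
import subprocess


def java_version_text(java_cmd="java"):
    """Return the banner printed by `java -version`, or None if java is not on PATH.

    The single-dash `-version` flag is accepted by every JDK; `--version`
    was only added in JDK 9, which is why a Java 8 runtime rejects it.
    """
    if shutil.which(java_cmd) is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        [java_cmd, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode that also works on Python 3.6
    )
    return result.stderr.strip()


if __name__ == "__main__":
    print(java_version_text() or "java not found on PATH")
```

If the banner reports a 1.8 runtime, the --version error says nothing about whether the JVM can load JDBC drivers.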



Since connecting via JDBC did not work, I also tried connecting via a SparkSession, as shown in yesterday's "CDP Private Cloud Partner Edition". The presented code looks as follows:




from pyspark.sql import SparkSession

# Instantiate Spark-on-K8s Cluster
spark = SparkSession.builder.appName("Simple Spark Test") \
    .config("spark.executor.memory", "8g") \
    .config("spark.executor.cores", "2") \
    .config("spark.driver.memory", "2g") \
    .config("spark.executor.instances", "2") \
    .getOrCreate()

# Validate Spark Connectivity
spark.sql("SHOW databases").show()
spark.sql("use default")
spark.sql("show tables").show()
spark.sql('create table testcml (abc integer)').show()
spark.sql("insert into table testcml select t.* from (select 1) t").show()
spark.sql("select * from testcml").show()
spark.sql("drop table testcml").show()

# Stop Spark Session
spark.stop()




Listing the databases and the tables of a database, as well as creating the "testcml" table, works fine. But the insert into testcml fails with:




Caused by: java.lang.IllegalStateException: Authentication with IDBroker failed.  Please ensure you have a Kerberos token by using kinit.
	at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.deployUnbonded(
	at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.bindToAnyDelegationToken(
	at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.serviceStart(
	at org.apache.hadoop.service.AbstractService.start(
	at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(
	at org.apache.hadoop.fs.FileSystem.createFileSystem(
	at org.apache.hadoop.fs.FileSystem.access$200(
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
	at org.apache.hadoop.fs.FileSystem$Cache.get(
	at org.apache.hadoop.fs.FileSystem.get(
	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.getDestinationFileSystem(
	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.createOutputCommitter(
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(
	at org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol.setupCommitter(SQLHadoopMapReduceCommitProtocol.scala:40)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:229)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1289)
	at org.apache.spark.executor.Executor$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$




My problem is that I don't know how to pass this token or where to get it. I checked whether any Ranger DENY rules were active, but I did not see any.
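The Spark error asks for a Kerberos ticket obtained with kinit, and the thread never resolves this part. As a first diagnostic, a minimal sketch like the following checks whether the session holds a valid ticket at all. It assumes MIT Kerberos client tools, where klist -s prints nothing and exits with status 0 only when a readable, unexpired credentials cache exists; has_kerberos_ticket is a hypothetical helper name.

```python
import shutil
import subprocess


def has_kerberos_ticket(klist_cmd="klist"):
    """Return True if `klist -s` reports a valid, unexpired ticket cache.

    Assumes MIT Kerberos semantics: `klist -s` is silent and exits 0 only
    when the credentials cache can be read and is not expired.
    """
    if shutil.which(klist_cmd) is None:
        # No Kerberos client tools are installed on this engine image.
        return False
    result = subprocess.run([klist_cmd, "-s"])
    return result.returncode == 0


if __name__ == "__main__":
    if has_kerberos_ticket():
        print("Kerberos ticket found")
    else:
        print("No valid Kerberos ticket; run kinit before starting Spark")
```

A False result would match the IDBroker message; a True result would point elsewhere, for example at how the ticket is propagated to the executors.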


I appreciate your help, and thank you in advance.






Cloudera Employee

New Contributor

Hi @pvidal,

thanks for the fast reply.


Yes, indeed I saw that particular post. My implementation looks very similar, just for Hive instead of Impala:


!pip3 install JayDeBeApi
import jaydebeapi

conn_hive = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "our_user", 'PWD': "our_password"},
    jars='/home/cdsw/drivers/hive/HiveJDBC41.jar')

curs_hive = conn_hive.cursor()


The environment variable CLASSPATH is set to the jar with which the connection via Java or DBeaver works:

'CLASSPATH': '/home/cdsw/drivers/HiveJDBC41.jar'

I still get the error. Any further ideas?


Cloudera Employee

Did you actually run the export in a terminal session, as follows?





New Contributor

Yes, I did, but I had to prefix it with "!" for the command to be accepted.
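A "!" line does not behave like a terminal: each one runs in its own short-lived child shell, so an export there disappears when that shell exits and the Python process never sees it. The root cause in this thread turned out to be the driver class name, but the difference is easy to demonstrate. The sketch below uses a hypothetical variable CLASSPATH_DEMO so it does not touch the real CLASSPATH. Setting os.environ from Python before the first jaydebeapi.connect call is the in-process equivalent; as far as I can tell from JayDeBeApi's source, it reads CLASSPATH only when it first starts the JVM, and the JVM is started once per process.

```python
import os
import subprocess

# Equivalent of a "!export ..." line: the export happens in a child shell
# that exits immediately, so this Python process never sees the variable.
subprocess.run("export CLASSPATH_DEMO=/tmp/example.jar", shell=True, check=True)
after_shell_export = os.environ.get("CLASSPATH_DEMO")
print("after child-shell export:", after_shell_export)

# Setting it on os.environ changes this process's own environment,
# which every child process started afterwards inherits.
os.environ["CLASSPATH_DEMO"] = "/tmp/example.jar"
check = subprocess.run(
    "echo $CLASSPATH_DEMO",
    shell=True,
    stdout=subprocess.PIPE,
    universal_newlines=True,  # text mode that also works on Python 3.6
    check=True,
)
child_view = check.stdout.strip()
print("child process sees:", child_view)
```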



conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', \
    {'UID': "our_user", 'PWD': "our_pw"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)

TypeError: Class org.apache.hive.jdbc.HiveDriver is not found

TypeError                                 Traceback (most recent call last)
in engine
----> 1 conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', 	{'UID': "our_user", 'PWD': "our_pw"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)

/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/ in connect(jclassname, url, driver_args, jars, libs)
    410     else:
    411         libs = []
--> 412     jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
    413     return Connection(jconn, _converters)

/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/ in _jdbc_connect_jpype(jclassname, url, driver_args, jars, libs)
    219             return jpype.JArray(jpype.JByte, 1)(data)
    220     # register driver for DriverManager
--> 221     jpype.JClass(jclassname)
    222     if isinstance(driver_args, dict):
    223         Properties =

/home/cdsw/.local/lib/python3.6/site-packages/jpype/ in __new__(cls, jc, loader, initialize)
     98         # Pass to class factory to create the type
---> 99         return _jpype._getClass(jc)

TypeError: Class org.apache.hive.jdbc.HiveDriver is not found



total 11864
-rwx------ 1 cdsw 12146136 Sep 25 07:07 HiveJDBC41.jar*


Still the same error.

Cloudera Employee

Do me a favor and try this:


- open a terminal session (do not use !)

- run the following commands:

chmod a+r /home/cdsw/drivers/hive/HiveJDBC41.jar

- close the session and try to run your Python code

New Contributor

So I started a new session within CML, from which I opened a terminal via ">_Terminal Access" and ran the commands you posted.

I verified that CLASSPATH was set by running


, which produced the expected output, i.e.


I then closed the terminal session and ran the code within my CML session.

However, the error stayed the same.



New Contributor

Hello @pvidal!


So, as usual, the error was in front of the screen! I hadn't actually checked the driver class path inside the JAR file, which is "com.cloudera.hive.jdbc41.HS2Driver". After changing it, everything works fine.


Sorry for the confusion, and thanks for your support.
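Since a JAR is just a ZIP archive, the driver class name can be checked from Python before connecting, with no JVM involved. The sketch below lists candidate classes whose simple name ends in "Driver" and then connects using the class name confirmed above. The helper names (find_driver_classes, build_hive_url, connect_hive) are hypothetical, the URL format is copied from this thread, and the "Driver" suffix filter is only a heuristic.

```python
import zipfile

# Driver class confirmed in this thread for the Cloudera HiveJDBC41.jar.
CLOUDERA_HIVE_DRIVER = "com.cloudera.hive.jdbc41.HS2Driver"


def find_driver_classes(jar_path):
    """List fully qualified class names in a JAR whose simple name ends in 'Driver'.

    Inner classes (entries containing '$') are skipped.
    """
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
    classes = []
    for name in names:
        if not name.endswith(".class") or "$" in name:
            continue
        # "com/x/FooDriver.class" -> "com.x.FooDriver"
        fqcn = name[: -len(".class")].replace("/", ".")
        if fqcn.rsplit(".", 1)[-1].endswith("Driver"):
            classes.append(fqcn)
    return sorted(classes)


def build_hive_url(host, http_path="dataengineering/cdp-proxy-api/hive", port=443):
    """Build the jdbc:hive2 URL in the format used in this thread."""
    return (
        f"jdbc:hive2://{host}:{port}/;ssl=1;transportMode=http;"
        f"httpPath={http_path};AuthMech=3;"
    )


def connect_hive(host, user, password, jar_path):
    """Open a JayDeBeApi connection using the Cloudera driver class."""
    import jaydebeapi  # imported lazily so the helpers above work without a JVM

    return jaydebeapi.connect(
        CLOUDERA_HIVE_DRIVER,
        build_hive_url(host),
        {"UID": user, "PWD": password},
        jars=jar_path,
    )
```

For example, find_driver_classes('/home/cdsw/drivers/hive/HiveJDBC41.jar') gives a short candidate list to compare against the driver documentation, and connect_hive('our_host', 'our_user', 'our_pw', '/home/cdsw/drivers/hive/HiveJDBC41.jar') then passes the corrected class name to jaydebeapi.connect.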

Cloudera Employee

Ha! Good catch!
