How to connect CML to Hive using Python

Explorer

Hello everyone,

We set up a Cloudera environment that includes a Data Hub of type "7.1.0 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie". We managed to connect to Hive via a JDBC connection from our local machines.

But so far we have not been able to connect from CML to Hive via JDBC.

I use JayDeBeApi as follows:

import jaydebeapi

# Driver class, JDBC URL (via the Knox cdp-proxy-api endpoint),
# credentials, and the path to the driver jar
conn_hive = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    'jdbc:hive2://dataengineering-master0.......:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "user_name", 'PWD': "password"},
    '/home/cdsw/drivers/HiveJDBC41.jar',
)

The error message is:

TypeError: Class org.apache.hive.jdbc.HiveDriver is not found

I set the environment variable CLASSPATH to

'/home/cdsw/drivers/HiveJDBC41.jar'

which is where the jar actually resides. I then wanted to check whether JAVA_HOME is set correctly, and yes, that variable is set to

'/usr/lib/jvm/java-8-openjdk-amd64/'

However, when I run the command !java --version I get an error:

Unrecognized option: --version
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Is this normal? Can I expect Java to still work as expected, or could this be the source of my problem?
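As an aside, the error above is expected on Java 8: the double-dash --version flag only exists from JDK 9 onward, and Java 8 only understands the single-dash -version, so the failure by itself says nothing about whether the JVM is broken. A minimal check, runnable from a CML session (a sketch, nothing environment-specific assumed):

import subprocess

# Java 8 prints its version banner to stderr and only understands
# "-version" (single dash); "--version" was added in JDK 9.
print(subprocess.check_output(['java', '-version'],
                              stderr=subprocess.STDOUT).decode())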

 

 

Since connecting via JDBC did not work, I also tried connecting via a SparkSession, as I saw in yesterday's "CDP Private Cloud Partner Edition" session. The presented code looks as follows:

from pyspark.sql import SparkSession

# Instantiate Spark-on-K8s Cluster
spark = SparkSession.builder.appName("Simple Spark Test") \
    .config("spark.executor.memory", "8g") \
    .config("spark.executor.cores", "2") \
    .config("spark.driver.memory", "2g") \
    .config("spark.executor.instances", "2") \
    .getOrCreate()

# Validate Spark Connectivity
spark.sql("SHOW databases").show()
spark.sql("use default")
spark.sql("show tables").show()
spark.sql('create table testcml (abc integer)').show()
spark.sql("insert into table testcml select t.* from (select 1) t").show()
spark.sql("select * from testcml").show()
spark.sql("drop table testcml").show()

# Stop Spark Session
spark.stop()

 

 

 

Listing the databases and the tables of a DB, as well as creating the "testcml" table, works fine. But the insert into testcml fails due to:

 

 

 

Caused by: java.lang.IllegalStateException: Authentication with IDBroker failed.  Please ensure you have a Kerberos token by using kinit.
	at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.getNewKnoxDelegationTokenSession(IDBDelegationTokenBinding.java:461)
	at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.requestNewKnoxToken(IDBDelegationTokenBinding.java:406)
	at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.getNewKnoxToken(IDBDelegationTokenBinding.java:484)
	at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.maybeRenewAccessToken(IDBDelegationTokenBinding.java:476)
	at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.deployUnbonded(IDBDelegationTokenBinding.java:335)
	at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.deployUnbonded(S3ADelegationTokens.java:245)
	at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.bindToAnyDelegationToken(S3ADelegationTokens.java:278)
	at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.serviceStart(S3ADelegationTokens.java:199)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:608)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:388)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3396)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3456)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3424)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:518)
	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.getDestinationFileSystem(AbstractS3ACommitterFactory.java:73)
	at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.createOutputCommitter(AbstractS3ACommitterFactory.java:45)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:338)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:100)
	at org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol.setupCommitter(SQLHadoopMapReduceCommitProtocol.scala:40)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupTask(HadoopMapReduceCommitProtocol.scala:217)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:229)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1289)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

 

 

 

My problem here is that I don't know how to pass this token or where to get it. I checked whether any DENY rules were active in Ranger, but I did not see any.
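For anyone hitting the same IDBroker error: in a typical CDP Public Cloud setup (an assumption; the exact environment is not spelled out in the thread), the Kerberos ticket is derived from the workload user's credentials, so setting the workload password for the user, or running kinit by hand in a terminal session, is usually what provides the missing token. A quick check from a terminal session:

klist                      # shows whether a valid Kerberos ticket already exists
kinit your_workload_user   # placeholder user name; prompts for the workload password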

 

I appreciate your help and thank you in advance.

Regards,
Dominic

 


8 REPLIES

Rising Star

Explorer

Hi @pvidal,

Thanks for the fast reply.

Yes, indeed I saw that particular post. My implementation looks very similar, just with Hive instead of Impala:

!pip3 install JayDeBeApi
import jaydebeapi

conn_hive = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "our_user", 'PWD': "our_password"},
    jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',
)

curs_hive = conn_hive.cursor()

The env variable CLASSPATH is set to the jar with which the connection via Java or DBeaver works:

'CLASSPATH': '/home/cdsw/drivers/HiveJDBC41.jar'

Still I get the error. Any further ideas?

 

Rising Star

Did you actually run the export in a terminal session, as follows?

CLASSPATH=.:/home/cdsw/drivers/HiveJDBC41.jar

export CLASSPATH

 

 

Explorer

Yes I did, but I had to add a "!" in order for the command to be accepted:

!CLASSPATH=/home/cdsw/drivers/hive/HiveJDBC41.jar

!export CLASSPATH

conn_hive = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    'jdbc:hive2://host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "our_user", 'PWD': "our_pw"},
    jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',
)


TypeError                                 Traceback (most recent call last)
in engine
----> 1 conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', 	{'UID': "our_user", 'PWD': "our_pw"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)

/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/__init__.py in connect(jclassname, url, driver_args, jars, libs)
    410     else:
    411         libs = []
--> 412     jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
    413     return Connection(jconn, _converters)
    414 

/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/__init__.py in _jdbc_connect_jpype(jclassname, url, driver_args, jars, libs)
    219             return jpype.JArray(jpype.JByte, 1)(data)
    220     # register driver for DriverManager
--> 221     jpype.JClass(jclassname)
    222     if isinstance(driver_args, dict):
    223         Properties = jpype.java.util.Properties

/home/cdsw/.local/lib/python3.6/site-packages/jpype/_jclass.py in __new__(cls, jc, loader, initialize)
     97 
     98         # Pass to class factory to create the type
---> 99         return _jpype._getClass(jc)
    100 
    101 

TypeError: Class org.apache.hive.jdbc.HiveDriver is not found

In /home/cdsw/drivers/hive:

ll
total 11864
-rwx------ 1 cdsw 12146136 Sep 25 07:07 HiveJDBC41.jar*

 

Still the same error.
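When a driver class stubbornly refuses to load, one way to double-check the fully qualified class name the jar actually ships (which turns out to be the culprit later in this thread) is to list the jar's contents from Python. A small sketch, using only the jar path from the thread:

import zipfile

# Print every class in the driver jar whose name contains "Driver";
# these dotted names are what jaydebeapi expects as its first argument.
with zipfile.ZipFile('/home/cdsw/drivers/hive/HiveJDBC41.jar') as jar:
    for name in jar.namelist():
        if name.endswith('.class') and 'Driver' in name:
            print(name[:-len('.class')].replace('/', '.'))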

Rising Star

Do me a favor and try this:

 

- open a terminal session (do not use !)

- run the following commands:

chmod a+r /home/cdsw/drivers/hive/HiveJDBC41.jar
CLASSPATH=.:/home/cdsw/drivers/hive/HiveJDBC41.jar
export CLASSPATH

- close the session and try to run your Python code
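One caveat worth noting with the ! approach: a command prefixed with ! runs in a throwaway subshell, so an export issued that way never reaches the Python process or the JVM it later launches. jaydebeapi picks up the CLASSPATH of the Python process itself when it starts the JVM, so a sketch of setting it from Python instead (jar path taken from the thread):

import os

# Set CLASSPATH inside the Python process; an "!export" runs in a separate
# subshell and is gone by the time the JVM starts.
os.environ['CLASSPATH'] = '/home/cdsw/drivers/hive/HiveJDBC41.jar'

import jaydebeapi  # the JVM is started lazily, on the first connect()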

Explorer

So I started a new session within CML, from which I started a terminal session via ">_ Terminal Access" and ran the commands you posted.

I verified that the CLASSPATH was set by running

echo "$CLASSPATH"

which resulted in the expected output, i.e.

.:/home/cdsw/drivers/hive/HiveJDBC41.jar

I then closed the terminal session and ran the code within my CML session.

However, the error stayed the same.

 

 

Explorer (ACCEPTED SOLUTION)

Hello @pvidal!

So, as usual, the error was sitting in front of the screen! I didn't actually check the driver class name inside the JAR file, which is actually "com.cloudera.hive.jdbc41.HS2Driver". After changing it, everything works fine.

Sorry for the confusion, and thanks for your support!
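For completeness, a minimal sketch of the working call with the corrected driver class (host, credentials, and jar path are placeholders carried over from the thread):

import jaydebeapi

# The Cloudera JDBC 4.1 driver registers com.cloudera.hive.jdbc41.HS2Driver,
# not the Apache class org.apache.hive.jdbc.HiveDriver.
conn_hive = jaydebeapi.connect(
    'com.cloudera.hive.jdbc41.HS2Driver',
    'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': 'our_user', 'PWD': 'our_password'},
    jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',
)
curs_hive = conn_hive.cursor()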

Rising Star

Ha! Good catch!