Created on 09-24-2020 01:35 AM - last edited on 09-24-2020 06:55 AM by VidyaSargur
Hello everyone,
we set up a Cloudera environment that includes a Data Hub of type "7.1.0 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie". We managed to connect to Hive via JDBC from our local machines.
But so far we were not able to connect from CML to Hive via JDBC.
I use JayDeBeApi as follows:
conn_hive = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    'jdbc:hive2://dataengineering-master0.......:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "user_name", 'PWD': "password"},
    '/home/cdsw/drivers/HiveJDBC41.jar',
)
The error message is
TypeError: Class org.apache.hive.jdbc.HiveDriver is not found
I set the environment variable CLASSPATH to
'/home/cdsw/drivers/HiveJDBC41.jar'
which is where the JAR actually resides. I then wanted to check whether JAVA_HOME is set correctly, and yes, that variable is set to
'/usr/lib/jvm/java-8-openjdk-amd64/'
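For reference, both variables can also be inspected from inside the Python session itself; a minimal check, using the names above:
import os
print(os.environ.get("CLASSPATH"))  # expected: /home/cdsw/drivers/HiveJDBC41.jar
print(os.environ.get("JAVA_HOME"))  # expected: /usr/lib/jvm/java-8-openjdk-amd64/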
However, when I run the command !java --version I get an error:
Unrecognized option: --version
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Is this normal, and can I expect Java to still work as expected, or could this be the source of my problem?
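(For what it's worth, the JDK 8 launcher only understands the single-dash form; --version was added in JDK 9, so this error by itself does not mean the JVM is broken.)
!java -version  # single-dash form, supported by JDK 8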
Since connecting via JDBC did not work, I also tried connecting via a SparkSession, as I saw in yesterday's "CDP Private Cloud Partner Edition" session. The presented code looks as follows:
from pyspark.sql import SparkSession

# Instantiate Spark-on-K8s Cluster
spark = SparkSession.builder.appName("Simple Spark Test") \
    .config("spark.executor.memory", "8g") \
    .config("spark.executor.cores", "2") \
    .config("spark.driver.memory", "2g") \
    .config("spark.executor.instances", "2") \
    .getOrCreate()

# Validate Spark Connectivity
spark.sql("SHOW databases").show()
spark.sql("use default")
spark.sql("show tables").show()
spark.sql('create table testcml (abc integer)').show()
spark.sql("insert into table testcml select t.* from (select 1) t").show()
spark.sql("select * from testcml").show()
spark.sql("drop table testcml").show()

# Stop Spark Session
spark.stop()
Listing the databases and the tables of a DB, as well as creating the "testcml" table, works fine. But the insert into testcml fails due to:
Caused by: java.lang.IllegalStateException: Authentication with IDBroker failed. Please ensure you have a Kerberos token by using kinit.
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.getNewKnoxDelegationTokenSession(IDBDelegationTokenBinding.java:461)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.requestNewKnoxToken(IDBDelegationTokenBinding.java:406)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.getNewKnoxToken(IDBDelegationTokenBinding.java:484)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.maybeRenewAccessToken(IDBDelegationTokenBinding.java:476)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.deployUnbonded(IDBDelegationTokenBinding.java:335)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.deployUnbonded(S3ADelegationTokens.java:245)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.bindToAnyDelegationToken(S3ADelegationTokens.java:278)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.serviceStart(S3ADelegationTokens.java:199)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:608)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:388)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3396)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3456)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3424)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:518)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.getDestinationFileSystem(AbstractS3ACommitterFactory.java:73)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.createOutputCommitter(AbstractS3ACommitterFactory.java:45)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:338)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:100)
at org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol.setupCommitter(SQLHadoopMapReduceCommitProtocol.scala:40)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupTask(HadoopMapReduceCommitProtocol.scala:217)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:229)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1289)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
My problem with this is that I don't know how to pass this token or where to get it. I checked whether any Ranger DENY rules were active, but I did not see any.
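(Following the error message's own suggestion, a minimal sketch for checking the ticket cache from a session, assuming the standard MIT Kerberos tools are on the PATH and using a placeholder principal:)
!klist           # list any cached Kerberos tickets
!kinit our_user  # obtain a ticket; prompts for the password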
I appreciate your help and thank you in advance.
Regards,
Dominic
Created 09-24-2020 07:03 AM
Hey,
Have you checked this article?
Created 09-24-2020 09:24 AM
Hi @pvidal,
thanks for the fast reply.
Yes, indeed I saw that particular post. My implementation looks very similar, just with Hive instead of Impala:
!pip3 install JayDeBeApi

import jaydebeapi

conn_hive = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "our_user", 'PWD': "our_password"},
    jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',
)
curs_hive = conn_hive.cursor()
The env variable CLASSPATH is set to the JAR with which the connection via Java or DBeaver works:
'CLASSPATH': '/home/cdsw/drivers/HiveJDBC41.jar'
Still I get the error. Any further ideas?
Created on 09-24-2020 09:42 AM - edited 09-24-2020 10:38 AM
Did you actually run the export in a terminal session, as follows?
CLASSPATH=.:/home/cdsw/drivers/HiveJDBC41.jar
export CLASSPATH
Created 09-25-2020 01:44 AM
Yes I did, but I had to add an "!" in order for the command to be accepted:
!CLASSPATH=/home/cdsw/drivers/hive/HiveJDBC41.jar
!export CLASSPATH
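(Note: each ! command runs in its own throwaway subshell, so an export issued this way does not persist into the Python process. A sketch of setting the variable from Python instead, which does affect the running session:)
import os
os.environ["CLASSPATH"] = "/home/cdsw/drivers/hive/HiveJDBC41.jar"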
conn_hive = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    'jdbc:hive2://host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "our_user", 'PWD': "our_pw"},
    jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',
)
TypeError: Class org.apache.hive.jdbc.HiveDriver is not found
TypeError Traceback (most recent call last)
in engine
----> 1 conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', {'UID': "our_user", 'PWD': "our_pw"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)
/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/__init__.py in connect(jclassname, url, driver_args, jars, libs)
410 else:
411 libs = []
--> 412 jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
413 return Connection(jconn, _converters)
414
/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/__init__.py in _jdbc_connect_jpype(jclassname, url, driver_args, jars, libs)
219 return jpype.JArray(jpype.JByte, 1)(data)
220 # register driver for DriverManager
--> 221 jpype.JClass(jclassname)
222 if isinstance(driver_args, dict):
223 Properties = jpype.java.util.Properties
/home/cdsw/.local/lib/python3.6/site-packages/jpype/_jclass.py in __new__(cls, jc, loader, initialize)
97
98 # Pass to class factory to create the type
---> 99 return _jpype._getClass(jc)
100
101
TypeError: Class org.apache.hive.jdbc.HiveDriver is not found
The JAR is present in /home/cdsw/drivers/hive:
ll
total 11864
-rwx------ 1 cdsw 12146136 Sep 25 07:07 HiveJDBC41.jar*
Still the same error.
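(In hindsight, one quick way to see which driver class a JAR actually registers, assuming unzip is available in the image and the JAR ships the standard JDBC service descriptor:)
!unzip -p /home/cdsw/drivers/hive/HiveJDBC41.jar META-INF/services/java.sql.Driver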
Created 09-25-2020 05:54 AM
Do me a favor and try this:
- open a terminal session (do not use !)
- run the following commands:
chmod a+r /home/cdsw/drivers/hive/HiveJDBC41.jar
CLASSPATH=.:/home/cdsw/drivers/hive/HiveJDBC41.jar
export CLASSPATH
- close the session and try to run your python code
Created 09-28-2020 12:24 AM
So I started a new session within CML, from which I started a terminal session via ">_ Terminal Access", and ran the commands you posted.
I verified that the CLASSPATH was set by running
echo "$CLASSPATH"
which resulted in the expected output, i.e.
.:/home/cdsw/drivers/hive/HiveJDBC41.jar
I then closed the terminal session and ran the code within my CML session.
However, the error stayed the same.
Created 09-28-2020 01:56 AM
Hello @pvidal!
So, as usual, the error was in front of the screen! I didn't actually check the driver class name within the JAR file, which is actually "com.cloudera.hive.jdbc41.HS2Driver". After changing it, everything works fine.
Sorry for the confusion, and thanks for your support.
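For anyone landing here later, a minimal sketch of the working call with the Cloudera driver class (host, credentials, and JAR path are the placeholders used throughout the thread):
import jaydebeapi

conn_hive = jaydebeapi.connect(
    'com.cloudera.hive.jdbc41.HS2Driver',
    'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "our_user", 'PWD': "our_password"},
    jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',
)
curs_hive = conn_hive.cursor()
curs_hive.execute('SHOW DATABASES')  # quick smoke test
print(curs_hive.fetchall())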
Created 09-28-2020 05:14 AM
Ha! Good catch!