Member since 09-23-2020
5 Posts
0 Kudos Received
1 Solution
09-28-2020
01:56 AM
Hello @pvidal! As usual, the error was sitting in front of the screen. I never actually checked the class path inside the JAR file, which is "com.cloudera.hive.jdbc41.HS2Driver". After changing the driver class to that, everything works fine. Sorry for the confusion, and thanks for your support.
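For anyone hitting the same "Class ... is not found" error, one way to check which driver class a JAR actually ships is to read it as a ZIP archive. The following is only a sketch: the function name list_jdbc_driver_classes is made up for this example, and it assumes the JAR either declares its driver in META-INF/services/java.sql.Driver (the standard JDBC service entry, which not every driver JAR includes) or contains a class file whose name ends in "Driver".

```python
import zipfile


def list_jdbc_driver_classes(jar_path):
    """Return candidate JDBC driver class names found in a JAR.

    jar_path may be a filesystem path or a file-like object.
    Hypothetical helper for illustration, not part of JayDeBeApi.
    """
    service_entry = "META-INF/services/java.sql.Driver"
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
        if service_entry in names:
            text = jar.read(service_entry).decode("utf-8")
            # Drop comments and blank lines from the service file.
            declared = [line.split("#")[0].strip() for line in text.splitlines()]
            declared = [name for name in declared if name]
            if declared:
                return declared
        # Fallback: guess from class files whose name ends in "Driver".
        return sorted(
            name[: -len(".class")].replace("/", ".")
            for name in names
            if name.endswith("Driver.class")
        )
```

Called with the path from this thread, e.g. list_jdbc_driver_classes('/home/cdsw/drivers/hive/HiveJDBC41.jar'), the result would be the candidates to try as the first argument of jaydebeapi.connect in place of 'org.apache.hive.jdbc.HiveDriver'.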
09-28-2020
12:24 AM
So I started a new session within CML, from which I opened a terminal session via ">_Terminal Access" and ran the commands you posted. I verified that the CLASSPATH was set by running echo "$CLASSPATH", which produced the expected output, i.e. .:/home/cdsw/drivers/hive/HiveJDBC41.jar
I then closed the terminal session and ran the code within my CML session. However, the error stayed the same.
09-25-2020
01:44 AM
Yes I did, but I had to add an "!" in order for the command to be accepted:
!CLASSPATH=/home/cdsw/drivers/hive/HiveJDBC41.jar
!export CLASSPATH
conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', \
{'UID': "our_user", 'PWD': "our_pw"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)
TypeError: Class org.apache.hive.jdbc.HiveDriver is not found
TypeError Traceback (most recent call last)
in engine
----> 1 conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', {'UID': "our_user", 'PWD': "our_pw"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)
/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/__init__.py in connect(jclassname, url, driver_args, jars, libs)
410 else:
411 libs = []
--> 412 jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
413 return Connection(jconn, _converters)
414
/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/__init__.py in _jdbc_connect_jpype(jclassname, url, driver_args, jars, libs)
219 return jpype.JArray(jpype.JByte, 1)(data)
220 # register driver for DriverManager
--> 221 jpype.JClass(jclassname)
222 if isinstance(driver_args, dict):
223 Properties = jpype.java.util.Properties
/home/cdsw/.local/lib/python3.6/site-packages/jpype/_jclass.py in __new__(cls, jc, loader, initialize)
97
98 # Pass to class factory to create the type
---> 99 return _jpype._getClass(jc)
100
101
TypeError: Class org.apache.hive.jdbc.HiveDriver is not found
/home/cdsw/drivers/hive
ll
total 11864
-rwx------ 1 cdsw 12146136 Sep 25 07:07 HiveJDBC41.jar*
Still the same error.
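A possible reason the "!" commands have no visible effect: in IPython-style sessions each "!" line typically runs in its own short-lived shell, so a variable exported there disappears when that shell exits. The sketch below imitates this with subprocess and contrasts it with setting os.environ in the Python process itself. It assumes a POSIX shell is available, and it says nothing about whether the driver class name is correct. As far as I know, JayDeBeApi only reads CLASSPATH and the jars argument when it first starts the JVM, so the variable would have to be set before the first connect call in a fresh session.

```python
import os
import subprocess

JAR_PATH = "/home/cdsw/drivers/hive/HiveJDBC41.jar"  # path taken from the post

# Start from a clean state so the comparison is meaningful.
os.environ.pop("CLASSPATH", None)

# Exporting inside a child shell does not change the parent Python process.
subprocess.run(
    "CLASSPATH={}; export CLASSPATH".format(JAR_PATH), shell=True, check=True
)
after_subshell = os.environ.get("CLASSPATH")  # still unset here

# Setting os.environ changes this process and every child started afterwards.
os.environ["CLASSPATH"] = JAR_PATH
after_os_environ = os.environ.get("CLASSPATH")
child_view = subprocess.run(
    'echo "$CLASSPATH"',
    shell=True,
    stdout=subprocess.PIPE,
    universal_newlines=True,
).stdout.strip()

print("after subshell export:", after_subshell)
print("after os.environ:", after_os_environ)
print("seen by a new child shell:", child_view)
```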
09-24-2020
09:24 AM
Hi @pvidal, thanks for the fast reply. Yes, I did indeed see that particular post. My implementation looks very similar, just with Hive instead of Impala:
!pip3 install JayDeBeApi
import jaydebeapi
conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', {'UID': "our_user", 'PWD': "our_password"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)
curs_hive = conn_hive.cursor()
The environment variable CLASSPATH is set to the JAR with which the connection via Java or DBeaver works: 'CLASSPATH': '/home/cdsw/drivers/HiveJDBC41.jar'
Still I get the error. Any further ideas?
09-24-2020
01:35 AM
Hello everyone,
We set up a Cloudera environment that includes a DataHub of type "7.1.0 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie". We managed to connect to Hive via a JDBC connection from our local machines.
But so far we have not been able to connect from CML to Hive via JDBC.
I use JayDeBeApi as follows:
conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://dataengineering-master0.......:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', \
{'UID': "user_name", 'PWD': "password"}, '/home/cdsw/drivers/HiveJDBC41.jar',)
The error message is
TypeError: Class org.apache.hive.jdbc.HiveDriver is not found
I set the environment variable CLASSPATH to
'/home/cdsw/drivers/HiveJDBC41.jar'
which is where the JAR actually resides. I then wanted to check whether JAVA_HOME is set correctly, and yes, that environment variable is set to
'/usr/lib/jvm/java-8-openjdk-amd64/'
However, when I run the command !java --version I get an error:
Unrecognized option: --version
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Is this normal, and can I expect Java to still work as expected, or could this be the source of my problem?
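A note on the error itself: the double-dash --version option was only added in Java 9, so a Java 8 runtime (as the JAVA_HOME path suggests here) rejects it, while the single-dash -version form works on both. The sketch below runs that form from Python. The helper name java_version_banner is made up for this example, and it returns None when no java executable is on the PATH.

```python
import shutil
import subprocess


def java_version_banner(java_cmd="java"):
    """Return the text printed by `java -version`, or None if java is missing.

    Hypothetical helper for illustration only.
    """
    if shutil.which(java_cmd) is None:
        return None
    # `java -version` writes its banner to stderr, so merge it into stdout.
    proc = subprocess.run(
        [java_cmd, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.stdout.strip()


print(java_version_banner())
```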
Since connecting via JDBC did not work, I also tried connecting via a SparkSession, as I saw in yesterday's "CDP Private Cloud Partner Edition". The presented code looks as follows:
from pyspark.sql import SparkSession
# Instantiate Spark-on-K8s Cluster
spark = SparkSession.builder.appName("Simple Spark Test") \
    .config("spark.executor.memory", "8g") \
    .config("spark.executor.cores", "2") \
    .config("spark.driver.memory", "2g") \
    .config("spark.executor.instances", "2") \
    .getOrCreate()
# Validate Spark Connectivity
spark.sql("SHOW databases").show()
spark.sql("use default")
spark.sql("show tables").show()
spark.sql('create table testcml (abc integer)').show()
spark.sql("insert into table testcml select t.* from (select 1) t").show()
spark.sql("select * from testcml").show()
spark.sql("drop table testcml").show()
# Stop Spark Session
spark.stop()
Listing the databases and the tables of a DB, as well as creating the "testcml" table, works fine. But the insert into testcml fails due to
Caused by: java.lang.IllegalStateException: Authentication with IDBroker failed. Please ensure you have a Kerberos token by using kinit.
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.getNewKnoxDelegationTokenSession(IDBDelegationTokenBinding.java:461)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.requestNewKnoxToken(IDBDelegationTokenBinding.java:406)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.getNewKnoxToken(IDBDelegationTokenBinding.java:484)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.maybeRenewAccessToken(IDBDelegationTokenBinding.java:476)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.deployUnbonded(IDBDelegationTokenBinding.java:335)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.deployUnbonded(S3ADelegationTokens.java:245)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.bindToAnyDelegationToken(S3ADelegationTokens.java:278)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.serviceStart(S3ADelegationTokens.java:199)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:608)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:388)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3396)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3456)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3424)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:518)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.getDestinationFileSystem(AbstractS3ACommitterFactory.java:73)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.createOutputCommitter(AbstractS3ACommitterFactory.java:45)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:338)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:100)
at org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol.setupCommitter(SQLHadoopMapReduceCommitProtocol.scala:40)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupTask(HadoopMapReduceCommitProtocol.scala:217)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:229)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1289)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
My problem with this is that I don't know how to pass this token or where to get it. I checked whether any DENY rules in Ranger were active, but I did not see any.
I appreciate your help and thank you in advance.
Regards, Dominic