09-28-2020 01:56 AM
Hello @pvidal! So, as usual, the error was in front of the screen! I hadn't actually checked the driver class name within the JAR file, which is actually "com.cloudera.hive.jdbc41.HS2Driver". After changing it, everything works fine. Sorry for the confusion and thanks for your support!
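For reference, the working call now looks like this (host, user, and password are placeholders):
import jaydebeapi
conn_hive = jaydebeapi.connect(
    'com.cloudera.hive.jdbc41.HS2Driver',  # driver class actually contained in HiveJDBC41.jar
    'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;',
    {'UID': "our_user", 'PWD': "our_pw"},
    jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',
)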
09-28-2020 12:24 AM
So I started a new session within CML, from which I started a terminal session via ">_Terminal Access" and ran the commands you posted. I verified that the CLASSPATH was set by running echo "$CLASSPATH", which produced the expected output, i.e. .:/home/cdsw/drivers/hive/HiveJDBC41.jar. I then closed the terminal session and ran the code within my CML session. However, the error stayed the same.
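For completeness, I can also check the variable from inside the Python session itself, so the check does not depend on the terminal session:
import os
# CLASSPATH as seen by the Python process running the CML session
print(os.environ.get('CLASSPATH'))
# expected: .:/home/cdsw/drivers/hive/HiveJDBC41.jar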
09-25-2020 01:44 AM
Yes I did, but I had to add an "!" in order for the command to be accepted:
!CLASSPATH=/home/cdsw/drivers/hive/HiveJDBC41.jar
!export CLASSPATH
conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', \
{'UID': "our_user", 'PWD': "our_pw"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)
The full traceback is:
TypeError Traceback (most recent call last)
in engine
----> 1 conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', {'UID': "our_user", 'PWD': "our_pw"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)
/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/__init__.py in connect(jclassname, url, driver_args, jars, libs)
410 else:
411 libs = []
--> 412 jconn = _jdbc_connect(jclassname, url, driver_args, jars, libs)
413 return Connection(jconn, _converters)
414
/home/cdsw/.local/lib/python3.6/site-packages/jaydebeapi/__init__.py in _jdbc_connect_jpype(jclassname, url, driver_args, jars, libs)
219 return jpype.JArray(jpype.JByte, 1)(data)
220 # register driver for DriverManager
--> 221 jpype.JClass(jclassname)
222 if isinstance(driver_args, dict):
223 Properties = jpype.java.util.Properties
/home/cdsw/.local/lib/python3.6/site-packages/jpype/_jclass.py in __new__(cls, jc, loader, initialize)
97
98 # Pass to class factory to create the type
---> 99 return _jpype._getClass(jc)
100
101
TypeError: Class org.apache.hive.jdbc.HiveDriver is not found
The JAR is present in /home/cdsw/drivers/hive:
ll
total 11864
-rwx------ 1 cdsw 12146136 Sep 25 07:07 HiveJDBC41.jar*
Still the same error.
09-24-2020 09:24 AM
Hi @pvidal, thanks for the fast reply. Yes, indeed I saw that particular post. My implementation looks very similar, just for Hive instead of Impala:
!pip3 install JayDeBeApi
import jaydebeapi
conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://our_host:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', {'UID': "our_user", 'PWD': "our_password"}, jars='/home/cdsw/drivers/hive/HiveJDBC41.jar',)
curs_hive = conn_hive.cursor()
The environment variable CLASSPATH is set to the JAR with which the connection via Java or DBeaver works: 'CLASSPATH': '/home/cdsw/drivers/HiveJDBC41.jar'. Still I get the error. Any further ideas?
09-24-2020 01:35 AM
Hello everyone,
we set up a Cloudera environment that includes a Data Hub of type "7.1.0 - Data Engineering: Apache Spark, Apache Hive, Apache Oozie". We managed to connect to Hive via a JDBC connection from our local machines.
But so far we have not been able to connect from CML to Hive via JDBC.
I use JayDeBeApi as follows:
conn_hive = jaydebeapi.connect('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://dataengineering-master0.......:443/;ssl=1;transportMode=http;httpPath=dataengineering/cdp-proxy-api/hive;AuthMech=3;', \
{'UID': "user_name", 'PWD': "password"}, '/home/cdsw/drivers/HiveJDBC41.jar',)
The error message is
TypeError: Class org.apache.hive.jdbc.HiveDriver is not found
I set the environment variable CLASSPATH to
'/home/cdsw/drivers/HiveJDBC41.jar'
which is where the JAR actually resides. Hence I wanted to check whether JAVA_HOME is set correctly, and yes, the env. variable is set to
'/usr/lib/jvm/java-8-openjdk-amd64/'
However, when I run the command !java --version I get an error:
Unrecognized option: --version
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Is this normal and can I expect Java to still work as expected, or could this be the source of my problem?
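(As a side note: I am fairly sure the double-dash form only exists since Java 9; on Java 8 the single-dash flag should be the one to use:)
!java -version
# on OpenJDK 8 this should print something like: openjdk version "1.8.0_..."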
Since connecting via JDBC did not work, I also tried connecting via a SparkSession, as shown in yesterday's "CDP Private Cloud Partner Edition" session. The presented code looks as follows:
from pyspark.sql import SparkSession
# Instantiate Spark-on-K8s Cluster
spark = SparkSession.builder.appName("Simple Spark Test") \
.config("spark.executor.memory", "8g") \
.config("spark.executor.cores", "2") \
.config("spark.driver.memory", "2g") \
.config("spark.executor.instances", "2") \
.getOrCreate()
# Validate Spark Connectivity
spark.sql("SHOW databases").show()
spark.sql("use default")
spark.sql("show tables").show()
spark.sql('create table testcml (abc integer)').show()
spark.sql("insert into table testcml select t.* from (select 1) t").show()
spark.sql("select * from testcml").show()
spark.sql("drop table testcml").show()
# Stop Spark Session
spark.stop()
Listing the databases and the tables of a DB, as well as creating the "testcml" table, works fine. But the insert into testcml fails due to:
Caused by: java.lang.IllegalStateException: Authentication with IDBroker failed. Please ensure you have a Kerberos token by using kinit.
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.getNewKnoxDelegationTokenSession(IDBDelegationTokenBinding.java:461)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.requestNewKnoxToken(IDBDelegationTokenBinding.java:406)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.getNewKnoxToken(IDBDelegationTokenBinding.java:484)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.maybeRenewAccessToken(IDBDelegationTokenBinding.java:476)
at org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding.deployUnbonded(IDBDelegationTokenBinding.java:335)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.deployUnbonded(S3ADelegationTokens.java:245)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.bindToAnyDelegationToken(S3ADelegationTokens.java:278)
at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.serviceStart(S3ADelegationTokens.java:199)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:608)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:388)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3396)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3456)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3424)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:518)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.getDestinationFileSystem(AbstractS3ACommitterFactory.java:73)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.createOutputCommitter(AbstractS3ACommitterFactory.java:45)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:338)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:100)
at org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol.setupCommitter(SQLHadoopMapReduceCommitProtocol.scala:40)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupTask(HadoopMapReduceCommitProtocol.scala:217)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:229)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1289)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
My problem with this is that I don't know how to pass this token or where to get it. I checked whether any DENY rules were active in Ranger, but I did not see any.
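From the error message I gather the token would normally come from a Kerberos ticket, i.e. something along the lines of the following (the principal is just a placeholder):
!klist           # list any existing Kerberos tickets
!kinit our_user  # request a ticket for a placeholder principal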
I appreciate your help and thank you in advance.
Regards, Dominic