Created 07-14-2023 08:28 AM
Hello,
I am trying to connect to a remote Hive cluster that requires Kerberos, using the Spark MySQL connector:
```
# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/mysql-connector-java-5.1.45-bin.jar") \
    .enableHiveSupport() \
    .getOrCreate()

df = spark.read \
    .jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod",
          "member_connect",
          properties={"user": "u530241", "password": "..", "driver": "com.mysql.jdbc.Driver"})
```
I keep getting this error -
```
IllegalArgumentException: requirement failed: The driver could not open a JDBC connection. Check the URL: jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod
```
Could you please give me some suggestions?
Created 07-14-2023 09:32 AM
@pranav007 Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Hive expert @cravani who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created 07-14-2023 09:55 AM
You seem to be using the wrong driver to connect to HiveServer2; the MySQL JDBC driver cannot connect to HS2.
You need the Apache Hive JDBC driver or the Cloudera Hive JDBC driver to connect to Hive. You can download the latest Apache Hive JDBC driver from: https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/4.0.0-alpha-2/
File: hive-jdbc-4.0.0-alpha-2-standalone.jar
Add the same jar to your spark.jars.
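In case it helps, a minimal sketch of that setup from PySpark, assuming the standalone jar was downloaded locally (the path, host, and credentials are placeholders):
```
# Minimal sketch: load the Hive JDBC standalone jar and use the Hive
# driver class (org.apache.hive.jdbc.HiveDriver) instead of the MySQL one.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('HiveJdbcExample') \
    .config("spark.jars", "/path/to/hive-jdbc-4.0.0-alpha-2-standalone.jar") \
    .enableHiveSupport() \
    .getOrCreate()

df = spark.read.jdbc(
    "jdbc:hive2://<host>:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true",
    "member_connect",
    properties={"user": "<username>",
                "password": "<password>",
                "driver": "org.apache.hive.jdbc.HiveDriver"})
```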
Created 07-14-2023 01:38 PM
Hi, I changed the jar and added it to spark.jars.
```
# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/hive-jdbc-4.0.0-alpha-2-standalone.jar") \
    .enableHiveSupport() \
    .getOrCreate()
```
I am now getting this error -
```
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
	at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3746)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3736)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3520)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.spark.util.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:317)
	at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2(DependencyUtils.scala:273)
	at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2$adapted(DependencyUtils.scala:271)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:271)
	at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$4(SparkSubmit.scala:364)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:364)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:901)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
	at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(KerberosPrincipal.java:179)
	at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
	at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
	... 28 more
RuntimeError: Java gateway process exited before sending its port number
```
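For what it's worth, "Can't get Kerberos realm" / "krb5.conf loading failed" generally means the JVM that pyspark launches cannot find a krb5.conf. A sketch of pointing it at one, assuming a valid Kerberos config exists at /etc/krb5.conf (a placeholder path):
```
# Sketch: tell the JVM that pyspark's launcher starts where krb5.conf
# lives; SPARK_SUBMIT_OPTS must be set before the gateway is created.
import os
os.environ["SPARK_SUBMIT_OPTS"] = "-Djava.security.krb5.conf=/etc/krb5.conf"

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .enableHiveSupport() \
    .getOrCreate()
```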
Created on 07-15-2023 12:03 PM - edited 07-15-2023 12:09 PM
Are you sure that you are trying to connect to Hive using Kerberos? It appears to me that you are trying to connect to Hive over Knox, which may have LDAP configured instead of Kerberos.
Can you try the below, replacing your username and password?
```
df = spark.read \
    .jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod;user=<username>;password=<password>")
```
Created 07-17-2023 08:00 AM
Yes, I can try this... but the error I am having is on the line right before it, where I am creating the SparkSession.
Error is here -
```
# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/hive-jdbc-4.0.0-alpha-2-standalone.jar") \
    .enableHiveSupport() \
    .getOrCreate()
```
I will make a change here once it is solved -
```
df = spark.read \
    .jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod;user=<username>;password=<password>",
          "member_connect",
          properties={"user": "u530241", "password": "..", "driver": "com.mysql.jdbc.Driver"})
```
Created 07-21-2023 08:40 AM
@cravani @DianaTorres Are you there? I am still facing issues while connecting
Created 07-17-2023 05:53 AM
I have the same issue. Any tips for a solution?
Created 03-20-2024 11:40 PM
I followed the steps below to read a Hive table from Spark with a credential store:
FROM MYSQL
```
CREATE USER IF NOT EXISTS 'gopi'@'%' IDENTIFIED BY 'gopi';
GRANT ALL PRIVILEGES ON *.* TO 'gopi'@'%';
FLUSH PRIVILEGES;

CREATE DATABASE IF NOT EXISTS test;
USE test;

CREATE TABLE test.EMPLOYEE (
  id INT,
  name VARCHAR(255),
  salary DECIMAL,
  dob DATE NOT NULL DEFAULT '2021-05-01',
  doj TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP(),
  PRIMARY KEY (id)
);

INSERT INTO test.EMPLOYEE (id, name, salary, dob, doj) VALUES (1, 'gopi', 10000.00, '1988-06-01', '2020-03-16 09:00:01.000000');
INSERT INTO test.EMPLOYEE (id, name, salary, dob) VALUES (2, 'Nishanth', 50000.00, '2018-05-29');
INSERT INTO test.EMPLOYEE (id, name, salary) VALUES (3, 'Raja', 30000.00);
```
Create credential store:
```
sudo -u hive hadoop credential create gopi_user.password -v gopi \
    -provider jceks://hdfs/user/hive/gopi_user.jceks
hadoop credential list -provider jceks://hdfs/user/hive/gopi_user.jceks
sudo -u hive hdfs dfs -chmod 400 /user/hive/gopi_user.jceks
```
FROM HIVE
```
USE db_test;

DROP TABLE IF EXISTS db_test.employee2;

CREATE EXTERNAL TABLE db_test.employee2 (
  id INT,
  name STRING,
  salary DOUBLE,
  dob DATE,
  doj TIMESTAMP
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "MYSQL",
  "hive.sql.jdbc.driver" = "com.mysql.cj.jdbc.Driver",
  "hive.sql.jdbc.url" = "jdbc:mysql://ccycloud-1.nightly7x-us-gr.root.comops.site:3306/test",
  "hive.sql.dbcp.username" = "gopi",
  "hive.sql.dbcp.password.keystore" = "jceks://hdfs/user/hive/gopi_user.jceks",
  "hive.sql.dbcp.password.key" = "gopi_user.password",
  "hive.sql.query" = "select * from test.EMPLOYEE"
);
```
FROM SPARK
```
sudo -u hive spark-shell \
    --jars /opt/cloudera/parcels/CDH/jars/hive-jdbc-handler-3.1.3000.7.2.18.0-622.jar,/usr/share/java/mysql-connector-java.jar
```
```
scala> spark.sql("SELECT * FROM db_test.employee1").show()
+---+--------+-------+----------+-------------------+
| id|    name| salary|       dob|                doj|
+---+--------+-------+----------+-------------------+
|  1|    gopi|10000.0|1988-06-01|2020-03-16 09:00:01|
|  2|Nishanth|50000.0|2018-05-29|2024-02-27 10:39:22|
|  3|    Raja|30000.0|2021-05-01|2024-02-27 10:39:30|
+---+--------+-------+----------+-------------------+

scala> spark.sql("SELECT * FROM db_test.employee2").show()
+---+--------+-------+----------+-------------------+
| id|    name| salary|       dob|                doj|
+---+--------+-------+----------+-------------------+
|  1|    gopi|10000.0|1988-06-01|2020-03-16 09:00:01|
|  2|Nishanth|50000.0|2018-05-29|2024-02-27 10:39:22|
|  3|    Raja|30000.0|2021-05-01|2024-02-27 10:39:30|
+---+--------+-------+----------+-------------------+
```
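For completeness, a sketch of the equivalent read from PySpark, assuming the same jars as the spark-shell invocation above (the jar versions are specific to that CDP parcel and will differ on other installs):
```
from pyspark.sql import SparkSession

# Sketch: same jars as the spark-shell command above; paths and versions
# are placeholders tied to that particular CDP build.
spark = SparkSession.builder \
    .appName('HiveJdbcHandlerRead') \
    .config("spark.jars",
            "/opt/cloudera/parcels/CDH/jars/hive-jdbc-handler-3.1.3000.7.2.18.0-622.jar,"
            "/usr/share/java/mysql-connector-java.jar") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SELECT * FROM db_test.employee2").show()
```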