Created 07-14-2023 08:28 AM
Hello,
I am trying to connect to a remote Hive cluster that requires Kerberos, using the Spark MySQL connector:
```
# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/mysql-connector-java-5.1.45-bin.jar") \
    .enableHiveSupport() \
    .getOrCreate()

df = spark.read \
    .jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod",
          "member_connect",
          properties={"user": "u530241", "password": "..", "driver": "com.mysql.jdbc.Driver"})
```
I keep getting this error -
```
IllegalArgumentException: requirement failed: The driver could not open a JDBC connection. Check the URL: jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod
```
Could you please give me some suggestions?
Created 07-14-2023 09:32 AM
@pranav007 Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Hive expert @cravani who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created 07-14-2023 09:55 AM
You seem to be using the wrong driver to connect to HiveServer2; the MySQL JDBC driver cannot connect to HS2.
You need the Apache Hive JDBC driver or the Cloudera Hive JDBC driver to connect to Hive. You can download the latest Apache Hive JDBC driver from: https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/4.0.0-alpha-2/
File: hive-jdbc-4.0.0-alpha-2-standalone.jar
Add the same jar to your spark.jars.
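In case it helps, a minimal sketch of that setup from PySpark, assuming the standalone jar was downloaded locally (the path, host, and credentials are placeholders):
```
# Minimal sketch: load the Hive JDBC standalone jar and use the Hive
# driver class (org.apache.hive.jdbc.HiveDriver) instead of the MySQL one.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('HiveJdbcExample') \
    .config("spark.jars", "/path/to/hive-jdbc-4.0.0-alpha-2-standalone.jar") \
    .enableHiveSupport() \
    .getOrCreate()

df = spark.read.jdbc(
    "jdbc:hive2://<host>:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true",
    "member_connect",
    properties={"user": "<username>",
                "password": "<password>",
                "driver": "org.apache.hive.jdbc.HiveDriver"})
```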
Created 07-14-2023 01:38 PM
Hi, I changed the jar and added it to spark.jars.
```
# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/hive-jdbc-4.0.0-alpha-2-standalone.jar") \
    .enableHiveSupport() \
    .getOrCreate()
```
I am now getting this error -
```
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
	at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3746)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3736)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3520)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.spark.util.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:317)
	at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2(DependencyUtils.scala:273)
	at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2$adapted(DependencyUtils.scala:271)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:271)
	at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$4(SparkSubmit.scala:364)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:364)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:901)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
	at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(KerberosPrincipal.java:179)
	at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
	at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
	... 28 more
RuntimeError: Java gateway process exited before sending its port number
```
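For what it's worth, "Can't get Kerberos realm" / "krb5.conf loading failed" generally means the JVM that pyspark launches cannot find a krb5.conf. A sketch of pointing it at one, assuming a valid Kerberos config exists at /etc/krb5.conf (a placeholder path):
```
# Sketch: tell the JVM that pyspark's launcher starts where krb5.conf
# lives; SPARK_SUBMIT_OPTS must be set before the gateway is created.
import os
os.environ["SPARK_SUBMIT_OPTS"] = "-Djava.security.krb5.conf=/etc/krb5.conf"

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .enableHiveSupport() \
    .getOrCreate()
```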
Created on 07-15-2023 12:03 PM - edited 07-15-2023 12:09 PM
Are you sure that you are trying to connect to Hive using Kerberos? It appears to me that you are trying to connect to Hive over Knox, which may have LDAP configured instead of Kerberos.
Can you try the below, replacing your username and password?
```
df = spark.read \
    .jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod;user=<username>;password=<password>")
```
Created 07-17-2023 08:00 AM
Yes, I can try this... but the error I am having is on the line right before it, where I am creating the SparkSession.
Error is here -
```
# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/hive-jdbc-4.0.0-alpha-2-standalone.jar") \
    .enableHiveSupport() \
    .getOrCreate()
```
I will make a change here once it is solved -
```
df = spark.read \
    .jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod;user=<username>;password=<password>",
          "member_connect",
          properties={"user": "u530241", "password": "..", "driver": "com.mysql.jdbc.Driver"})
```
Created 07-21-2023 08:40 AM
@cravani @DianaTorres Are you there? I am still facing issues while connecting
Created 07-17-2023 05:53 AM
I have the same issue. Any tips for a solution?
Created 03-20-2024 11:40 PM
I followed the steps below to read a Hive table from Spark with a credential store:
FROM MYSQL
```
CREATE USER IF NOT EXISTS 'gopi'@'%' IDENTIFIED BY 'gopi';
GRANT ALL PRIVILEGES ON *.* TO 'gopi'@'%';
FLUSH PRIVILEGES;

CREATE DATABASE IF NOT EXISTS test;
USE test;

CREATE TABLE test.EMPLOYEE (
  id INT,
  name VARCHAR(255),
  salary DECIMAL,
  dob DATE NOT NULL DEFAULT '2021-05-01',
  doj TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP(),
  PRIMARY KEY (id)
);

INSERT INTO test.EMPLOYEE (id, name, salary, dob, doj) VALUES (1, 'gopi', 10000.00, '1988-06-01', '2020-03-16 09:00:01.000000');
INSERT INTO test.EMPLOYEE (id, name, salary, dob) VALUES (2, 'Nishanth', 50000.00, '2018-05-29');
INSERT INTO test.EMPLOYEE (id, name, salary) VALUES (3, 'Raja', 30000.00);
```
Create credential store:
```
sudo -u hive hadoop credential create gopi_user.password -v gopi \
    -provider jceks://hdfs/user/hive/gopi_user.jceks
hadoop credential list -provider jceks://hdfs/user/hive/gopi_user.jceks
sudo -u hive hdfs dfs -chmod 400 /user/hive/gopi_user.jceks
```
FROM HIVE
```
USE db_test;

DROP TABLE IF EXISTS db_test.employee2;

CREATE EXTERNAL TABLE db_test.employee2 (
  id INT,
  name STRING,
  salary DOUBLE,
  dob DATE,
  doj TIMESTAMP
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "MYSQL",
  "hive.sql.jdbc.driver" = "com.mysql.cj.jdbc.Driver",
  "hive.sql.jdbc.url" = "jdbc:mysql://ccycloud-1.nightly7x-us-gr.root.comops.site:3306/test",
  "hive.sql.dbcp.username" = "gopi",
  "hive.sql.dbcp.password.keystore" = "jceks://hdfs/user/hive/gopi_user.jceks",
  "hive.sql.dbcp.password.key" = "gopi_user.password",
  "hive.sql.query" = "select * from test.EMPLOYEE"
);
```
FROM SPARK
```
sudo -u hive spark-shell \
    --jars /opt/cloudera/parcels/CDH/jars/hive-jdbc-handler-3.1.3000.7.2.18.0-622.jar,/usr/share/java/mysql-connector-java.jar
```
```
scala> spark.sql("SELECT * FROM db_test.employee1").show()
+---+--------+-------+----------+-------------------+
| id|    name| salary|       dob|                doj|
+---+--------+-------+----------+-------------------+
|  1|    gopi|10000.0|1988-06-01|2020-03-16 09:00:01|
|  2|Nishanth|50000.0|2018-05-29|2024-02-27 10:39:22|
|  3|    Raja|30000.0|2021-05-01|2024-02-27 10:39:30|
+---+--------+-------+----------+-------------------+

scala> spark.sql("SELECT * FROM db_test.employee2").show()
+---+--------+-------+----------+-------------------+
| id|    name| salary|       dob|                doj|
+---+--------+-------+----------+-------------------+
|  1|    gopi|10000.0|1988-06-01|2020-03-16 09:00:01|
|  2|Nishanth|50000.0|2018-05-29|2024-02-27 10:39:22|
|  3|    Raja|30000.0|2021-05-01|2024-02-27 10:39:30|
+---+--------+-------+----------+-------------------+
```
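For completeness, a sketch of the equivalent read from PySpark, assuming the same jars as the spark-shell invocation above (the jar versions are specific to that CDP parcel and will differ on other installs):
```
from pyspark.sql import SparkSession

# Sketch: same jars as the spark-shell command above; paths and versions
# are placeholders tied to that particular CDP build.
spark = SparkSession.builder \
    .appName('HiveJdbcHandlerRead') \
    .config("spark.jars",
            "/opt/cloudera/parcels/CDH/jars/hive-jdbc-handler-3.1.3000.7.2.18.0-622.jar,"
            "/usr/share/java/mysql-connector-java.jar") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SELECT * FROM db_test.employee2").show()
```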