Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Connecting to Hive using local spark jdbc

avatar
Explorer

 

Hello,

 

I am trying to connect to a remote hive cluster which requires kerberos, using spark mysql connector 

```

# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
.appName('SparkByExamples.com') \
.config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/mysql-connector-java-5.1.45-bin.jar").enableHiveSupport().getOrCreate()

 

df = spark.read \
.jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod",
"member_connect",
properties={"user": "u530241", "password": "..", "driver":"com.mysql.jdbc.Driver"})

```


I keep getting this error - 
```
IllegalArgumentException: requirement failed: The driver could not open a JDBC connection. Check the URL: jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod
```

 

Could you please give me some suggestions?

10 REPLIES 10

avatar
Community Manager

@pranav007 Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Hive expert @cravani  who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Super Collaborator

You seem to be using wrong driver to connect to HiveServer2. MySQL JDBC driver does not work to connect to HS2.

 

You would need Apache Hive JDBC Driver or Cloudera Hive JDBC Driver to connect to Hive. You can download latest Apache Hive JDBC Driver from: https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/4.0.0-alpha-2/

File: hive-jdbc-4.0.0-alpha-2-standalone.jar 

 

Add the same jar to your spark.jars.

avatar
Explorer

Hi, I changed the file and added it to the jars.

# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
.appName('SparkByExamples.com') \
.config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/hive-jdbc-4.0.0-alpha-2-standalone.jar").enableHiveSupport().getOrCreate()



I am not getting the error - 


```

Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
	at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3746)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3736)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3520)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.spark.util.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:317)
	at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2(DependencyUtils.scala:273)
	at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2$adapted(DependencyUtils.scala:271)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.util.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:271)
	at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$4(SparkSubmit.scala:364)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:364)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:901)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
	at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(KerberosPrincipal.java:179)
	at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
	at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
	... 28 more

RuntimeError: Java gateway process exited before sending its port number

 


```

avatar
Super Collaborator

Are you sure that you are trying to connect using Kerberos to Hive? It appears to me that you are trying to connect to Hive over Knox which may have LDAP configured instead of Kerberos?

 

Can you try below? by replacing your username and password

 

df = spark.read \
.jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod;user=<username>;password=<password>")

 

 

avatar
Explorer

Yes, I can try this... but the error I am having is on the line right before it where I am creating a spark variable.

Error is here -

# Imports
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
.appName('SparkByExamples.com') \
.config("spark.jars", "/Users/u530241/spark3/spark-3.3.2-bin-hadoop3/jars/hive-jdbc-4.0.0-alpha-2-standalone.jar").enableHiveSupport().getOrCreate()

 

I will make a change here once it is solved - 

df = spark.read \
.jdbc("jdbc:hive2://p-cdl-knox-prd-svc.hcscint.net:8443/default;transportMode=http;httpPath=gateway/default/hive;ssl=true;sslTrustStore=/Users/u530241/Downloads/gateway_prod_.jks;trustStorePassword=knoxprod;user=<username>;password=<password>",
"member_connect",
properties={"user": "u530241", "password": "..", "driver":"com.mysql.jdbc.Driver"})

avatar
Explorer

@cravani @DianaTorres Are you there? I am still facing issues while connecting 

avatar
New Contributor

I have the same issue. Any tips for solution?

avatar
Community Manager

@Bharati @Shmoo Do you have any insights here? Thanks!


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Expert Contributor

I have followed the below steps to read the hive table from spark side with credential store:

FROM MYSQL 

CREATE USER IF NOT EXISTS 'gopi'@'%' IDENTIFIED BY 'gopi';
GRANT ALL PRIVILEGES ON * . * TO 'gopi'@'%';
FLUSH PRIVILEGES;

create database if not exists test;
use test;

CREATE TABLE test.EMPLOYEE(
        id INT,
        name varchar(255),
        salary DECIMAL,
        dob DATE NOT NULL DEFAULT '2021-05-01',
        doj TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP(),
        PRIMARY KEY (id)
);

INSERT INTO test.EMPLOYEE (id, name, salary, dob, doj) VALUES (1, "gopi", 10000.00, '1988-06-01', '2020-03-16 09:00:01.000000');
INSERT INTO test.EMPLOYEE (id, name, salary,dob) VALUES (2, "Nishanth", 50000.00, '2018-05-29');
INSERT INTO test.EMPLOYEE (id, name, salary) VALUES (3, "Raja", 30000.00);

Create credential store:

sudo -u hive hadoop credential create gopi_user.password -v gopi -provider jceks://hdfs/user/hive/gopi_user.jceks
hadoop credential list -provider jceks://hdfs/user/hive/gopi_user.jceks
sudo -u hive hdfs dfs -chmod 400 /user/hive/gopi_user.jceks

FROM HIVE

USE db_test;
drop table if exists db_test.employee2;

CREATE EXTERNAL TABLE db_test.employee2(
  id INT,
  name STRING,
  salary DOUBLE,
  dob DATE,
  doj TIMESTAMP
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "MYSQL",
  "hive.sql.jdbc.driver" = "com.mysql.cj.jdbc.Driver",
  "hive.sql.jdbc.url" = "jdbc:mysql://ccycloud-1.nightly7x-us-gr.root.comops.site:3306/test",
  "hive.sql.dbcp.username" = "gopi",
  "hive.sql.dbcp.password.keystore"  ="jceks://hdfs/user/hive/gopi_user.jceks",
  "hive.sql.dbcp.password.key" = "gopi_user.password",
  "hive.sql.query" = "select * from test.EMPLOYEE"
);

FROM SPARK

sudo -u hive spark-shell \
  --jars /opt/cloudera/parcels/CDH/jars/hive-jdbc-handler-3.1.3000.7.2.18.0-622.jar,/usr/share/java/mysql-connector-java.jar

scala> spark.sql("SELECT * FROM db_test.employee1").show()

+---+--------+-------+----------+-------------------+                           
| id|    name| salary|       dob|                doj|
+---+--------+-------+----------+-------------------+
|  1|   gopi|10000.0|1988-06-01|2020-03-16 09:00:01|
|  2|Nishanth|50000.0|2018-05-29|2024-02-27 10:39:22|
|  3|    Raja|30000.0|2021-05-01|2024-02-27 10:39:30|
+---+--------+-------+----------+-------------------+


scala> spark.sql("SELECT * FROM db_test.employee2").show()
+---+--------+-------+----------+-------------------+                           
| id|    name| salary|       dob|                doj|
+---+--------+-------+----------+-------------------+
|  1|   gopi|10000.0|1988-06-01|2020-03-16 09:00:01|
|  2|Nishanth|50000.0|2018-05-29|2024-02-27 10:39:22|
|  3|    Raja|30000.0|2021-05-01|2024-02-27 10:39:30|
+---+--------+-------+----------+-------------------+