
Can't connect via JDBC to Hiveserver2 with Kerberos+SSL

New Contributor

I'm trying to get a connection to Hive after enabling SSL. I can already connect with Kerberos, but it's the addition of SSL specifically that's giving me trouble.

The application I need to connect uses JDBC from Java. I don't have much control over its internals, just the connection string. I'm trying to get a working connection via JSQSH first as a proof of concept.
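For context, the Java side boils down to something like this minimal sketch (the host, principal, and truststore path are the ones from this cluster; nothing else about the real application is shown):

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class HiveSslConnect {

    // Assembles a HiveServer2 JDBC URL with Kerberos + SSL options appended.
    static String buildUrl(String host, int port, String db,
                           String principal, String trustStore) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
                + ";principal=" + principal
                + ";ssl=true"
                + ";sslTrustStore=" + trustStore;
    }

    public static void main(String[] args) throws Exception {
        String url = buildUrl(
                "cluster-head.domain.com", 10000, "default",
                "hive/_HOST@DOMAIN.COM",
                "/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks");
        System.out.println(url);
        // With a valid Kerberos ticket and the Hive driver on the classpath,
        // the actual connection would be:
        // try (Connection c = DriverManager.getConnection(url)) { /* ... */ }
    }
}
```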

As the simplest possible case, I've tried just logging into the cluster as the hive user and running `hive`. That connects, and I can see the Beeline connection info:

Connecting to jdbc:hive2://cluster-head.domain.com:2181,cluster-standby.domain.com:2181,cluster-w-1.domain.com:2181/default;password=hive;principal=hive/_HOST@REALM.COM;serviceDiscoveryMode=zooKeeper;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=...;trustStoreType=jks;user=hive;zooKeeperNamespace=hiveserver2
24/04/30 06:40:03 [main]: INFO jdbc.HiveConnection: Connected to cluster-head.domain.com:10000

I can simplify this considerably and get a working connection via Beeline:

Connecting to jdbc:hive2://cluster-head.domain.com:10000/default;principal=hive/_HOST@DOMAIN.COM;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
Connected to: Apache Hive (version 3.1.3000.7.1.9.3-4)
Driver: Hive JDBC (version 3.1.3000.7.1.9.3-4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3000.7.1.9.3-4 by Apache Hive
0: jdbc:hive2://cluster-head.domain>

However, if I try from JSQSH, even though it's pulling the driver from the same path, I get a classpath error:

[hive@cluster-head ~]$ jsqsh -d hive2 -u "jdbc:hive2://cluster-head.domain.com:10000/default;\
principal=hive/_HOST@DOMAIN.COM;ssl=true;\
sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks"
Password:
2024-04-30 05:49:01,772 main ERROR Unable to locate appender "LOGFILE" for logger config "guru.springframework.blog.log4j2properties"
[DEBUG] 2024-04-30 05:49:01.980 [main] Utils - Resolved authority: cluster-head.domain.com:10000
[DEBUG] 2024-04-30 05:49:02.290 [main] MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"Rate of successful kerberos logins and latency (milliseconds)"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.296 [main] MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"Rate of failed kerberos logins and latency (milliseconds)"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.297 [main] MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"GetGroups"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.297 [main] MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"Renewal failures since startup"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.298 [main] MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"Renewal failures since last successful login"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.304 [main] MetricsSystemImpl - UgiMetrics, User and group related metrics
java.lang.NoClassDefFoundError: com/ctc/wstx/io/InputBootstrapper
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:314)
at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1997)
at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:743)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:693)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge.createClientWithConf(HadoopThriftAuthBridge.java:95)
at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:57)
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:841)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:457)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:335)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
at org.sqsh.SQLDriverManager$DriverShim.connect(SQLDriverManager.java:136)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
at org.sqsh.SQLDriverManager.connect(SQLDriverManager.java:660)
at org.sqsh.JSqsh.doConnect(JSqsh.java:535)
at org.sqsh.JSqsh.main(JSqsh.java:249)
Caused by: java.lang.ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
... 16 more
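From some searching, `com.ctc.wstx.io.InputBootstrapper` appears to come from the Woodstox jar (`woodstox-core`), which hadoop-common uses to parse its XML configuration, so my guess is that JSQSH's classpath is missing that jar. A quick probe like this, run with the same classpath JSQSH uses, shows whether the class is visible:

```java
public class ClasspathProbe {

    // True when the named class can be loaded on the current classpath.
    static boolean present(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // woodstox-core normally provides com.ctc.wstx.*, and stax2-api
        // provides org.codehaus.stax2.*; hadoop-common needs both when it
        // initializes its Configuration, which is where the stack trace fails.
        for (String cls : new String[] {
                "com.ctc.wstx.io.InputBootstrapper",
                "org.codehaus.stax2.XMLInputFactory2" }) {
            System.out.println((present(cls) ? "FOUND   " : "MISSING ") + cls);
        }
    }
}
```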

When I try from another machine, using the application I want to get working, I get certificate errors:

2024-04-24 08:40:35 | javax.ws.rs.ProcessingException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

As mentioned, Kerberos-only connections work fine; I'm able to connect with URLs that look like "jdbc:hive2://cluster-head.domain.com:10000/default;principal=hive/_HOST@DOMAIN.COM". I'm trying to understand what the string should look like when SSL is also enabled, and what certs I need to import, or what other setup I need to do beforehand in order for connections to work.


Community Manager

@MorganMcEvoy, welcome to our community! To help you get the best possible answer, I have tagged our Hive experts @asish @smruti @RAGHUY @nramanaiah, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager



Super Collaborator

New Contributor

Thanks @tj2007. I've seen that article, but I'm not clear on how to apply it. The application I'm using doesn't seem to use the Cloudera JDBC driver, and I don't think I can add it. I note that the format of the options specified there doesn't resemble the strings used by Beeline, which seems to be the format my application uses as well. Is there a solution I can use with the Hive/Beeline format?

I've tried using the Cloudera driver before and been unable to get it to work: previously due to driver/class errors, and now, with this format, for new reasons:

[DEBUG] 2024-04-30 08:16:44.942 [main] TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport@2b9f74d0
[DEBUG] 2024-04-30 08:16:44.943 [main] TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 10
[DEBUG] 2024-04-30 08:16:44.946 [main] TSaslTransport - CLIENT: Writing message with status START and payload length 5
[DEBUG] 2024-04-30 08:16:44.951 [main] TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 10
[DEBUG] 2024-04-30 08:16:44.952 [main] TSaslTransport - CLIENT: Start message handled
[DEBUG] 2024-04-30 08:16:44.952 [main] TSaslTransport - CLIENT: Main negotiation loop complete
[DEBUG] 2024-04-30 08:16:44.952 [main] TSaslTransport - CLIENT: SASL Client receiving last message
[WARN ] 2024-04-30 08:16:44.954 [main] HiveConnection - Failed to connect to...

Super Collaborator

Try adding the following to the existing JDBC connection string you are using:

SSL=1;SSLTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks

Master Collaborator

@MorganMcEvoy We have two different issues here:

1. ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper

This seems to be due to an incompatible client version, which is pulling in Hadoop libraries that are not compatible with your version of Hive.

2. SunCertPathBuilderException: unable to find valid certification path to requested target

This basically means you need to specify the truststore file (the one containing the CA's root certificate) in the connection string. If the application you are using behaves like Beeline, I think adding the following to the connection string should work:

;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=...

SSL=1 might not work because that syntax is used with the Cloudera JDBC driver, not the Apache one. If you have tried this already and it still fails, could you share the error message?
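For comparison, the two drivers take differently named options. The Cloudera driver follows Simba-style property names; the names below are from memory and worth double-checking against that driver's own documentation:

```
# Apache Hive JDBC driver (what Beeline uses):
jdbc:hive2://host:10000/default;principal=hive/_HOST@REALM.COM;ssl=true;sslTrustStore=/path/to/truststore.jks;trustStorePassword=...

# Cloudera (Simba) JDBC driver:
jdbc:hive2://host:10000/default;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=host;KrbServiceName=hive;SSL=1;SSLTrustStore=/path/to/truststore.jks;SSLTrustStorePwd=...
```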

New Contributor

@smruti:

1. Agreed, I don't strictly need to get the JSQSH connection working; it'd just be nice in order to test things more directly. I am confused that the client is somehow having compatibility issues despite using the deployed Hive version's own jars, though. (The classpath I'm using, the default in JSQSH, is ${HIVE_HOME}/lib:${HADOOP_HOME}, which works out to /opt/cloudera/parcels/CDH/lib/hive/lib:/opt/cloudera/parcels/CDH/lib/hadoop.) However, we can set that aside.

2. I agree that this seems like the root of the problem, but I can't determine what certs/truststores I need. I've tried copying the truststore that works on the CDP cluster itself, but it doesn't work from the remote host. I've tried importing all the certs I can find, but none seem to work. I'm probably missing something very obvious, but I don't know what.

So for example, I scp

/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks

from the CDP cluster to /tmp/cm-auto-global_truststore.jks on the remote agent. I can use keytool -list to confirm that I have the right password and can see the certs contained:

Keystore type: jks 
Keystore provider: SUN

Your keystore contains 1 entry

cmrootca-0, 23-Apr-2024, trustedCertEntry,
Certificate fingerprint (SHA-256): D8:03:7A:38:7A:D7:B5:AC:A4:FC:78:5D:AB:5A:1B:9F:2D:2A:0E:96:FD:11:10:E4:D4:54:73:97:9A:0A:E9:71

I copy the Hive keytab and kinit and that's all good:

Default principal: hive/cluster-head.domain.com@REALM.COM

Valid starting Expires Service principal
01/05/24 02:51:38 02/05/24 02:51:38 krbtgt/REALM.COM@REALM.COM
renew until 06/05/24 02:51:38

So everything seems present and correct. However, the connection attempts look like this:

24/05/01 02:58:25 INFO jdbc.HiveConnection: Could not open client transport with JDBC Uri: jdbc:hive2://cluster-head.domain.com:10000/default;user=hive;password=hive;principal=hive/cluster-head.domain.com@REALM.COM;ssl=true;sslTrustStore=/tmp/cm-auto-global_truststore.jks;trustStorePassword=...
24/05/01 02:58:25 INFO jdbc.HiveConnection: Transport Used for JDBC connection: null

2024-05-01 02:55:06 | javax.ws.rs.ProcessingException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Are there more or different certs I need to import on the remote agent, beyond what's in the keystore that works (in Beeline) when connecting from the same machine?

Master Collaborator

@MorganMcEvoy what is cluster-head.domain.com? Is this a load balancer or an individual HS2 node?

Also, what's the client tool you are using? Is it possible that it is not honoring the sslTrustStore parameter?

A workaround would be to import the root CA cert into the default Java truststore on the client machine. Ref: https://stackoverflow.com/questions/11700132/how-to-import-a-jks-certificate-in-java-trust-store
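Relatedly, if the tool builds its SSL context from JVM defaults (the javax.ws.rs.ProcessingException suggests a REST client that probably does), you can also point the whole JVM at the truststore with the standard javax.net.ssl system properties, set before the first SSL connection. A sketch, using the copied truststore path from earlier in this thread as a placeholder:

```java
public class DefaultTrustStore {
    public static void main(String[] args) {
        // Point the JVM-default SSLContext at the cluster truststore.
        // The path is the placeholder from this thread; the password line is
        // commented out because the real value was elided above.
        System.setProperty("javax.net.ssl.trustStore",
                "/tmp/cm-auto-global_truststore.jks");
        System.setProperty("javax.net.ssl.trustStoreType", "jks");
        // System.setProperty("javax.net.ssl.trustStorePassword", "...");

        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

The command-line equivalent is `-Djavax.net.ssl.trustStore=/tmp/cm-auto-global_truststore.jks` (plus the matching password property). Note that libraries which construct their own SSLContext will ignore these properties.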