Created 04-30-2024 06:54 AM
I'm trying to get a connection to Hive after enabling SSL. I can already connect with Kerberos, but it's the addition of SSL specifically that's giving me trouble.
The application I need to connect uses JDBC in Java. I don't have much control over its internals, just the connection string. I'm trying to get a working connection via JSQSH first as a proof of concept.
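For context, the Java side boils down to roughly the following sketch (the class name is just illustrative; the URL is the Kerberos-only form that already works, and the SSL additions are exactly what's in question):

import java.sql.Connection;
import java.sql.DriverManager;

public class HiveJdbcProbe {
    public static void main(String[] args) throws Exception {
        // Kerberos-only URL that works today; assumes a valid ticket from
        // kinit and the Hive JDBC driver on the classpath.
        String url = "jdbc:hive2://cluster-head.domain.com:10000/default;"
                   + "principal=hive/_HOST@DOMAIN.COM";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: "
                + conn.getMetaData().getDatabaseProductVersion());
        }
    }
}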
As the simplest possible case, I've tried just logging into the cluster as the hive user and running `hive`. That's able to connect, and I can see the Beeline connection info:
Connecting to jdbc:hive2://cluster-head.domain.com:2181,cluster-standby.domain.com:2181,cluster-w-1.domain.com:2181/default;password=hive;principal=hive/_HOST@REALM.COM;serviceDiscoveryMode=zooKeeper;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=...;trustStoreType=jks;user=hive;zooKeeperNamespace=hiveserver2
24/04/30 06:40:03 [main]: INFO jdbc.HiveConnection: Connected to cluster-head.domain.com:10000
I can simplify this considerably and get a working connection via Beeline:
Connecting to jdbc:hive2://cluster-head.domain.com:10000/default;principal=hive/_HOST@DOMAIN.COM;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
Connected to: Apache Hive (version 3.1.3000.7.1.9.3-4)
Driver: Hive JDBC (version 3.1.3000.7.1.9.3-4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3000.7.1.9.3-4 by Apache Hive
0: jdbc:hive2://cluster-head.domain>
However, when I try from JSQSH, despite the fact that it's pulling the driver from the same path, I get a classpath error:
[hive@cluster-head ~]$ jsqsh -d hive2 -u "jdbc:hive2://cluster-head.domain.com:10000/default;\
principal=hive/_HOST@DOMAIN.COM;ssl=true;\
sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks"
Password:
2024-04-30 05:49:01,772 main ERROR Unable to locate appender "LOGFILE" for logger config "guru.springframework.blog.log4j2properties"
[DEBUG] 2024-04-30 05:49:01.980 [main] Utils - Resolved authority: cluster-head.domain.com:10000
[DEBUG] 2024-04-30 05:49:02.290 [main] MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"Rate of successful kerberos logins and latency (milliseconds)"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.296 [main] MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"Rate of failed kerberos logins and latency (milliseconds)"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.297 [main] MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"GetGroups"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.297 [main] MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"Renewal failures since startup"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.298 [main] MutableMetricsFactory - field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, sampleName="Ops", about="", type=DEFAULT, value={"Renewal failures since last successful login"}, valueName="Time")
[DEBUG] 2024-04-30 05:49:02.304 [main] MetricsSystemImpl - UgiMetrics, User and group related metrics
java.lang.NoClassDefFoundError: com/ctc/wstx/io/InputBootstrapper
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:314)
at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1997)
at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:743)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:693)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge.createClientWithConf(HadoopThriftAuthBridge.java:95)
at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:57)
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:841)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:457)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:335)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
at org.sqsh.SQLDriverManager$DriverShim.connect(SQLDriverManager.java:136)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
at org.sqsh.SQLDriverManager.connect(SQLDriverManager.java:660)
at org.sqsh.JSqsh.doConnect(JSqsh.java:535)
at org.sqsh.JSqsh.main(JSqsh.java:249)
Caused by: java.lang.ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
... 16 more
When I try from another machine, using the application I want to get working, I get certificate errors:
2024-04-24 08:40:35 | javax.ws.rs.ProcessingException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
As mentioned, Kerberos-only connections work fine; I'm able to connect with URLs that look like "jdbc:hive2://cluster-head.domain.com:10000/default;principal=hive/_HOST@DOMAIN.COM". I'm trying to understand what the string should look like when SSL is also enabled, and what certs I need to import, or what other setup I need to do beforehand in order for connections to work.
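In case it's useful for diagnosing the certificate question, here is a throwaway sketch that prints whichever chain HS2 actually presents during the TLS handshake (hostname and port taken from the URLs above; the trust-all manager is purely for inspection, not for real connections):

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.security.cert.X509Certificate;

public class ShowServerChain {
    public static void main(String[] args) throws Exception {
        // Trust manager that accepts any chain but records it, so the
        // handshake completes even when the CA isn't trusted locally.
        final X509Certificate[][] seen = new X509Certificate[1][];
        X509TrustManager capture = new X509TrustManager() {
            public void checkClientTrusted(X509Certificate[] chain, String authType) {}
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                seen[0] = chain;
            }
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        };
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] { capture }, null);
        try (SSLSocket s = (SSLSocket) ctx.getSocketFactory()
                .createSocket("cluster-head.domain.com", 10000)) {
            s.startHandshake();
        }
        for (X509Certificate c : seen[0]) {
            System.out.println(c.getSubjectX500Principal()
                + "  issued by  " + c.getIssuerX500Principal());
        }
    }
}

Comparing the issuer printed there against the cmrootca-0 entry in the truststore should show whether the server's chain is rooted where I expect.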
Created 04-30-2024 07:40 AM
@MorganMcEvoy, Welcome to our community! To help you get the best possible answer, I have tagged in our Hive experts @asish @smruti @RAGHUY @nramanaiah who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,
Created 04-30-2024 07:46 AM
Created on 04-30-2024 08:25 AM - edited 04-30-2024 08:47 AM
Thanks @tj2007. I've seen that article, but I'm not clear on how to apply it. The application I'm using doesn't seem to use the Cloudera JDBC driver, and I don't think I can add it. I note that the format of the options specified there doesn't resemble the strings used by Beeline, which seems to be the format my application uses as well. Is there a solution I can use with the Hive/Beeline format?
I've tried using the Cloudera driver before and been unable to get it to work: previously due to driver/class errors, and now, with this URL format, for new reasons:
[DEBUG] 2024-04-30 08:16:44.942 [main] TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport@2b9f74d0
[DEBUG] 2024-04-30 08:16:44.943 [main] TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 10
[DEBUG] 2024-04-30 08:16:44.946 [main] TSaslTransport - CLIENT: Writing message with status START and payload length 5
[DEBUG] 2024-04-30 08:16:44.951 [main] TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 10
[DEBUG] 2024-04-30 08:16:44.952 [main] TSaslTransport - CLIENT: Start message handled
[DEBUG] 2024-04-30 08:16:44.952 [main] TSaslTransport - CLIENT: Main negotiation loop complete
[DEBUG] 2024-04-30 08:16:44.952 [main] TSaslTransport - CLIENT: SASL Client receiving last message
[WARN ] 2024-04-30 08:16:44.954 [main] HiveConnection - Failed to connect to...
Created 04-30-2024 09:01 AM
Try adding the following to the existing connection string you are using in JDBC:
SSL=1;SSLTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
Created 04-30-2024 11:28 AM
@MorganMcEvoy We have two different issues here:
1. ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper
This seems to be due to an incompatible client version, which is using some Hadoop libraries that are not compatible with your version of Hive.
2. SunCertPathBuilderException: unable to find valid certification path to requested target
This basically means you need to specify the truststore file (the one that contains the root cert of the CA) in the connection string. If the application you are using is equivalent to Beeline, I think adding the following to the connection string should work:
;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=...
SSL=1 might not work because that syntax is used with the Cloudera JDBC driver rather than the Apache one. If you have tried this already and it still fails, could you share the error message?
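For reference, if the application can be exercised from plain Java, the Beeline-style URL would be used something like this (a minimal sketch; the class name is illustrative and the trustStorePassword value is a placeholder, not your real password):

import java.sql.Connection;
import java.sql.DriverManager;

public class HiveSslProbe {
    public static void main(String[] args) throws Exception {
        // Apache Hive JDBC syntax: ssl=true plus the truststore location
        // and password. Assumes a valid Kerberos ticket from kinit.
        String url = "jdbc:hive2://cluster-head.domain.com:10000/default;"
                   + "principal=hive/_HOST@DOMAIN.COM;"
                   + "ssl=true;"
                   + "sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;"
                   + "trustStorePassword=changeit"; // placeholder
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected over SSL");
        }
    }
}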
Created 05-01-2024 03:15 AM
1. Agreed, I don't strictly need to get the jsqsh connection working; it'd just be nice to do so in order to test things more directly. I am confused that the client is somehow having compatibility issues despite using the deployed Hive version's own jars, though. (The classpath I'm using, the default in JSQSH, is ${HIVE_HOME}/lib:${HADOOP_HOME}, which works out to /opt/cloudera/parcels/CDH/lib/hive/lib:/opt/cloudera/parcels/CDH/lib/hadoop.) However, we can set that aside.
2. I agree that this seems like the root of the problem, but I can't determine what certs/truststores I need. I've tried copying the truststore that works on the CDP cluster itself, but it doesn't work from the remote host. I've tried importing all the certs I can find, but none seem to work. I'm probably missing something very obvious, but I don't know what.
So, for example, I scp
/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks
from the CDP cluster to /tmp/cm-auto-global_truststore.jks on the remote agent. I can use keytool -list to confirm that I have the right password and can see the certs contained:
Keystore type: jks
Keystore provider: SUN
Your keystore contains 1 entry
cmrootca-0, 23-Apr-2024, trustedCertEntry,
Certificate fingerprint (SHA-256): D8:03:7A:38:7A:D7:B5:AC:A4:FC:78:5D:AB:5A:1B:9F:2D:2A:0E:96:FD:11:10:E4:D4:54:73:97:9A:0A:E9:71
I copy the Hive keytab over, kinit, and that's all good:
Default principal: hive/cluster-head.domain.com@REALM.COM
Valid starting Expires Service principal
01/05/24 02:51:38 02/05/24 02:51:38 krbtgt/REALM.COM@REALM.COM
renew until 06/05/24 02:51:38
So everything seems present and correct. However, the connection attempts look like:
24/05/01 02:58:25 INFO jdbc.HiveConnection: Could not open client transport with JDBC Uri: jdbc:hive2://cluster-head.domain.com:10000/default;user=hive;password=hive;principal=hive/cluster-head.domain.com@REALM.COM;ssl=true;sslTrustStore=/tmp/cm-auto-global_truststore.jks;trustStorePassword=...
24/05/01 02:58:25 INFO jdbc.HiveConnection: Transport Used for JDBC connection: null
2024-05-01 02:55:06 | javax.ws.rs.ProcessingException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Are there more or different certs I need to import on the remote agent, beyond what's in the truststore that works (in Beeline) when connecting from the same machine?
Created 05-01-2024 05:57 AM
@MorganMcEvoy what is cluster-head.domain.com? Is this a load balancer or an individual HS2 node?
Also, what's the client tool you are using? Is it possible that it is not honoring the sslTrustStore parameter?
A workaround would be to import the root CA cert into the default Java truststore on the client machine. Ref: https://stackoverflow.com/questions/11700132/how-to-import-a-jks-certificate-in-java-trust-store
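A related test that doesn't require modifying cacerts: point the whole JVM at the copied truststore via the standard javax.net.ssl system properties and drop sslTrustStore from the URL. This is a sketch under the assumption that the driver falls back to the JVM's default trust settings when sslTrustStore is absent; if this connects while the URL parameter does not, the client tool is likely not honoring sslTrustStore. The class name is illustrative and the password is a placeholder.

import java.sql.Connection;
import java.sql.DriverManager;

public class DefaultTruststoreProbe {
    public static void main(String[] args) throws Exception {
        // Override the JVM-wide default truststore, equivalent to passing
        // -Djavax.net.ssl.trustStore=... on the command line.
        System.setProperty("javax.net.ssl.trustStore", "/tmp/cm-auto-global_truststore.jks");
        System.setProperty("javax.net.ssl.trustStorePassword", "changeit"); // placeholder
        // No sslTrustStore in the URL this time; Kerberos ticket assumed.
        String url = "jdbc:hive2://cluster-head.domain.com:10000/default;"
                   + "principal=hive/cluster-head.domain.com@REALM.COM;ssl=true";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected using the JVM default truststore");
        }
    }
}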