Cannot connect 'Streaming Data Ingest' to secured Hive instance

New Contributor

I've downloaded the Hortonworks Sandbox 2.4 to develop some tools locally on my machine. One of the first things I want to do is load data into Hive. I first tried the regular JDBC connector, which worked but was way too slow.

While doing this I ran into the first interesting issue: the sandbox has authentication enabled and controlled by Ranger. When I connect using beeline with the URL jdbc:hive2://localhost:10000, I am asked for a username and password. However, when connecting from Java, no credentials were required, and I could still read and insert data. Can someone explain this?

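import javax.sql.DataSource;
import org.apache.hive.jdbc.HiveDriver;
import org.springframework.jdbc.datasource.SimpleDriverDataSource;
public DataSource dataSource() {
	return new SimpleDriverDataSource(new HiveDriver(), "jdbc:hive2://localhost:10000/variantdatabase");
}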
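For comparison, a minimal sketch of passing credentials explicitly over plain JDBC, using only java.sql. It assumes hive-jdbc is on the classpath so that DriverManager can find the driver; the user name "hive" and the empty password are placeholders, not values taken from the sandbox. The Hive JDBC driver reads the standard "user" and "password" connection properties. My understanding is that if HiveServer2 runs with hive.server2.authentication=NONE (which the thread later reports), the password is not validated at all, which would explain why the DataSource above connects without credentials while beeline merely prompts for them.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class HiveJdbcCredentials {
    static final String URL = "jdbc:hive2://localhost:10000/variantdatabase";

    // Standard JDBC connection properties; the Hive driver reads "user" and "password".
    static Properties credentials(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return props;
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder credentials (assumption). With authentication NONE the password is
        // presumably not checked; the user name is what the session runs as.
        try (Connection conn = DriverManager.getConnection(URL, credentials("hive", ""))) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```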

Then I learned about the streaming API, which seemed a better alternative for loading lots of data into Hive (a regular file load doesn't work for me). So I started following this article: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest#StreamingDataIngest-Streaming...

Relevant code:

HiveEndPoint hiveEP = new HiveEndPoint("hive2://localhost:10000", "variantdatabase", "variant", null);
this.connection = hiveEP.newConnection(true);

However, connecting takes ages, and after a while I get the following message in the client:

17:03:47.742 [main] INFO  org.apache.hive.jdbc.HiveConnection - Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000/variantdatabase
17:03:48.518 [main] DEBUG o.a.h.h.streaming.HiveEndPoint - Overriding HiveConf setting : hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
17:03:48.519 [main] DEBUG o.a.h.h.streaming.HiveEndPoint - Overriding HiveConf setting : hive.support.concurrency = true
17:03:48.519 [main] DEBUG o.a.h.h.streaming.HiveEndPoint - Overriding HiveConf setting : hive.metastore.execute.setugi = true
17:03:48.519 [main] DEBUG o.a.h.h.streaming.HiveEndPoint - Overriding HiveConf setting : hive.execution.engine = mr
17:03:48.706 [main] WARN  o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17:03:48.735 [main] INFO  hive.metastore - Trying to connect to metastore with URI hive2://localhost:10000
17:13:48.814 [main] WARN  hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) ~[hive-exec-1.2.1.jar:1.2.1]

The server log mentions SASL, which I don't understand, since JDBC didn't need it. And where can I define a username/password?

Caused by: org.apache.thrift.transport.TTransportException: Invalid status -128
        at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
        at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
        at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
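One way to read "Invalid status -128": HiveServer2 on port 10000 expects a SASL handshake, whose first byte is a status code, while the metastore client (pointed at hive2://localhost:10000 in the client log) speaks plain Thrift. The sketch below assumes the standard Thrift constants (TBinaryProtocol VERSION_1 = 0x80010000, TMessageType.CALL = 1, SASL negotiation statuses START = 1 through COMPLETE = 5) and shows that the first byte of a plain Thrift call is 0x80, which is -128 as a signed Java byte and therefore not a valid SASL status.

```java
import java.nio.ByteBuffer;

public class SaslStatusDemo {
    // Thrift TBinaryProtocol strict-write header: VERSION_1 OR'ed with the message type.
    static final int VERSION_1 = 0x80010000;
    static final byte CALL = 1; // TMessageType.CALL

    // Valid SASL negotiation status bytes in Thrift's TSaslTransport: START=1 .. COMPLETE=5.
    static boolean isValidSaslStatus(byte b) {
        return b >= 1 && b <= 5;
    }

    // First byte a plain (non-SASL) Thrift client puts on the wire for a method call.
    static byte firstByteOfPlainThriftCall() {
        ByteBuffer buf = ByteBuffer.allocate(4); // big-endian by default, like Thrift
        buf.putInt(VERSION_1 | CALL);
        return buf.get(0);
    }

    public static void main(String[] args) {
        byte b = firstByteOfPlainThriftCall();
        System.out.println("first byte = " + b + ", valid SASL status? " + isValidSaslStatus(b));
        // prints: first byte = -128, valid SASL status? false
    }
}
```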

Expert Contributor

@Steven Castelein In a secure environment, HiveServer2 uses plain SASL as its default authentication mode.

You can disable it by setting the following in hive-site.xml: hive.server2.authentication=NOSASL
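If the server is switched to NOSASL, the client has to stop using SASL too. A sketch of what that might look like, assuming hive-jdbc is on the classpath: the Hive JDBC driver accepts the session variable auth=noSasl in the connection URL, and the host, port and database below are simply the ones from this thread.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class NoSaslConnect {
    // Appends the Hive JDBC session variable auth=noSasl; it must match a server running
    // with hive.server2.authentication=NOSASL, otherwise the transport handshake fails.
    static String noSaslUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database + ";auth=noSasl";
    }

    public static void main(String[] args) throws SQLException {
        String url = noSaslUrl("localhost", 10000, "variantdatabase");
        // Assumes hive-jdbc is on the classpath so DriverManager can locate the driver.
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + url);
        }
    }
}
```

The same URL should work for beeline, e.g. beeline -u "jdbc:hive2://localhost:10000/variantdatabase;auth=noSasl".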

Or, to use SASL, see https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Integrity/ConfidentialityProtection:

Integrity/Confidentiality Protection

Integrity protection and confidentiality protection (beyond just the default of authentication) for communication between the Hive JDBC driver and HiveServer2 are enabled (Hive 0.12 onward, see HIVE-4911). You can use the SASL QOP property to configure this.

  • This is only when Kerberos is used for the HS2 client (JDBC/ODBC application) authentication with HiveServer2.
  • hive.server2.thrift.sasl.qop in hive-site.xml has to be set to one of the valid QOP values ('auth', 'auth-int' or 'auth-conf').

You can then connect with a URL like this one:

jdbc:hive2://<HS2 host>:10001/default;principal=<hive principal>;transportMode=http;httpPath=cliservice;auth=kerberos;sasl.qop=auth-int (if auth-int is set)
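A sketch of assembling such a URL programmatically, so the QOP value can be checked against the three values listed above before connecting. It separates every session variable with ';', which is how the Hive JDBC documentation describes transportMode and httpPath for Hive 0.14 and later, and it omits auth=kerberos on the assumption that the principal parameter alone selects Kerberos. The host and principal in main are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class KerberosQopUrl {
    // QOP values accepted by hive.server2.thrift.sasl.qop, per the Hive wiki text above.
    static final Set<String> VALID_QOP = Set.of("auth", "auth-int", "auth-conf");

    // Builds jdbc:hive2://host:port/db;k1=v1;k2=v2 with session variables joined by ';'.
    static String build(String host, int port, String db, String principal, String qop) {
        if (!VALID_QOP.contains(qop)) {
            throw new IllegalArgumentException("Invalid sasl.qop: " + qop);
        }
        Map<String, String> session = new LinkedHashMap<>(); // keeps insertion order
        session.put("principal", principal);
        session.put("transportMode", "http");
        session.put("httpPath", "cliservice");
        session.put("sasl.qop", qop);
        StringBuilder url = new StringBuilder("jdbc:hive2://" + host + ":" + port + "/" + db);
        for (Map.Entry<String, String> e : session.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical host and principal; substitute your cluster's values.
        System.out.println(build("hs2.example.com", 10001, "default",
                "hive/_HOST@EXAMPLE.COM", "auth-int"));
    }
}
```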

New Contributor

Thanks for your response, but I still can't get it to work. I tried setting the hive.server2.authentication property to NOSASL. Now the following happens:

* Connecting via beeline fails: I'm asked for a username/password, but the credentials I used successfully before no longer work.

* I cannot open a JDBC connection anymore:

Caused by: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/variantdatabase: null
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:231)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)

Server log:

2016-06-16 13:12:39,018 ERROR [HiveServer2-Handler-Pool: Thread-32]: server.TThreadPoolServer (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:228)

Why is everything so counter-intuitive? With authentication enabled, JDBC works without any credentials; with it disabled, it doesn't. Why can't I just specify a username/password for the streaming ingest code through the HiveEndPoint constructor or the newConnection() method? And for beeline, authentication is disabled, yet I'm still asked for a username/password.
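The NOSASL failures look like the mirror image of the earlier SASL error, so client and server transport settings have to agree. A sketch, assuming Thrift's SASL framing (a one-byte status, START = 1, then a 4-byte big-endian payload length and the mechanism name) and TBinaryProtocol's strict-read rule (a message must start with a 32-bit header whose masked version bits equal VERSION_1): when a client still opens with a SASL PLAIN handshake against a NOSASL server, the server reads the first four bytes as 0x01000000, a positive number with no version bits, and rejects it with "Missing version in readMessageBegin, old client?".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MissingVersionDemo {
    static final byte SASL_START = 1;           // NegotiationStatus.START in TSaslTransport
    static final int VERSION_MASK = 0xffff0000; // TBinaryProtocol constants
    static final int VERSION_1 = 0x80010000;

    // Frame a SASL-speaking client sends first: status byte, 4-byte length, mechanism name.
    static byte[] saslStartFrame(String mechanism) {
        byte[] name = mechanism.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + name.length);
        buf.put(SASL_START).putInt(name.length).put(name);
        return buf.array();
    }

    // The 32-bit header a binary-protocol server reads from the start of the stream.
    static int headerAsSeenByBinaryProtocol(byte[] wire) {
        return ByteBuffer.wrap(wire, 0, 4).getInt();
    }

    // Strict read accepts only headers whose masked version bits equal VERSION_1.
    static boolean hasVersion(int header) {
        return (header & VERSION_MASK) == VERSION_1;
    }

    public static void main(String[] args) {
        int header = headerAsSeenByBinaryProtocol(saslStartFrame("PLAIN"));
        System.out.printf("header = 0x%08x, has version? %b%n", header, hasVersion(header));
        // prints: header = 0x01000000, has version? false
    }
}
```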

New Contributor

OK, so the solution is quite simple: I was connecting to HiveServer2, which runs on port 10000, whereas I should have connected to the metastore, which runs on port 9083.

hive.server2.authentication is set to NONE, not NOSASL.
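A small sanity check along the lines of this fix, as a sketch. HiveEndPoint's first argument is a metastore URI, so in this thread the call would become new HiveEndPoint("thrift://localhost:9083", "variantdatabase", "variant", null). The helper below is hypothetical (it is not part of the Hive API); it uses java.net.URI to reject a hive2:// HiveServer2 address like the one that produced the ten-minute hang in the client log, assuming the metastore is reached through a thrift:// URI as in the sandbox.

```java
import java.net.URI;

public class MetastoreUriCheck {
    static final int HIVESERVER2_PORT = 10000; // JDBC/beeline endpoint in this sandbox

    // Hypothetical helper: validates the URI before it is passed to HiveEndPoint.
    static URI requireMetastoreUri(String uri) {
        URI parsed = URI.create(uri);
        if (!"thrift".equals(parsed.getScheme())) {
            throw new IllegalArgumentException("Metastore URI must use thrift://, got: " + uri);
        }
        if (parsed.getPort() == HIVESERVER2_PORT) {
            throw new IllegalArgumentException(
                    "Port " + HIVESERVER2_PORT + " is HiveServer2, not the metastore: " + uri);
        }
        return parsed;
    }

    public static void main(String[] args) {
        URI ok = requireMetastoreUri("thrift://localhost:9083");
        System.out.println("Using metastore " + ok);
        // With hive-hcatalog-streaming on the classpath, this URI would then be passed as
        // new HiveEndPoint(ok.toString(), "variantdatabase", "variant", null).
        try {
            requireMetastoreUri("hive2://localhost:10000");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```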