Created 05-23-2017 04:30 AM
Hello
I have Hortonworks HDP 2.6 installed together with the Druid version that ships with it. Druid runs on a Kerberos-secured cluster, and I'm having problems accessing it from Hive. I can create a Druid datasource from Hive using a normal CREATE TABLE statement, but I can't run a SELECT against it. Once the datasource is created in Druid, a normal JSON REST query sent directly to Druid returns the expected result, so the Druid side appears to be working as it should.
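For reference, here is roughly what the round trip looks like on my side; the table, datasource, host, principal, and column names below are placeholders for illustration, not our real schema:

# Create a Druid-backed datasource from Hive (HiveServer2 in HTTP mode);
# all identifiers here are made up for illustration
beeline -u "jdbc:hive2://hiveserver2:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@EXAMPLE.COM" -e "
  CREATE TABLE druid_test
  STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
  TBLPROPERTIES ('druid.segment.granularity' = 'DAY')
  AS SELECT CAST(event_ts AS TIMESTAMP) AS \`__time\`, dim1, metric1
  FROM source_table;"

# Querying the Druid broker directly works and returns the expected result
curl -X POST 'http://druidbroker:8082/druid/v2/' \
  -H 'Content-Type: application/json' \
  -d '{"queryType":"timeBoundary","dataSource":"druid_test"}'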
When I query the data from Hive, I get two different errors that point to the same thing.
Error: java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.core.JsonParseException: Invalid type marker byte 0x3c for expected value token at [Source: org.apache.hive.druid.com.metamx.http.client.io.AppendableByteArrayInputStream@245e6e5b; line: -1, column: 1] (state=,code=0)
or
Error: java.io.IOException: java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: org.apache.hive.druid.com.metamx.http.client.io.AppendableByteArrayInputStream@6dd4a729; line: 1, column: 2]
at org.apache.hive.druid.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1576)
at org.apache.hive.druid.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533)
at org.apache.hive.druid.com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:462)
at org.apache.hive.druid.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2610)
at org.apache.hive.druid.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:841)
at org.apache.hive.druid.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:737)
at org.apache.hive.druid.com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3090)
at org.apache.hive.druid.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3036)
at org.apache.hive.druid.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2199)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.distributeSelectQuery(DruidQueryBasedInputFormat.java:227)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getInputSplits(DruidQueryBasedInputFormat.java:160)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getSplits(DruidQueryBasedInputFormat.java:104)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1932)
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482)
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:311)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:856)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:552)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:715)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:206)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:349)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:449)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:925)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745) (state=,code=0)
Does anybody have any idea what's wrong here?
Created 05-23-2017 05:44 AM
An update!
I did a network packet capture to see what Druid is returning to Hive, and I actually get an authentication error (see the output further down). Most likely this is due to Kerberos being configured in the cluster, and it also explains the errors above: the unexpected '<' (byte 0x3c) that Jackson complains about is the first character of this HTML error page. Strange that I can create the datasource from Hive without hitting this error. Now I'm hoping it's just a matter of setting the correct Hive properties to connect to a Kerberos-enabled Druid installation. If anybody has a hint, please let me know.
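For anyone who wants to reproduce the capture, something along these lines should work (the interface name and the default broker port 8082 are assumptions you'd adjust for your own setup):

# Dump HTTP traffic between HiveServer2 and the Druid broker, payloads in ASCII
tcpdump -i eth0 -A 'tcp port 8082 and host druidbroker'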
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 401 </title>
</head>
<body>
<h2>HTTP ERROR: 401</h2>
<p>Problem accessing /druid/v2/. Reason:
<pre> Authentication required</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
Created 06-08-2017 11:16 PM
Hi, it seems like your Druid instance is running on a secured cluster. Unfortunately, the Hive-Druid integration does not support Kerberos yet; support is coming soon.
Created 06-08-2017 11:18 PM
To make sure I am understanding this correctly: how do you query the data via the REST call? Are you including your Kerberos credentials?
Created 06-12-2017 07:25 AM
Yes. When I make the REST call with curl, I include the Kerberos credentials, and it works fine.
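Concretely, the working direct query looks something like this (broker host, realm, and datasource name are placeholders); curl's --negotiate flag performs SPNEGO authentication using the ticket obtained with kinit:

# Get a Kerberos ticket, then let curl authenticate via SPNEGO
kinit myuser@EXAMPLE.COM
curl --negotiate -u : -X POST 'http://druidbroker:8082/druid/v2/' \
  -H 'Content-Type: application/json' \
  -d '{"queryType":"timeBoundary","dataSource":"druid_test"}'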
To be able to move forward with the rest of the tests, I made a really, REALLY ugly workaround (not in production, just in the test environment): I added /druid to druid.hadoop.security.spnego.excludedPaths, so the query from Hive works without hitting the unsupported-Kerberos problem. Not something I can recommend doing, but it gives us the possibility to test the rest of the solution until the Kerberos problem is fixed.
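Concretely, the change is just this one property (I set it via Ambari under the Druid configs and restarted the Druid services; the exact pre-existing contents of the list may differ in your setup):

# Druid common runtime properties -- test environments only!
# Adding /druid exempts the query endpoints from SPNEGO authentication.
druid.hadoop.security.spnego.excludedPaths=["/status","/druid"]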