Created on 05-23-2017 10:24 AM - edited 09-16-2022 04:38 AM
I’m struggling to get Hive working with Druid. So far I have a working connection between the two: I can create datasources in Druid from Hive, and I can also (with some workarounds) query the data from Hive. What fails is using the “avg” aggregation on one of the measures. If I run an explain on the query in Hive, I get the following JSON (reformatted to make it a bit easier to read):
{
  "queryType": "select",
  "dataSource": "druid_sqoop_import_02",
  "descending": false,
  "intervals": ["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],
  "filter": {
    "type": "selector",
    "dimension": "target_database",
    "value": "ABC"
  },
  "dimensions": [
    "import_type",
    "target_database",
    "target_tblname",
    "source_database",
    "source_tblschema",
    "source_tblname"
  ],
  "metrics": [
    "sqoop_duration",
    "merge_duration",
    "concatenate_duration",
    "sqoop_mappers",
    "sqoop_rows",
    "sqoop_size"
  ],
  "granularity": "all",
  "pagingSpec": {
    "threshold": 16384,
    "fromNext": true
  },
  "context": {
    "druid.query.fetch": false
  }
}
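For reference, a query like the one above can be POSTed to the broker's /druid/v2 endpoint. A minimal sketch in Python (the broker host and port are placeholder assumptions; adjust for your cluster):

```python
import json
import urllib.request

# The select query from the explain output above, abridged to the key fields.
query = {
    "queryType": "select",
    "dataSource": "druid_sqoop_import_02",
    "intervals": ["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],
    "granularity": "all",
    "pagingSpec": {"threshold": 16384, "fromNext": True},
}

# Placeholder broker address; a request with a data body defaults to POST.
req = urllib.request.Request(
    "http://localhost:8082/druid/v2",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return the result rows as JSON
# when run against a live broker.
```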
If I run the same query as a plain JSON REST call against Druid, it returns the values correctly. But from Hive, I get the following error:
java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
If I look at the actual error message returned from the Druid Broker, it says the following.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /druid/v2/datasources/druid_sqoop_import_02/candidates. Reason:
<pre> java.lang.IllegalArgumentException: Invalid format: "1900-01-01T00:00:00.000 01:00" is malformed at " 01:00"</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
So for some reason it appends what I assume is the timezone (I live in UTC+1) to the end of the timestamp. I have also tried adding a WHERE on __time in Hive, but it still adds the “01:00” at the end. Has anybody else had these problems, or can someone give me some pointers on how to solve this one?
Created 05-24-2017 06:29 AM
An update
What Hive does is call the /druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000+01:00/3000-01-01T00:00:00.000+01:00 endpoint to get a list of candidates. According to the broker documentation (http://druid.io/docs/latest/design/broker.html), the intervals should follow the ISO 8601 standard for timestamps, which includes the “+01:00” timezone suffix at the end.
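One plausible culprit (my assumption, but it is consistent with the broker's error text) is standard URL decoding: an unescaped "+" in a query string decodes to a space, which would turn "+01:00" into " 01:00" on the server side. A quick check with Python's urllib:

```python
from urllib.parse import unquote_plus

# In a URL query string, a literal "+" decodes to a space, so the
# timezone offset "+01:00" comes out as " 01:00" on the server side.
raw = "1900-01-01T00:00:00.000+01:00"
decoded = unquote_plus(raw)
print(decoded)  # 1900-01-01T00:00:00.000 01:00
```

The decoded value matches the broker's complaint exactly: it is malformed at " 01:00".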
When I call the Broker directly with a simple curl command, I get the following response back from the broker.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /druid/v2/datasources/<DATASOURCE_NAME>/candidates. Reason:
<pre> java.lang.IllegalArgumentException: Invalid format: "1900-01-01T00:00:00.000 01:00" is malformed at " 01:00"</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
If I remove the timezone from the request (/druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000/3000-01-01T00:00:00.000), the candidates are returned correctly.
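For what it's worth, a client-side workaround consistent with the decoding theory would be to percent-encode the "+" as %2B before it goes into the query string, so it survives the broker's URL decoding. A sketch (the broker address is a placeholder, and this is not something Hive itself does):

```python
from urllib.parse import quote

interval = "1900-01-01T00:00:00.000+01:00/3000-01-01T00:00:00.000+01:00"

# quote() leaves the characters listed in `safe` untouched but turns
# "+" into "%2B", which decodes back to "+" instead of a space.
encoded = quote(interval, safe=":/-.")
url = ("http://localhost:8082/druid/v2/datasources/"
       "druid_sqoop_import_02/candidates?intervals=" + encoded)
print(encoded)  # 1900-01-01T00:00:00.000%2B01:00/3000-01-01T00:00:00.000%2B01:00
```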
Created 05-24-2017 06:52 AM
Issue reported on https://github.com/druid-io/druid/issues/4322
Created 06-08-2017 07:08 PM
https://issues.apache.org/jira/browse/HIVE-16576 should fix this.