Hive adds timezone to Druid interval question

Contributor

I'm struggling to get Hive working with Druid. So far I have a working connection between the two: I can create datasources in Druid from Hive, and I can also (with some workarounds) query the data from Hive, but not when using the "avg" aggregation on one of the measures. If I do an EXPLAIN on the query in Hive, I get the following JSON (formatted here to make it a bit easier to read):

{ "queryType":"select", "dataSource":"druid_sqoop_import_02", "descending":false, "intervals":[ "1900-01-01T00:00:00.000/3000-01-01T00:00:00.000" ], "filter":{ "type":"selector", "dimension":"target_database", "value":"ABC" }, "dimensions":[ "import_type", "target_database", "target_tblname", "source_database", "source_tblschema", "source_tblname" ], "metrics":[ "sqoop_duration", "merge_duration", "concatenate_duration", "sqoop_mappers", "sqoop_rows", "sqoop_size" ], "granularity":"all", "pagingSpec":{ "threshold":16384, "fromNext":true }, "context":{ "druid.query.fetch":false } }

If I run the same query as a plain JSON REST call against Druid, it returns the values correctly. But from Hive, I get the following error:

java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')

If I look at the actual error message returned from the Druid Broker, it says the following.

<html> <head> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/> <title>Error 500 </title> </head> <body> <h2>HTTP ERROR: 500</h2> <p>Problem accessing /druid/v2/datasources/druid_sqoop_import_02/candidates. Reason: <pre> java.lang.IllegalArgumentException: Invalid format: "1900-01-01T00:00:00.000 01:00" is malformed at " 01:00"</pre></p> <hr /><i><small>Powered by Jetty://</small></i> </body> </html>

So for some reason it's appending what I guess is the timezone (I'm in UTC+1) to the end of the timestamp. I have also tried a WHERE on __time in Hive, but it still adds the "01:00" at the end. Has anybody else had this problem, or can someone give me some pointers on how to solve it?

3 REPLIES

Contributor

An update

What Hive is doing is calling the /druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000+01:00/3000-01-01T00:00:00.000+01:00 endpoint to get a list of candidates. According to the Broker documentation (http://druid.io/docs/latest/design/broker.html), the intervals parameter should accept ISO 8601 timestamps, which includes a timezone suffix like the "+01:00" at the end.

When I call the Broker directly with a simple curl command, I get the following response back.
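The call was of this shape (the broker host and port are assumptions; note the raw, unencoded "+" in the intervals):

# Broker host/port are assumptions. The server decodes the unencoded
# '+' in the query string to a space, producing the error below.
curl "http://localhost:8082/druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000+01:00/3000-01-01T00:00:00.000+01:00"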

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /druid/v2/datasources/<DATASOURCE_NAME>/candidates. Reason:
<pre> java.lang.IllegalArgumentException: Invalid format: "1900-01-01T00:00:00.000 01:00" is malformed at " 01:00"</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>

If I remove the timezone from the request (/druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000/3000-01-01T00:00:00.000), the candidates are returned correctly. Judging by the error text, the unescaped "+" in the query string is decoded as a space ("1900-01-01T00:00:00.000 01:00"), which would explain why the broker considers the timestamp malformed.
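As an untested workaround on the REST side, percent-encoding the "+" (so the server decodes it back to a plus rather than a space) should preserve the offset:

# Sketch, not verified: %2B is the percent-encoding of '+', so the
# timezone offset should survive the server-side query-string decoding.
curl "http://localhost:8082/druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000%2B01:00/3000-01-01T00:00:00.000%2B01:00"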
