Community Articles

Find and share helpful community-sourced technical articles.
Labels (1)
avatar

Issue:

Below is a sample test

CREATE DATABASE IF NOT EXISTS tmp; 
DROP TABLE IF EXISTS tmp.orders; 
CREATE TABLE tmp.orders (username STRING, order_creation_date TIMESTAMP, amount DOUBLE); 
INSERT INTO TABLE tmp.orders VALUES ("jack", '2017-02-26 13:45:12', 88.2), ("jones", '2017-02-28 15:28:14', 92.4); 

HDP 2.4.2

SELECT username FROM tmp.orders WHERE order_creation_date > '2017-02-27'; 
OK 
jones
select * from tmp.orders; 
OK 
jack 2017-02-26 13:45:12 88.2 
jones 2017-02-28 15:28:14 92.4 
HDP 2.5.0 & 2.5.3
SELECT username FROM tmp.orders WHERE order_creation_date > '2017-02-27'; 
OK

SELECT username FROM tmp.orders WHERE order_creation_date > TO_DATE('2017-02-27'); 
OK 

You can notice the above 2 queries did not display any result

Resolution:

Workaround: Use cast function in hive or include hh:mm:ss as below

SELECT username FROM tmp.orders WHERE order_creation_date > '2017-02-27 00:00:00'; 
OK 
jones

This is a known bug in hive

Reference: Apache, Hortonworks

49,717 Views
Comments
avatar

May I have apache jira ID?

avatar

This is interesting. Thank you for posting this resolution to this comparison as I was able to use a similar approach to my issue!

I am on HDP 2.6.1 and I had some similar unexpected behavior. When selecting a timestamp column, I wanted to grab all columns where the timestamp was within a certain date. However, when it returned the rows.... it "zeroed" the timestamp column to "yyyy-mm-dd 00:00:00" where ymd was the actual day. I found it bizarre since I only applied a function to my where clause and not my selected column. Here is an example:

-- Unexpected query behavior
SELECT time_stamp_column
FROM my_table
WHERE date(time_stamp_column) = '2017-09-15'
limit 4;

--Unexpected results (no timestamps in table with values "00:00:00 0")
2017-09-15 00:00:00 0
2017-09-15 00:00:00 0
2017-09-15 00:00:00 0
2017-09-15 00:00:00 0

--Query using cast
SELECT time_stamp_column
FROM my_table
WHERE time_stamp_column >= '2017-09-15 00:00:00' AND
      time_stamp_column < '2017-09-16 00:00:00' 
limit 4;

--Expected/desired results (this is the actual data, as no columns have exactly the 00:00:00 timestamp)
2017-09-15 17:25:28.766248
2017-09-15 17:29:05.427199
2017-09-15 17:29:08.219565
2017-09-15 17:33:20.907088

Thanks @knarendran !

Hopefully someone else will find this useful.