Hive select from ORC involving row_number and FROM_UTC_TIMESTAMP giving error

This error occurs on large tables where a map join is not possible and the optimizer chooses a merge join. The steps below reproduce it on Hive 1.2.1. The issue occurs only with the Tez execution engine; I cannot reproduce it with MR.

set hive.execution.engine=tez;

create table default.asim_test1 (col1 string, col2 string, col3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

insert into default.asim_test1 values('105','test','2017-01-01 12:30:30');

-- The setting below forces the optimizer to choose a merge join in this small example. It is not needed in the real case, where the table is too big to fit in memory and the optimizer already chooses a merge join instead of a map join.

set hive.auto.convert.join=false;
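
One way to confirm that the setting really changes the join strategy is to inspect the query plan with EXPLAIN. The statement below is a minimal sketch, not part of the failing case: it assumes a simplified self-join on the same test table, used only to look at the plan. On Tez, the plan would be expected to show a merge join operator rather than a map join operator once auto-conversion is disabled; the exact operator labels may vary by Hive version.

```sql
-- Assumption: simplified self-join on the test table, used only to inspect the plan.
set hive.execution.engine=tez;
set hive.auto.convert.join=false;

EXPLAIN
SELECT a.col1, b.col2
FROM default.asim_test1 a
INNER JOIN default.asim_test1 b
  ON a.col1 = b.col1;
```

Running the same EXPLAIN with hive.auto.convert.join=true gives a plan to compare against.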

-- The query below fails with the error.

select A.col1, A.col2, A.lastmodifiedtimestamp current_modified,
       B.col2, B.lastmodifiedtimestamp hist_modified
from (
  select * from (
    select col1, col2,
           FROM_UTC_TIMESTAMP(CAST(from_unixtime(unix_timestamp(regexp_replace(substr(col3,1,19),'T',' '))) AS TIMESTAMP),'CST') lastmodifiedtimestamp,
           ROW_NUMBER() over (PARTITION BY col1 ORDER BY col3 desc) rnm
    from default.asim_test1
  ) A
  WHERE rnm = 1
) A
INNER JOIN (
  select * from (
    select col1, col2,
           FROM_UTC_TIMESTAMP(CAST(from_unixtime(unix_timestamp(regexp_replace(substr(col3,1,19),'T',' '))) AS TIMESTAMP),'CST') lastmodifiedtimestamp,
           ROW_NUMBER() over (PARTITION BY col1 ORDER BY col3 desc) rnm
    from default.asim_test1
  ) A
  WHERE rnm = 1
) B
ON A.col1 = B.col1
limit 10;

This is the error I am getting:

Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to java.sql.Timestamp
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:313)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)

Any advice will be appreciated.