Support Questions

Find answers, ask questions, and share your expertise

Inconsistency by 1 * (NULL), and causes AnalysisException

avatar
Frequent Visitor
0: jdbc:hive2://localhost:21050/defalt> select 1 * (NULL);
null
+-------------+--+
| 1 * (null)  |
+-------------+--+
+-------------+--+

vs.

0: jdbc:hive2://localhost:21050/defalt> select 1 * (CASE WHEN '1' IS NULL THEN NULL ELSE NULL END);
Error: AnalysisException: Arithmetic operation requires numeric operands: 1 * (CASE WHEN '1' IS NULL THEN NULL ELSE NULL END) (state=HY000,code=0)

Is this a bug in Impala?

Impala version is 2.11 (CDH 5.14.4)

 

Same statement works with Hive.

1 ACCEPTED SOLUTION

avatar

Yeah I agree there is some inconsistency in behaviour here - the casting rules, especially around NULL, are too complex and inconsistent.

View solution in original post

6 REPLIES 6

avatar

I see what you're saying there. The different from Hive isn't deliberate, but I believe this only happens in the very special case when all of the values in the CASE are NULL, without any cast. If any of them have a type, the query will execute. E.g. the below query runs and returns NULL.

 

select 1 * (CASE WHEN '1' IS NULL THEN NULL ELSE cast(NULL as integer) END);

Interestingly postgres behaves the same way as Impala

 

psql (9.5.17, server 9.4.8)
Type "help" for help.

postgres=# select 1 * (CASE WHEN '1' IS NULL THEN NULL ELSE NULL END);
ERROR:  operator does not exist: integer * text
LINE 1: select 1 * (CASE WHEN '1' IS NULL THEN NULL ELSE NULL END);
                 ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
postgres=#  select 1 * (NULL);
 ?column? 
----------
         
(1 row)

postgres=# select 1 * (NULL);
 ?column? 
----------
         
(1 row)

 

We could probably file a bug if this is causing issues for you, there no specific reason it shouldn't work that I know of.

avatar
Frequent Visitor

Thank you for your reply, @Tim Armstrong !

If Impala doesn't throw Exception by CASE, that would be very nice.

 

But now I found more interesting issue 🙂

 

Old Impala 2.5 => No error

[root@quickstart ~]# beeline -u "jdbc:hive2://`hostname -f`:21050/;auth=noSasl"
2019-06-12 07:27:40,131 WARN  [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
scan complete in 2ms
Connecting to jdbc:hive2://quickstart.cloudera:21050/;auth=noSasl
Connected to: Impala (version 2.5.0-cdh5.7.0)
Driver: Hive JDBC (version 1.1.0-cdh5.7.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.7.0 by Apache Hive
0: jdbc:hive2://quickstart.cloudera:21050/> SELECT (-1 * CASE WHEN CAST(
0: jdbc:hive2://quickstart.cloudera:21050/>    SUM(
0: jdbc:hive2://quickstart.cloudera:21050/>            CASE when dummy_t.col1 = 1 then dummy_t.col2 else 0 END
0: jdbc:hive2://quickstart.cloudera:21050/>        ) AS DECIMAL(32, 2)
0: jdbc:hive2://quickstart.cloudera:21050/> ) IS NULL THEN NULL WHEN COALESCE(CAST(NULL AS DOUBLE), 0) = 0 THEN NULL ELSE CAST(
0: jdbc:hive2://quickstart.cloudera:21050/>    CASE WHEN CAST(NULL AS DOUBLE) = 0 THEN NULL ELSE (
0: jdbc:hive2://quickstart.cloudera:21050/>            CAST(
0: jdbc:hive2://quickstart.cloudera:21050/>                    SUM(
0: jdbc:hive2://quickstart.cloudera:21050/>                            CASE when dummy_t.col1 = 1 then dummy_t.col2 else 0 END
0: jdbc:hive2://quickstart.cloudera:21050/>                        ) AS DECIMAL(32, 2)
0: jdbc:hive2://quickstart.cloudera:21050/>                ) / CAST(NULL AS DOUBLE)
0: jdbc:hive2://quickstart.cloudera:21050/>        ) END AS DOUBLE
0: jdbc:hive2://quickstart.cloudera:21050/> ) END) FROM (SELECT 1 as col1, 2 as col2) dummy_t;
null
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| (-1 * 1 * case when cast(sum(case when dummy_t.col1 = 1 then dummy_t.col2 else 0 end) as decimal(32,2)) is null then null when coalesce(cast(null as double), 0) = 0 then null else cast(case when cast(null as double) = 0 then null else (cast(sum(case when dummy_t.col1 = 1 then dummy_t.col2 else 0 end) as decimal(32,2)) / cast(null as double)) end as double) end)  |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
No rows selected (0.093 seconds)
0: jdbc:hive2://quickstart.cloudera:21050/>

Newer Impala 2.11 and 3.0 doesn't like above:

[root@test /]# beeline -u "jdbc:hive2://`hostname -f`:21050/;auth=noSasl"
scan complete in 2ms
Connecting to jdbc:hive2://test.impala.com:21050/;auth=noSasl
Connected to: Impala (version 2.11.0-cdh5.14.4)
Driver: Hive JDBC (version 1.1.0-cdh5.14.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.14.4 by Apache Hive
0: jdbc:hive2://hadoop-node-01.docker.infra.a> SELECT (-1 * CASE WHEN CAST(
. . . . . . . . . . . . . . . . . . . . . . .>    SUM(
. . . . . . . . . . . . . . . . . . . . . . .>            CASE when dummy_t.col1 = 1 then dummy_t.col2 else 0 END
. . . . . . . . . . . . . . . . . . . . . . .>        ) AS DECIMAL(32, 2)
. . . . . . . . . . . . . . . . . . . . . . .> ) IS NULL THEN NULL WHEN COALESCE(CAST(NULL AS DOUBLE), 0) = 0 THEN NULL ELSE CAST(
. . . . . . . . . . . . . . . . . . . . . . .>    CASE WHEN CAST(NULL AS DOUBLE) = 0 THEN NULL ELSE (
. . . . . . . . . . . . . . . . . . . . . . .>            CAST(
. . . . . . . . . . . . . . . . . . . . . . .>                    SUM(
. . . . . . . . . . . . . . . . . . . . . . .>                            CASE when dummy_t.col1 = 1 then dummy_t.col2 else 0 END
. . . . . . . . . . . . . . . . . . . . . . .>                        ) AS DECIMAL(32, 2)
. . . . . . . . . . . . . . . . . . . . . . .>                ) / CAST(NULL AS DOUBLE)
. . . . . . . . . . . . . . . . . . . . . . .>        ) END AS DOUBLE
. . . . . . . . . . . . . . . . . . . . . . .> ) END) FROM (SELECT 1 as col1, 2 as col2) dummy_t;
Error: AnalysisException: Arithmetic operation requires numeric operands: (-1 * CASE WHEN CAST(sum(CASE WHEN dummy_t.col1 = 1 THEN dummy_t.col2 ELSE 0 END) AS DECIMAL(32,2)) IS NULL THEN NULL ELSE NULL END) (state=HY000,code=0)

Did I find a bug this time? 😉

avatar
Frequent Visitor

I mean, the output/result of the following query is different by Impala version:

SELECT (-1 * CASE WHEN CAST(
   SUM(
           CASE when dummy_t.col1 = 1 then dummy_t.col2 else 0 END
       ) AS DECIMAL(32, 2)
) IS NULL THEN NULL WHEN COALESCE(CAST(NULL AS DOUBLE), 0) = 0 THEN NULL ELSE CAST(
   CASE WHEN CAST(NULL AS DOUBLE) = 0 THEN NULL ELSE (
           CAST(
                   SUM(
                           CASE when dummy_t.col1 = 1 then dummy_t.col2 else 0 END
                       ) AS DECIMAL(32, 2)
               ) / CAST(NULL AS DOUBLE)
       ) END AS DOUBLE
) END) FROM (SELECT 1 as col1, 2 as col2) dummy_t;

 

avatar

We have made some changes to decimal type resolution behaviour in recent versions. Most of them are controlled by the DECIMAL_V2 option and the default was changed starting in Impala 3.0.

 

The best way forward is to report a bug in the Impala issue tracker - https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala with the steps required to reproduce the symptoms you see. Then it can be tracked and triaged properly. This forum is not great for doing that.

avatar
Frequent Visitor

Hi @Tim Armstrong 

 

Thanks again for your reply. 

I submitted https://issues.apache.org/jira/browse/IMPALA-8664 . Hope it makes sense.

 

After reading your response, I thought DECIMAL would be the issue, but it seems not.

Using Impala 2.11 (CDH 5.14.4), no DECIMAL, no CAST, but still fails:

 

0: jdbc:hive2://test> SELECT (-1 * CASE WHEN SUM(1) IS NULL THEN NULL WHEN COALESCE(NULL, 0) = 0 THEN NULL ELSE 0 END) FROM (SELECT 1 as ytd_id, 2 as derv_alw_amt) dummy_t;
Error: AnalysisException: Arithmetic operation requires numeric operands: (-1 * CASE WHEN sum(1) IS NULL THEN NULL ELSE NULL END) (state=HY000,code=0)

No COALESCE, works!

 

 

0: jdbc:hive2://test> SELECT (-1 * CASE WHEN SUM(1) IS NULL THEN NULL WHEN 1 = 0 THEN NULL ELSE 0 END) FROM (SELECT 1 as ytd_id, 2 as derv_alw_amt) dummy_t;
0

 

 

BTW, because of the inconsistency I mentioned in the first comment, faliing below MAKES SENSE (because CASE returns NULL):

 

0: jdbc:hive2://test> SELECT (-1 * CASE WHEN SUM(1) IS NULL THEN NULL WHEN 0 = 0 THEN NULL ELSE 0 END) FROM (SELECT 1 as ytd_id, 2 as derv_alw_amt) dummy_t;
Error: AnalysisException: Arithmetic operation requires numeric operands: (-1 * CASE WHEN sum(1) IS NULL THEN NULL ELSE NULL END) (state=HY000,code=0)

 

HOWEVER, removing SUM returns different result:

 

0: jdbc:hive2://test> SELECT (-1 * CASE WHEN COALESCE(NULL, 0) = 0 THEN NULL ELSE 0 END) FROM (SELECT 1 as ytd_id, 2 as derv_alw_amt) dummy_t;
null

 

 

So this is a bit too confusing for me, and I think this would be an Impala bug.

 

avatar

Yeah I agree there is some inconsistency in behaviour here - the casting rules, especially around NULL, are too complex and inconsistent.