Support Questions
Find answers, ask questions, and share your expertise

UDAF and count distinct leads to exception

UDAF and count distinct leads to exception

New Contributor

on Hive (version 3.1.0.3.0.1.0-187) I get an exception when i use a udaf along with a count distinct. I have tried it with multiple udaf so i think it is not limited to my udaf implementation.

NOT WORKING

SELECT
hll_estimate(c1) c_2,
count(distinct c1) c_3
FROM
(
select 1 as c1
union all
select 1 as c1
union all
select 2 as c1
)t1

EXCEPTION

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":1},"value":{"_col0":"abc.0.68:s24_AAAYAQEAAAACd9wX"}}
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:372)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294)
	... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean com.zzz.qqq.yyy.xxx$Evaluator.merge(java.lang.String) with arguments {abc.0.68:s24_AAAYAQEAAAACd9wX}:argument type mismatch
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1111)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.merge(GenericUDAFBridge.java:176)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:216)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:880)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:724)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:790)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363)
	... 19 more
Caused by: java.lang.IllegalArgumentException: argument type mismatch
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1091)
	... 26 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1543959275889_0020_4_03 [Reducer 3] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 4, vertexId=vertex_1543959275889_0020_4_04, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1543959275889_0020_4_04 [Reducer 4] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)


PLAN

INFO  : Starting task [Stage-3:EXPLAIN] in serial mode
INFO  : Completed executing command(queryId=hive_20181206174558_f56287a1-5438-4598-9f44-6e3e22a6d176); Time taken: 0.004 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Map 1 <- Union 2 (CONTAINS)                        |
| Map 5 <- Union 2 (CONTAINS)                        |
| Map 6 <- Union 2 (CONTAINS)                        |
| Reducer 3 <- Union 2 (SIMPLE_EDGE)                 |
| Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)        |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Stage-1                                        |
|       Reducer 4                                    |
|       File Output Operator [FS_13]                 |
|         Group By Operator [GBY_19] (rows=1 width=16) |
|           Output:["_col0","_col1"],aggregations:["hll_estimate(VALUE._col0)","count(VALUE._col1)"] |
|         <-Reducer 3 [CUSTOM_SIMPLE_EDGE]           |
|           PARTITION_ONLY_SHUFFLE [RS_18]           |
|             Group By Operator [GBY_17] (rows=1 width=192) |
|               Output:["_col0","_col1"],aggregations:["hll_estimate(_col1)","count(_col0)"] |
|               Group By Operator [GBY_16] (rows=1 width=188) |
|                 Output:["_col0","_col1"],aggregations:["hll_estimate(VALUE._col0)"],keys:KEY._col0 |
|               <-Union 2 [SIMPLE_EDGE]              |
|                 <-Map 1 [CONTAINS]                 |
|                   Reduce Output Operator [RS_24]   |
|                     PartitionCols:_col0            |
|                     Group By Operator [GBY_23] (rows=1 width=188) |
|                       Output:["_col0","_col1"],aggregations:["hll_estimate(_col0)"],keys:_col0 |
|                       Select Operator [SEL_21] (rows=1 width=4) |
|                         Output:["_col0"]           |
|                         TableScan [TS_20] (rows=1 width=10) |
|                 <-Map 5 [CONTAINS]                 |
|                   Reduce Output Operator [RS_29]   |
|                     PartitionCols:_col0            |
|                     Group By Operator [GBY_28] (rows=1 width=188) |
|                       Output:["_col0","_col1"],aggregations:["hll_estimate(_col0)"],keys:_col0 |
|                       Select Operator [SEL_26] (rows=1 width=4) |
|                         Output:["_col0"]           |
|                         TableScan [TS_25] (rows=1 width=10) |
|                 <-Map 6 [CONTAINS]                 |
|                   Reduce Output Operator [RS_34]   |
|                     PartitionCols:_col0            |
|                     Group By Operator [GBY_33] (rows=1 width=188) |
|                       Output:["_col0","_col1"],aggregations:["hll_estimate(_col0)"],keys:_col0 |
|                       Select Operator [SEL_31] (rows=1 width=4) |
|                         Output:["_col0"]           |
|                         TableScan [TS_30] (rows=1 width=10) |
|                                                    |
+----------------------------------------------------+

A query without distinct works fine

0: jdbc:hive2://localhost:10000> SELECT
. . . . . . . . . . . . . . . .> hll_estimate(c1) c_2,
. . . . . . . . . . . . . . . .> count ( c1) c_3
. . . . . . . . . . . . . . . .> FROM
. . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . .> select 1 as c1
. . . . . . . . . . . . . . . .> union all
. . . . . . . . . . . . . . . .> select 1 as c1
. . . . . . . . . . . . . . . .> union all
. . . . . . . . . . . . . . . .> select 2 as c1
. . . . . . . . . . . . . . . .> )t1;

INFO  : Compiling command(queryId=hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365): SELECT
hll_estimate(c1) c_2,
count ( c1) c_3
FROM
(
select 1 as c1
union all
select 1 as c1
union all
select 2 as c1
)t1

INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c_2, type:bigint, comment:null), FieldSchema(name:c_3, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365); Time taken: 0.243 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365): SELECT
hll_estimate(c1) c_2,
count ( c1) c_3
FROM
(
select 1 as c1
union all
select 1 as c1
union all
select 2 as c1
)t1

INFO  : Query ID = hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: SELECT
hll_estimate(c1) c_2,
count ...c1
)t1 (Stage-1)

INFO  : Status: Running (Executing on YARN cluster with App id application_1543959275889_0021)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
Map 4 .......... container     SUCCEEDED      1          1        0        0       0       0  
Map 5 .......... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 8.73 s     
----------------------------------------------------------------------------------------------

INFO  : Completed executing command(queryId=hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365); Time taken: 13.706 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+------+------+
| c_2  | c_3  |
+------+------+
| 2    | 3    |
+------+------+