Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

UDAF and count distinct leads to exception

UDAF and count distinct leads to exception

New Contributor

on Hive (version 3.1.0.3.0.1.0-187) I get an exception when i use a udaf along with a count distinct. I have tried it with multiple udaf so i think it is not limited to my udaf implementation.

NOT WORKING

SELECT
hll_estimate(c1) c_2,
count(distinct c1) c_3
FROM
(
select 1 as c1
union all
select 1 as c1
union all
select 2 as c1
)t1

EXCEPTION

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":1},"value":{"_col0":"abc.0.68:s24_AAAYAQEAAAACd9wX"}}
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:372)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294)
	... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean com.zzz.qqq.yyy.xxx$Evaluator.merge(java.lang.String) with arguments {abc.0.68:s24_AAAYAQEAAAACd9wX}:argument type mismatch
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1111)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.merge(GenericUDAFBridge.java:176)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:216)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:880)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:724)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:790)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363)
	... 19 more
Caused by: java.lang.IllegalArgumentException: argument type mismatch
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1091)
	... 26 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1543959275889_0020_4_03 [Reducer 3] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 4, vertexId=vertex_1543959275889_0020_4_04, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1543959275889_0020_4_04 [Reducer 4] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)


PLAN

INFO  : Starting task [Stage-3:EXPLAIN] in serial mode
INFO  : Completed executing command(queryId=hive_20181206174558_f56287a1-5438-4598-9f44-6e3e22a6d176); Time taken: 0.004 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Map 1 <- Union 2 (CONTAINS)                        |
| Map 5 <- Union 2 (CONTAINS)                        |
| Map 6 <- Union 2 (CONTAINS)                        |
| Reducer 3 <- Union 2 (SIMPLE_EDGE)                 |
| Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)        |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Stage-1                                        |
|       Reducer 4                                    |
|       File Output Operator [FS_13]                 |
|         Group By Operator [GBY_19] (rows=1 width=16) |
|           Output:["_col0","_col1"],aggregations:["hll_estimate(VALUE._col0)","count(VALUE._col1)"] |
|         <-Reducer 3 [CUSTOM_SIMPLE_EDGE]           |
|           PARTITION_ONLY_SHUFFLE [RS_18]           |
|             Group By Operator [GBY_17] (rows=1 width=192) |
|               Output:["_col0","_col1"],aggregations:["hll_estimate(_col1)","count(_col0)"] |
|               Group By Operator [GBY_16] (rows=1 width=188) |
|                 Output:["_col0","_col1"],aggregations:["hll_estimate(VALUE._col0)"],keys:KEY._col0 |
|               <-Union 2 [SIMPLE_EDGE]              |
|                 <-Map 1 [CONTAINS]                 |
|                   Reduce Output Operator [RS_24]   |
|                     PartitionCols:_col0            |
|                     Group By Operator [GBY_23] (rows=1 width=188) |
|                       Output:["_col0","_col1"],aggregations:["hll_estimate(_col0)"],keys:_col0 |
|                       Select Operator [SEL_21] (rows=1 width=4) |
|                         Output:["_col0"]           |
|                         TableScan [TS_20] (rows=1 width=10) |
|                 <-Map 5 [CONTAINS]                 |
|                   Reduce Output Operator [RS_29]   |
|                     PartitionCols:_col0            |
|                     Group By Operator [GBY_28] (rows=1 width=188) |
|                       Output:["_col0","_col1"],aggregations:["hll_estimate(_col0)"],keys:_col0 |
|                       Select Operator [SEL_26] (rows=1 width=4) |
|                         Output:["_col0"]           |
|                         TableScan [TS_25] (rows=1 width=10) |
|                 <-Map 6 [CONTAINS]                 |
|                   Reduce Output Operator [RS_34]   |
|                     PartitionCols:_col0            |
|                     Group By Operator [GBY_33] (rows=1 width=188) |
|                       Output:["_col0","_col1"],aggregations:["hll_estimate(_col0)"],keys:_col0 |
|                       Select Operator [SEL_31] (rows=1 width=4) |
|                         Output:["_col0"]           |
|                         TableScan [TS_30] (rows=1 width=10) |
|                                                    |
+----------------------------------------------------+

A query without distinct works fine

0: jdbc:hive2://localhost:10000> SELECT
. . . . . . . . . . . . . . . .> hll_estimate(c1) c_2,
. . . . . . . . . . . . . . . .> count ( c1) c_3
. . . . . . . . . . . . . . . .> FROM
. . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . .> select 1 as c1
. . . . . . . . . . . . . . . .> union all
. . . . . . . . . . . . . . . .> select 1 as c1
. . . . . . . . . . . . . . . .> union all
. . . . . . . . . . . . . . . .> select 2 as c1
. . . . . . . . . . . . . . . .> )t1;

INFO  : Compiling command(queryId=hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365): SELECT
hll_estimate(c1) c_2,
count ( c1) c_3
FROM
(
select 1 as c1
union all
select 1 as c1
union all
select 2 as c1
)t1

INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c_2, type:bigint, comment:null), FieldSchema(name:c_3, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365); Time taken: 0.243 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365): SELECT
hll_estimate(c1) c_2,
count ( c1) c_3
FROM
(
select 1 as c1
union all
select 1 as c1
union all
select 2 as c1
)t1

INFO  : Query ID = hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: SELECT
hll_estimate(c1) c_2,
count ...c1
)t1 (Stage-1)

INFO  : Status: Running (Executing on YARN cluster with App id application_1543959275889_0021)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
Map 4 .......... container     SUCCEEDED      1          1        0        0       0       0  
Map 5 .......... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 8.73 s     
----------------------------------------------------------------------------------------------

INFO  : Completed executing command(queryId=hive_20181206182613_b986f93e-fc9e-4fdb-a814-dd5e59ce5365); Time taken: 13.706 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+------+------+
| c_2  | c_3  |
+------+------+
| 2    | 3    |
+------+------+
Don't have an account?
Coming from Hortonworks? Activate your account here