Created 05-14-2018 09:22 PM
Hello,
I am trying to better understand hive explain and hive performance. I have a simple count(*) that is failing with OOM. Rather than just increasing tez container size memory, im trying to understand why its failing. Explain plan below:
It looks like its grouping by _col0. What is the real name of this column? Is there a way to find out?
+----------------------------------------------------+ | Explain | +----------------------------------------------------+ | Plan optimized by CBO. | | | | Vertex dependency in root stage | | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) | | | | Stage-0 | | Fetch Operator | | limit:-1 | | Stage-1 | | Reducer 2 | | File Output Operator [FS_6] | | Group By Operator [GBY_4] (rows=1 width=8) | | Output:["_col0"],aggregations:["count(VALUE._col0)"] | | <-Map 1 [CUSTOM_SIMPLE_EDGE] | | PARTITION_ONLY_SHUFFLE [RS_3] | | Group By Operator [GBY_2] (rows=1 width=8) | | Output:["_col0"],aggregations:["count()"] | | Select Operator [SEL_1] (rows=1 width=53922865152) | | TableScan [TS_0] (rows=1 width=53922865152) | | db1@tb1,tb1,Tbl:COMPLETE,Col:COMPLETE |
Created 05-15-2018 12:31 PM
Hi @Nick Xu. _col0 is the first column in the table definition. You can find out what it is by running "DESCRIBE <tablename>"
Created 05-15-2018 05:03 PM
Thanks Scott. That is what I had assumed but whats interesting is i have 2 tables: Table2 is much larger than 1 based off TableScan.
Table 1:
and Table 2:
When I do a count(*) group by col0 on table 1. There is only one ID.
When I do a count(*) group by col0 on table 2 there are about 5-6 ID's but still one that is heavily skewed.
Table 1 gets OOM error while table 2 doesnt. Any ideas on why this would be happening?
Created 05-15-2018 06:36 PM
"_col0" is an internal/generated column. If you run a "select count(*) from foo" hive does not have to read any columns from the table, it just needs to count the records. What the explain plan says is that the table scan and select operator first generate empty records from the table. Then the group by counts those and stores that in a generated column "_col0". (There is a second group by because Hive has to aggregate all the results from the different "mappers".)
If you just count all rows it's odd to run out of memory. What's the actual query and failure you're seeing?
Created 05-15-2018 09:02 PM
See error below:
0: jdbc:hive2://localhost:10000> select count(*) from db1.table1; Waiting to acquire compile lock. Acquired the compile lock. ---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 ......... container RUNNING 1034 1010 0 24 39 0 Reducer 2 container INITED 1 0 0 1 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 00/02 [=========================>>-] 97% ELAPSED TIME: 53.36 s ---------------------------------------------------------------------------------------------- Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1525980884509_57715_1_00, diagnostics=[Task failed, taskId=task_1525980884509_57715_1_00_000565, diagnostics=[TaskAttempt 0 failed, info=[Container container_1525980884509_57715_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=14308,containerID=container_1525980884509_57715_01_000003] is running beyond physical memory limits. Current usage: 1.6 GB of 1.5 GB physical memory used; 3.5 GB of 7.5 GB virtual memory used. Killing container. Dump of the process-tree for container_1525980884509_57715_01_000003 :
I too agree that its weird a simple count(*) is getting OOM. Interestingly, if i do something like the below, it counts fine:
0: jdbc:hive2://localhost:10000> select sum(cnt) from (select count(*) cnt, run_detail_id from db1.table1 group by run_detail_id) a; Waiting to acquire compile lock. Acquired the compile lock. ---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... container SUCCEEDED 1034 1034 0 0 0 0 Reducer 2 ...... container SUCCEEDED 211 211 0 0 0 0 Reducer 3 ...... container SUCCEEDED 1 1 0 0 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 47.64 s ---------------------------------------------------------------------------------------------- +------------+ | _c0 | +------------+ | 227394274 | +------------+
Created 05-29-2018 05:53 PM
@Gunther Hagleitner @Scott Shaw
any ideas?
Created 05-29-2018 06:03 PM
@Nick Xu Have you generated stats on the table?