Member since: 11-04-2014
Posts: 13
Kudos Received: 0
Solutions: 0
06-20-2018
04:49 PM
@Paul Hernandez Hey Paul, did you find a solution to this? It looks like only Parquet is affected; CSV doesn't have this problem. I too have data in subdirectories, and Spark SQL returns null.
05-15-2018
09:02 PM
@Gunther Hagleitner See the error below:
0: jdbc:hive2://localhost:10000> select count(*) from db1.table1;
Waiting to acquire compile lock.
Acquired the compile lock.
----------------------------------------------------------------------------------------------
VERTICES         MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .........  container   RUNNING      1034       1010        0       24      39       0
Reducer 2        container   INITED          1          0        0        1       0       0
----------------------------------------------------------------------------------------------
VERTICES: 00/02 [=========================>>-] 97% ELAPSED TIME: 53.36 s
----------------------------------------------------------------------------------------------
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1525980884509_57715_1_00, diagnostics=[Task failed, taskId=task_1525980884509_57715_1_00_000565, diagnostics=[TaskAttempt 0 failed, info=[Container container_1525980884509_57715_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=14308,containerID=container_1525980884509_57715_01_000003] is running beyond physical memory limits. Current usage: 1.6 GB of 1.5 GB physical memory used; 3.5 GB of 7.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1525980884509_57715_01_000003 :
I agree that it's weird for a simple count(*) to hit OOM. Interestingly, if I do something like the below, it counts fine:
0: jdbc:hive2://localhost:10000> select sum(cnt) from (select count(*) cnt, run_detail_id from db1.table1 group by run_detail_id) a;
Waiting to acquire compile lock.
Acquired the compile lock.
----------------------------------------------------------------------------------------------
VERTICES         MODE        STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container   SUCCEEDED    1034       1034        0        0       0       0
Reducer 2 ...... container   SUCCEEDED     211        211        0        0       0       0
Reducer 3 ...... container   SUCCEEDED       1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 47.64 s
----------------------------------------------------------------------------------------------
+------------+
| _c0 |
+------------+
| 227394274 |
+------------+
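As a sanity check that the two queries above are logically equivalent, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Hive. The table contents, the payload column, and the helper name count_both_ways are made up for illustration; it says nothing about memory behaviour on Tez, only that both forms return the same number.

```python
import sqlite3


def count_both_ways(rows):
    """Return (direct count, sum of per-group counts) for the given rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-column stand-in for db1.table1.
    conn.execute("CREATE TABLE table1 (run_detail_id INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

    # Form 1: the plain count(*) that failed with OOM in the post.
    direct = conn.execute("SELECT count(*) FROM table1").fetchone()[0]

    # Form 2: count per run_detail_id, then sum the partial counts.
    grouped = conn.execute(
        "SELECT sum(cnt) FROM ("
        "SELECT count(*) AS cnt, run_detail_id FROM table1 GROUP BY run_detail_id"
        ") a"
    ).fetchone()[0]

    conn.close()
    return direct, grouped


sample_rows = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")]
print(count_both_ways(sample_rows))  # prints (6, 6)
```

The result is the same either way; what differs in the Hive run above is the plan, which uses 211 reducers for the grouped form instead of a single one.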
05-15-2018
05:03 PM
@Scott Shaw Thanks Scott. That is what I had assumed, but what's interesting is that I have two tables, and Table 2 is much larger than Table 1 based on the TableScan.
Table 1:
|TableScan[TS_0](rows=1 width=53922865152)|
Table 2:
|TableScan[TS_0](rows=1 width=83922865152)|
When I do a count(*) group by col0 on Table 1, there is only one ID. When I do the same on Table 2, there are about 5-6 IDs, but one is still heavily skewed. Table 1 gets the OOM error while Table 2 doesn't. Any ideas on why this would be happening?
05-14-2018
09:22 PM
Hello, I am trying to better understand Hive explain output and Hive performance. I have a simple count(*) that is failing with OOM. Rather than just increasing the Tez container memory size, I'm trying to understand why it's failing. The explain plan is below. It looks like it's grouping by _col0. What is the real name of this column? Is there a way to find out?
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 |
| File Output Operator [FS_6] |
| Group By Operator [GBY_4] (rows=1 width=8) |
| Output:["_col0"],aggregations:["count(VALUE._col0)"] |
| <-Map 1 [CUSTOM_SIMPLE_EDGE] |
| PARTITION_ONLY_SHUFFLE [RS_3] |
| Group By Operator [GBY_2] (rows=1 width=8) |
| Output:["_col0"],aggregations:["count()"] |
| Select Operator [SEL_1] (rows=1 width=53922865152) |
| TableScan [TS_0] (rows=1 width=53922865152) |
| db1@tb1,tb1,Tbl:COMPLETE,Col:COMPLETE |
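To compare the estimates across plans like the one above, a small helper can pull out the rows and width figures per operator. This is only a sketch: operator_stats and the PLAN excerpt are illustrative names, and treating rows × width as an estimated data size in bytes assumes that width is Hive's estimated average row size.

```python
import re

# Matches e.g. "TableScan [TS_0] (rows=1 width=53922865152)".
STATS = re.compile(r"([A-Za-z][\w ]*?)\s*\[(\w+)\]\s*\(rows=(\d+)\s+width=(\d+)\)")

# Excerpt of the plan above, copied as plain text.
PLAN = """
| Group By Operator [GBY_4] (rows=1 width=8) |
| PARTITION_ONLY_SHUFFLE [RS_3] |
| Group By Operator [GBY_2] (rows=1 width=8) |
| Select Operator [SEL_1] (rows=1 width=53922865152) |
| TableScan [TS_0] (rows=1 width=53922865152) |
"""


def operator_stats(plan_text):
    """Return (operator, id, rows, width) for each line that carries stats."""
    found = []
    for line in plan_text.splitlines():
        match = STATS.search(line)
        if match:
            name, op_id, rows, width = match.groups()
            found.append((name, op_id, int(rows), int(width)))
    return found


for name, op_id, rows, width in operator_stats(PLAN):
    # Assumption: rows * width approximates the operator's data size in bytes.
    print(f"{op_id:6} {name:18} rows={rows} width={width} est_bytes={rows * width}")
```

Under that assumption, the TableScan line corresponds to roughly 53.9 GB of estimated input, which is the figure to compare against the second table's plan.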
Labels: Apache Hive, Apache Tez
04-18-2018
10:54 PM
Hello, I have thousands of databases/tables/partitions in my metastore and am noticing that HS2 takes almost 20-25 min to become available. I've turned on debug logging and noticed that hive-server2.log spends all of that time doing the below across all of my tables and views.
2018-04-18T22:47:07,941 DEBUG [main([])]: sqlstd.SQLStdHiveAuthorizationValidator (SQLStdHiveAuthorizationValidator.java:filterListCmdObjects(148)) - Obtained following objects in filterListCmdObjects [Object [type=TABLE_OR_VIEW, name=ued_idp_psa.stg_taxml], Object [type=TABLE_OR_VIEW, name=table1], Object [type=TABLE_OR_VIEW, name=table1], Object [type=TABLE_OR_VIEW, name=table2], Object [type=TABLE_OR_VIEW, name=table3], Object [type=TABLE_OR_VIEW, name=table4]] for user hive. Context Info: QueryContext [commandString=null, forwardedAddresses=null]
Is there a way to disable this validator, or are there other solutions? After ~25 min, I get the below message and am able to connect via beeline.
2018-04-18T22:05:37,205 INFO [main([])]: server.HiveServer2 (HiveServer2.java:start(508)) - Web UI has started on port 10002
Thanks,
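To put a number on how much of the startup is spent in that validator, one option is to scan the log for the filterListCmdObjects lines and measure the span between the first and last one. The sketch below assumes the timestamp format shown in the log lines above; the function name validator_time_span and the shortened sample lines are made up for illustration.

```python
import re
from datetime import datetime

# Leading timestamp as it appears in hive-server2.log, e.g. 2018-04-18T22:47:07,941
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\s")


def validator_time_span(lines, marker="filterListCmdObjects"):
    """Return (matching line count, seconds between first and last match)."""
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f"))
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


# Hypothetical, shortened log lines in the same format as the excerpt above.
sample_log = [
    "2018-04-18T22:47:07,941 DEBUG [main([])]: ... filterListCmdObjects [...] for user hive.",
    "2018-04-18T22:47:08,500 DEBUG [main([])]: some unrelated message",
    "2018-04-18T22:47:09,441 DEBUG [main([])]: ... filterListCmdObjects [...] for user hive.",
]
print(validator_time_span(sample_log))  # prints (2, 1.5)
```

Against a real log, passing an open file object as lines would work the same way, since the function only iterates over it line by line.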
Labels: Apache Hive