About crayonml

crayonml · ‎06-20-2018

@Paul Hernandez Hey Paul - did you find a solution to this? It looks like its only parquet thats affected..csv doesnt have this problem. I too have data in subdirectories and spark sql returns null

crayonml · ‎05-29-2018

@Gunther Hagleitner @Scott Shaw any ideas?

crayonml · ‎05-15-2018

@Gunther Hagleitner See error below: 0: jdbc:hive2://localhost:10000> select count(*) from db1.table1; Waiting to acquire compile lock. Acquired the compile lock. ---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 ......... container RUNNING 1034 1010 0 24 39 0 Reducer 2 container INITED 1 0 0 1 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 00/02 [=========================>>-] 97% ELAPSED TIME: 53.36 s ---------------------------------------------------------------------------------------------- Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1525980884509_57715_1_00, diagnostics=[Task failed, taskId=task_1525980884509_57715_1_00_000565, diagnostics=[TaskAttempt 0 failed, info=[Container container_1525980884509_57715_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=14308,containerID=container_1525980884509_57715_01_000003] is running beyond physical memory limits. Current usage: 1.6 GB of 1.5 GB physical memory used; 3.5 GB of 7.5 GB virtual memory used. Killing container. Dump of the process-tree for container_1525980884509_57715_01_000003 : I too agree that its weird a simple count(*) is getting OOM. Interestingly, if i do something like the below, it counts fine: 0: jdbc:hive2://localhost:10000> select sum(cnt) from (select count(*) cnt, run_detail_id from db1.table1 group by run_detail_id) a; Waiting to acquire compile lock. Acquired the compile lock. ---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... container SUCCEEDED 1034 1034 0 0 0 0 Reducer 2 ...... container SUCCEEDED 211 211 0 0 0 0 Reducer 3 ...... container SUCCEEDED 1 1 0 0 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 47.64 s ---------------------------------------------------------------------------------------------- +------------+ | _c0 | +------------+ | 227394274 | +------------+

crayonml · ‎05-15-2018

@Scott Shaw Thanks Scott. That is what I had assumed but whats interesting is i have 2 tables: Table2 is much larger than 1 based off TableScan. Table 1: |TableScan[TS_0](rows=1 width=53922865152)| and Table 2: |TableScan[TS_0](rows=1 width=83922865152)| When I do a count(*) group by col0 on table 1. There is only one ID. When I do a count(*) group by col0 on table 2 there are about 5-6 ID's but still one that is heavily skewed. Table 1 gets OOM error while table 2 doesnt. Any ideas on why this would be happening?

crayonml · ‎05-14-2018

Hello, I am trying to better understand hive explain and hive performance. I have a simple count(*) that is failing with OOM. Rather than just increasing tez container size memory, im trying to understand why its failing. Explain plan below: It looks like its grouping by _col0. What is the real name of this column? Is there a way to find out? +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | Plan optimized by CBO. | | | | Vertex dependency in root stage | | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) | | | | Stage-0 | | Fetch Operator | | limit:-1 | | Stage-1 | | Reducer 2 | | File Output Operator [FS_6] | | Group By Operator [GBY_4] (rows=1 width=8) | | Output:["_col0"],aggregations:["count(VALUE._col0)"] | | <-Map 1 [CUSTOM_SIMPLE_EDGE] | | PARTITION_ONLY_SHUFFLE [RS_3] | | Group By Operator [GBY_2] (rows=1 width=8) | | Output:["_col0"],aggregations:["count()"] | | Select Operator [SEL_1] (rows=1 width=53922865152) | | TableScan [TS_0] (rows=1 width=53922865152) | | db1@tb1,tb1,Tbl:COMPLETE,Col:COMPLETE |

crayonml · ‎04-18-2018

Hello, I have thousands of databases/tables/partitions in my metastore and noticing HS2 takes almost 20-25min to be available. I've turned on debug and notived hive-server2.log spends all the time doing the below across all of my tables and views. 2018-04-18T22:47:07,941 DEBUG [main([])]: sqlstd.SQLStdHiveAuthorizationValidator (SQLStdHiveAuthorizationValidator.java:filterListCmdObjects(148)) - Obtained following objects in filterListCmdObjects [Object [type=TABLE_OR_VIEW, name=ued_idp_psa.stg_taxml], Object [type=TABLE_OR_VIEW, name=table1], Object [type=TABLE_OR_VIEW, name=table1], Object [type=TABLE_OR_VIEW, name=table2], Object [type=TABLE_OR_VIEW, name=table3], Object [type=TABLE_OR_VIEW, name=table4]] for user hive. Context Info: QueryContext [commandString=null, forwardedAddresses=null] Is there a way to disable this validator or other solutions? After ~25min, i get the below message and am able to connect via beeline. 2018-04-18T 22:05:37,205 INFO [main([])]: server.HiveServer2 (HiveServer2.java:start(508)) - Web UI has started on port 10002 Thanks,

Online	Offline
Last Visited	‎11-09-2015 11:38 AM

Member Since	‎11-04-2014 12:23 PM
Last Visited	‎11-09-2015 11:38 AM
Posts	13

Cloudera Community

Re: Hive External Table with Parquet Format produc...

Re: What is _col0 in hive explain plan?

Re: What is _col0 in hive explain plan?

Re: What is _col0 in hive explain plan?

What is _col0 in hive explain plan?

HiveServer2 takes long time (25+min) to startup wi...