Created 03-07-2018 02:04 PM
Problem container : container-e144-1518080001967-51536-01-000047.txt
Created 03-07-2018 02:19 PM
I see frequent full GC in logs. Can you share yarn application log and the query that you executed?
Also, share explain plan for the query.
Created 03-07-2018 04:25 PM
Application log very big:
[yarn@serv03 ~]$ ls -hl /tmp/application_1518080001967_51536.log -rw-r--r-- 1 yarn hadoop 89M Mar 7 18:10 /tmp/application_1518080001967_51536.log
Created 03-07-2018 04:30 PM
[yarn@ks-dmp03 ~]$ hive log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAp Logging initialized using configuration in file:/etc/hive/2.6.3.0-235/0/hive-log hive> explain WITH DPI AS (SELECT SERVICE_NAME, trim(IP_ADDRESS) AS IP_ADDR FROM OTHER_SOURCES.DIM_DPI_IP_SERVICE), DSD AS (SELECT MSISDN, NVL(PROT_TYPE,0) AS PROTOCOL_ID, NVL(PROT_CATEGORY,0) AS PROTOCOL_CATEGORY_ID, TETHERING_FLAG, RAT, NETWORK_UL_TRAFFIC, NETWORK_DL_TRAFFIC, END_TIME_N, BEGIN_TIME_N, SID, trim(SERVER_IP) AS SERV_IP FROM OTHER_SOURCES.DPI_OTHER_DETAIL_UFDR WHERE HDAY="2018-03-06"), JN AS (SELECT DSD.MSISDN, DSD.PROTOCOL_ID, DSD.PROTOCOL_CATEGORY_ID, NVL(DPI.SERVICE_NAME,"Other") AS CUSTOMIZED_CATEGORY, DSD.TETHERING_FLAG, DSD.RAT, DSD.NETWORK_UL_TRAFFIC, DSD.NETWORK_DL_TRAFFIC, DSD.END_TIME_N, DSD.BEGIN_TIME_N, DSD.SID FROM DSD LEFT JOIN DPI ON DSD.SERV_IP = DPI.IP_ADDR) INSERT INTO AGG_DPI.AGG_DPI_TRAFFIC_DAILY_l PARTITION(HDAY="2018-03-06",SRC="OTHER_DETAIL") SELECT JN.MSISDN, JN.PROTOCOL_ID, JN.PROTOCOL_CATEGORY_ID, JN.CUSTOMIZED_CATEGORY, JN.TETHERING_FLAG, ROUND(SUM(CASE WHEN JN.RAT = 2 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_2G_AMT, ROUND(SUM(CASE WHEN JN.RAT = 1 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_3G_AMT, ROUND(SUM(CASE WHEN JN.RAT = 0 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_0_AMT, ROUND(SUM(JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC)/ 1024, 2) DATA_VOL_TOTAL_AMT, SUM(CASE WHEN JN.RAT = 2 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_2G, SUM(CASE WHEN JN.RAT = 1 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_3G, SUM(CASE WHEN JN.RAT = 0 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_0, SUM(END_TIME_N - BEGIN_TIME_N) DATA_SESSIONS_DUR, COUNT(DISTINCT CASE WHEN JN.RAT = 2 THEN JN.SID END) AS DATA_SESSIONS_2G_CNT, COUNT(DISTINCT CASE WHEN JN.RAT = 1 THEN JN.SID END) AS DATA_SESSIONS_3G_CNT, COUNT(DISTINCT CASE WHEN JN.RAT = 0 THEN JN.SID END) AS DATA_SESSIONS_0_CNT, COUNT(DISTINCT JN.SID) AS DATA_SESSIONS_CNT, 1 AS LOAD_ID, CURRENT_TIMESTAMP AS INS_DT FROM JN GROUP BY JN.MSISDN, JN.PROTOCOL_ID, JN.PROTOCOL_CATEGORY_ID, JN.TETHERING_FLAG, JN.CUSTOMIZED_CATEGORY; OK Plan not optimized by CBO due to missing statistics. Please check log for more details.
Created 03-07-2018 04:31 PM
Vertex dependency in root stage Map 1 <- Map 3 (BROADCAST_EDGE) Reducer 2 <- Map 1 (SIMPLE_EDGE) Stage-3 Stats-Aggr Operator Stage-0 Move Operator partition:{"hday":"2018-03-06","src":"OTHER_DETAIL"} table:{"name:":"agg_dpi.agg_dpi_traffic_daily_l","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"} Stage-2 Dependency Collection{} Stage-1 Reducer 2 File Output Operator [FS_15] compressed:true Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE table:{"name:":"agg_dpi.agg_dpi_traffic_daily_l","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"} Select Operator [SEL_13] outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18"] Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE Group By Operator [GBY_12] | aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","sum(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","sum(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(DISTINCT KEY._col5:0._col0)","count(DISTINCT KEY._col5:1._col0)","count(DISTINCT KEY._col5:2._col0)","count(DISTINCT KEY._col5:3._col0)"] | keys:KEY._col0 (type: varchar(16)), KEY._col1 (type: decimal(10,0)), KEY._col2 (type: decimal(10,0)), KEY._col3 (type: decimal(1,0)), KEY._col4 (type: string) | outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16"] | Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE |<-Map 1 [SIMPLE_EDGE] Reduce Output Operator [RS_11] key expressions:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col3 (type: decimal(1,0)), _col4 (type: string), _col5 (type: decimal(20,0)), _col6 (type: decimal(20,0)), _col7 (type: decimal(20,0)), _col8 (type: decimal(20,0)) Map-reduce partition columns:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col3 (type: decimal(1,0)), _col4 (type: string) sort order:+++++++++ Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE value expressions:_col9 (type: decimal(31,0)), _col10 (type: decimal(31,0)), _col11 (type: decimal(31,0)), _col12 (type: decimal(31,0)), _col13 (type: decimal(21,0)), _col14 (type: decimal(21,0)), _col15 (type: decimal(21,0)), _col16 (type: decimal(21,0)) Group By Operator [GBY_10] aggregations:["sum(CASE WHEN ((_col5 = 2)) THEN ((_col6 + _col7)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 1)) THEN ((_col6 + _col7)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 0)) THEN ((_col6 + _col7)) ELSE (0) END)","sum((_col6 + _col7))","sum(CASE WHEN ((_col5 = 2)) THEN ((_col8 - _col9)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 1)) THEN ((_col8 - _col9)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 0)) THEN ((_col8 - _col9)) ELSE (0) END)","sum((_col8 - _col9))","count(DISTINCT CASE WHEN ((_col5 = 2)) THEN (_col10) END)","count(DISTINCT CASE WHEN ((_col5 = 1)) THEN (_col10) END)","count(DISTINCT CASE WHEN ((_col5 = 0)) THEN (_col10) END)","count(DISTINCT _col10)"] keys:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col4 (type: decimal(1,0)), _col3 (type: string), CASE WHEN ((_col5 = 2)) THEN (_col10) END (type: decimal(20,0)), CASE WHEN ((_col5 = 1)) THEN (_col10) END (type: decimal(20,0)), CASE WHEN ((_col5 = 0)) THEN (_col10) END (type: decimal(20,0)), _col10 (type: decimal(20,0)) outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20"] Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE Select Operator [SEL_8] outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"] Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE Map Join Operator [MAPJOIN_19] | condition map:[{"":"Left Outer Join0 to 1"}] | HybridGraceHashJoin:true | keys:{"Map 3":"_col1 (type: string)","Map 1":"_col10 (type: string)"} | outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col11"] | Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE |<-Map 3 [BROADCAST_EDGE] vectorized | Reduce Output Operator [RS_22] | key expressions:_col1 (type: string) | Map-reduce partition columns:_col1 (type: string) | sort order:+ | Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE | value expressions:_col0 (type: varchar(40)) | Select Operator [OP_21] | outputColumnNames:["_col0","_col1"] | Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE | TableScan [TS_3] |alias:dim_dpi_ip_service |Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE |<-Select Operator [SEL_2] outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"] Statistics:Num rows: 762503089 Data size: 792240710181 Basic stats: COMPLETE Column stats: NONE TableScan [TS_0] ACID table:true alias:dpi_other_detail_ufdr Statistics:Num rows: 762503089 Data size: 792240710181 Basic stats: COMPLETE Column stats: NONE Time taken: 3.479 seconds, Fetched: 70 row(s) hive>
Created 03-07-2018 04:43 PM
In log see next errors :
[yarn@ks-dmp03 ~]$ grep -i ERROR /tmp/application_1518080001967_51536.log | grep -v HADOOP_ROOT_LOGGER | grep -v Dtez.root.logger | awk '{print $1 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15 $16}' | sort -u 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000034_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000035_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000036_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000037_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000038_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000039_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000040_0accordingtoscheduler.Taskreported ****************************** 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000107_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000108_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000109_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000110_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000111_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000112_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000114_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000115_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000116_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000117_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000118_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000119_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000120_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000121_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000122_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000124_0accordingtoscheduler.Taskreported 2018-03-07[ERROR][TezChild]|tez.ReduceRecordProcessor|:Hiterrorwhileclosingoperators-failingtree 2018-03-07[ERROR][TezChild]|tez.TezProcessor|:java.lang.InterruptedException [yarn@ks-dmp03 ~]$
Created 03-07-2018 02:32 PM
@Dmitro Vasilenko The only thing your log is telling is GC information. Can you please share some more details about your query and job logs from yarn?
Created 03-08-2018 03:49 AM
Can you try this query without map join . Looks like data size is huge for hybrid grace hash join.
set hive.auto.convert.join=false;
And then execute this query.