Support Questions
Find answers, ask questions, and share your expertise

Failure one of Reducer (on join two tables in hive2)

Contributor
7 REPLIES 7

Contributor
@Dmitro Vasilenko

I see frequent full GC in logs. Can you share yarn application log and the query that you executed?

Also, share explain plan for the query.

Contributor

Application log very big:

[yarn@serv03 ~]$ ls -hl /tmp/application_1518080001967_51536.log -rw-r--r-- 1 yarn hadoop 89M Mar 7 18:10 /tmp/application_1518080001967_51536.log

Contributor
[yarn@ks-dmp03 ~]$ hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAp


Logging initialized using configuration in file:/etc/hive/2.6.3.0-235/0/hive-log
hive> explain WITH DPI AS (SELECT SERVICE_NAME, trim(IP_ADDRESS) AS IP_ADDR FROM OTHER_SOURCES.DIM_DPI_IP_SERVICE), DSD AS (SELECT MSISDN, NVL(PROT_TYPE,0) AS PROTOCOL_ID, NVL(PROT_CATEGORY,0) AS PROTOCOL_CATEGORY_ID, TETHERING_FLAG, RAT, NETWORK_UL_TRAFFIC, NETWORK_DL_TRAFFIC, END_TIME_N, BEGIN_TIME_N, SID, trim(SERVER_IP) AS SERV_IP FROM OTHER_SOURCES.DPI_OTHER_DETAIL_UFDR WHERE HDAY="2018-03-06"), JN AS (SELECT DSD.MSISDN, DSD.PROTOCOL_ID, DSD.PROTOCOL_CATEGORY_ID, NVL(DPI.SERVICE_NAME,"Other") AS CUSTOMIZED_CATEGORY, DSD.TETHERING_FLAG, DSD.RAT, DSD.NETWORK_UL_TRAFFIC, DSD.NETWORK_DL_TRAFFIC, DSD.END_TIME_N, DSD.BEGIN_TIME_N, DSD.SID FROM DSD LEFT JOIN DPI ON DSD.SERV_IP = DPI.IP_ADDR) INSERT INTO AGG_DPI.AGG_DPI_TRAFFIC_DAILY_l PARTITION(HDAY="2018-03-06",SRC="OTHER_DETAIL") SELECT JN.MSISDN, JN.PROTOCOL_ID, JN.PROTOCOL_CATEGORY_ID, JN.CUSTOMIZED_CATEGORY, JN.TETHERING_FLAG, ROUND(SUM(CASE WHEN JN.RAT = 2 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_2G_AMT, ROUND(SUM(CASE WHEN JN.RAT = 1 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_3G_AMT, ROUND(SUM(CASE WHEN JN.RAT = 0 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_0_AMT, ROUND(SUM(JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC)/ 1024, 2) DATA_VOL_TOTAL_AMT, SUM(CASE WHEN JN.RAT = 2 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_2G, SUM(CASE WHEN JN.RAT = 1 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_3G, SUM(CASE WHEN JN.RAT = 0 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_0, SUM(END_TIME_N - BEGIN_TIME_N) DATA_SESSIONS_DUR, COUNT(DISTINCT CASE WHEN JN.RAT = 2 THEN JN.SID END) AS DATA_SESSIONS_2G_CNT, COUNT(DISTINCT CASE WHEN JN.RAT = 1 THEN JN.SID END) AS DATA_SESSIONS_3G_CNT, COUNT(DISTINCT CASE WHEN JN.RAT = 0 THEN JN.SID END) AS DATA_SESSIONS_0_CNT, COUNT(DISTINCT JN.SID) AS DATA_SESSIONS_CNT, 1 AS LOAD_ID, CURRENT_TIMESTAMP AS INS_DT FROM JN GROUP BY JN.MSISDN, JN.PROTOCOL_ID, JN.PROTOCOL_CATEGORY_ID, JN.TETHERING_FLAG, JN.CUSTOMIZED_CATEGORY;
OK
Plan not optimized by CBO due to missing statistics. Please check log for more details.



Contributor

Vertex dependency in root stage
Map 1 <- Map 3 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-3
   Stats-Aggr Operator
  Stage-0
  Move Operator
 partition:{"hday":"2018-03-06","src":"OTHER_DETAIL"}
 table:{"name:":"agg_dpi.agg_dpi_traffic_daily_l","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
 Stage-2
 Dependency Collection{}
Stage-1
Reducer 2
File Output Operator [FS_15]
   compressed:true
   Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE
   table:{"name:":"agg_dpi.agg_dpi_traffic_daily_l","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
   Select Operator [SEL_13]
  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18"]
  Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE
  Group By Operator [GBY_12]
  |  aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","sum(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","sum(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(DISTINCT KEY._col5:0._col0)","count(DISTINCT KEY._col5:1._col0)","count(DISTINCT KEY._col5:2._col0)","count(DISTINCT KEY._col5:3._col0)"]
  |  keys:KEY._col0 (type: varchar(16)), KEY._col1 (type: decimal(10,0)), KEY._col2 (type: decimal(10,0)), KEY._col3 (type: decimal(1,0)), KEY._col4 (type: string)
  |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16"]
  |  Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE
  |<-Map 1 [SIMPLE_EDGE]
  Reduce Output Operator [RS_11]
 key expressions:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col3 (type: decimal(1,0)), _col4 (type: string), _col5 (type: decimal(20,0)), _col6 (type: decimal(20,0)), _col7 (type: decimal(20,0)), _col8 (type: decimal(20,0))
 Map-reduce partition columns:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col3 (type: decimal(1,0)), _col4 (type: string)
 sort order:+++++++++
 Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE
 value expressions:_col9 (type: decimal(31,0)), _col10 (type: decimal(31,0)), _col11 (type: decimal(31,0)), _col12 (type: decimal(31,0)), _col13 (type: decimal(21,0)), _col14 (type: decimal(21,0)), _col15 (type: decimal(21,0)), _col16 (type: decimal(21,0))
 Group By Operator [GBY_10]
 aggregations:["sum(CASE WHEN ((_col5 = 2)) THEN ((_col6 + _col7)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 1)) THEN ((_col6 + _col7)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 0)) THEN ((_col6 + _col7)) ELSE (0) END)","sum((_col6 + _col7))","sum(CASE WHEN ((_col5 = 2)) THEN ((_col8 - _col9)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 1)) THEN ((_col8 - _col9)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 0)) THEN ((_col8 - _col9)) ELSE (0) END)","sum((_col8 - _col9))","count(DISTINCT CASE WHEN ((_col5 = 2)) THEN (_col10) END)","count(DISTINCT CASE WHEN ((_col5 = 1)) THEN (_col10) END)","count(DISTINCT CASE WHEN ((_col5 = 0)) THEN (_col10) END)","count(DISTINCT _col10)"]
 keys:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col4 (type: decimal(1,0)), _col3 (type: string), CASE WHEN ((_col5 = 2)) THEN (_col10) END (type: decimal(20,0)), CASE WHEN ((_col5 = 1)) THEN (_col10) END (type: decimal(20,0)), CASE WHEN ((_col5 = 0)) THEN (_col10) END (type: decimal(20,0)), _col10 (type: decimal(20,0))
 outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20"]
 Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE
 Select Operator [SEL_8]
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"]
Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE
Map Join Operator [MAPJOIN_19]
|  condition map:[{"":"Left Outer Join0 to 1"}]
|  HybridGraceHashJoin:true
|  keys:{"Map 3":"_col1 (type: string)","Map 1":"_col10 (type: string)"}
|  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col11"]
|  Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE
|<-Map 3 [BROADCAST_EDGE] vectorized
|  Reduce Output Operator [RS_22]
| key expressions:_col1 (type: string)
| Map-reduce partition columns:_col1 (type: string)
| sort order:+
| Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE
| value expressions:_col0 (type: varchar(40))
| Select Operator [OP_21]
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE
| TableScan [TS_3]
|alias:dim_dpi_ip_service
|Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE
|<-Select Operator [SEL_2]
   outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"]
   Statistics:Num rows: 762503089 Data size: 792240710181 Basic stats: COMPLETE Column stats: NONE
   TableScan [TS_0]
  ACID table:true
  alias:dpi_other_detail_ufdr
  Statistics:Num rows: 762503089 Data size: 792240710181 Basic stats: COMPLETE Column stats: NONE


Time taken: 3.479 seconds, Fetched: 70 row(s)
hive>



Contributor

In log see next errors :

[yarn@ks-dmp03 ~]$ grep -i ERROR /tmp/application_1518080001967_51536.log | grep -v HADOOP_ROOT_LOGGER | grep -v Dtez.root.logger | awk '{print $1 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15 $16}' | sort -u

2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000034_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000035_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000036_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000037_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000038_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000039_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000040_0accordingtoscheduler.Taskreported
******************************
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000107_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000108_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000109_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000110_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000111_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000112_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000114_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000115_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000116_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000117_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000118_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000119_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000120_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000121_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000122_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000124_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TezChild]|tez.ReduceRecordProcessor|:Hiterrorwhileclosingoperators-failingtree
2018-03-07[ERROR][TezChild]|tez.TezProcessor|:java.lang.InterruptedException
[yarn@ks-dmp03 ~]$



@Dmitro Vasilenko The only thing your log is telling is GC information. Can you please share some more details about your query and job logs from yarn?

Contributor
@Dmitro Vasilenko

Can you try this query without map join . Looks like data size is huge for hybrid grace hash join.

set hive.auto.convert.join=false;

And then execute this query.