Support Questions

Find answers, ask questions, and share your expertise

Failure one of Reducer (on join two tables in hive2)

avatar
Expert Contributor
7 REPLIES 7

avatar
Expert Contributor
@Dmitro Vasilenko

I see frequent full GC in logs. Can you share yarn application log and the query that you executed?

Also, share explain plan for the query.

avatar
Expert Contributor

Application log very big:

[yarn@serv03 ~]$ ls -hl /tmp/application_1518080001967_51536.log -rw-r--r-- 1 yarn hadoop 89M Mar 7 18:10 /tmp/application_1518080001967_51536.log

avatar
Expert Contributor
[yarn@ks-dmp03 ~]$ hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAp


Logging initialized using configuration in file:/etc/hive/2.6.3.0-235/0/hive-log
hive> explain WITH DPI AS (SELECT SERVICE_NAME, trim(IP_ADDRESS) AS IP_ADDR FROM OTHER_SOURCES.DIM_DPI_IP_SERVICE), DSD AS (SELECT MSISDN, NVL(PROT_TYPE,0) AS PROTOCOL_ID, NVL(PROT_CATEGORY,0) AS PROTOCOL_CATEGORY_ID, TETHERING_FLAG, RAT, NETWORK_UL_TRAFFIC, NETWORK_DL_TRAFFIC, END_TIME_N, BEGIN_TIME_N, SID, trim(SERVER_IP) AS SERV_IP FROM OTHER_SOURCES.DPI_OTHER_DETAIL_UFDR WHERE HDAY="2018-03-06"), JN AS (SELECT DSD.MSISDN, DSD.PROTOCOL_ID, DSD.PROTOCOL_CATEGORY_ID, NVL(DPI.SERVICE_NAME,"Other") AS CUSTOMIZED_CATEGORY, DSD.TETHERING_FLAG, DSD.RAT, DSD.NETWORK_UL_TRAFFIC, DSD.NETWORK_DL_TRAFFIC, DSD.END_TIME_N, DSD.BEGIN_TIME_N, DSD.SID FROM DSD LEFT JOIN DPI ON DSD.SERV_IP = DPI.IP_ADDR) INSERT INTO AGG_DPI.AGG_DPI_TRAFFIC_DAILY_l PARTITION(HDAY="2018-03-06",SRC="OTHER_DETAIL") SELECT JN.MSISDN, JN.PROTOCOL_ID, JN.PROTOCOL_CATEGORY_ID, JN.CUSTOMIZED_CATEGORY, JN.TETHERING_FLAG, ROUND(SUM(CASE WHEN JN.RAT = 2 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_2G_AMT, ROUND(SUM(CASE WHEN JN.RAT = 1 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_3G_AMT, ROUND(SUM(CASE WHEN JN.RAT = 0 THEN JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC ELSE 0 END)/ 1024, 2) AS DATA_VOL_0_AMT, ROUND(SUM(JN.NETWORK_UL_TRAFFIC+JN.NETWORK_DL_TRAFFIC)/ 1024, 2) DATA_VOL_TOTAL_AMT, SUM(CASE WHEN JN.RAT = 2 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_2G, SUM(CASE WHEN JN.RAT = 1 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_3G, SUM(CASE WHEN JN.RAT = 0 THEN END_TIME_N - BEGIN_TIME_N ELSE 0 END) DATA_SESSIONS_DUR_0, SUM(END_TIME_N - BEGIN_TIME_N) DATA_SESSIONS_DUR, COUNT(DISTINCT CASE WHEN JN.RAT = 2 THEN JN.SID END) AS DATA_SESSIONS_2G_CNT, COUNT(DISTINCT CASE WHEN JN.RAT = 1 THEN JN.SID END) AS DATA_SESSIONS_3G_CNT, COUNT(DISTINCT CASE WHEN JN.RAT = 0 THEN JN.SID END) AS DATA_SESSIONS_0_CNT, COUNT(DISTINCT JN.SID) AS DATA_SESSIONS_CNT, 1 AS LOAD_ID, CURRENT_TIMESTAMP AS INS_DT FROM JN GROUP BY JN.MSISDN, JN.PROTOCOL_ID, JN.PROTOCOL_CATEGORY_ID, JN.TETHERING_FLAG, JN.CUSTOMIZED_CATEGORY;
OK
Plan not optimized by CBO due to missing statistics. Please check log for more details.



avatar
Expert Contributor

Vertex dependency in root stage
Map 1 <- Map 3 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-3
   Stats-Aggr Operator
  Stage-0
  Move Operator
 partition:{"hday":"2018-03-06","src":"OTHER_DETAIL"}
 table:{"name:":"agg_dpi.agg_dpi_traffic_daily_l","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
 Stage-2
 Dependency Collection{}
Stage-1
Reducer 2
File Output Operator [FS_15]
   compressed:true
   Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE
   table:{"name:":"agg_dpi.agg_dpi_traffic_daily_l","input format:":"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat","output format:":"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat","serde:":"org.apache.hadoop.hive.ql.io.orc.OrcSerde"}
   Select Operator [SEL_13]
  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18"]
  Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE
  Group By Operator [GBY_12]
  |  aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","sum(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","sum(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(DISTINCT KEY._col5:0._col0)","count(DISTINCT KEY._col5:1._col0)","count(DISTINCT KEY._col5:2._col0)","count(DISTINCT KEY._col5:3._col0)"]
  |  keys:KEY._col0 (type: varchar(16)), KEY._col1 (type: decimal(10,0)), KEY._col2 (type: decimal(10,0)), KEY._col3 (type: decimal(1,0)), KEY._col4 (type: string)
  |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16"]
  |  Statistics:Num rows: 419376708 Data size: 435732400043 Basic stats: COMPLETE Column stats: NONE
  |<-Map 1 [SIMPLE_EDGE]
  Reduce Output Operator [RS_11]
 key expressions:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col3 (type: decimal(1,0)), _col4 (type: string), _col5 (type: decimal(20,0)), _col6 (type: decimal(20,0)), _col7 (type: decimal(20,0)), _col8 (type: decimal(20,0))
 Map-reduce partition columns:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col3 (type: decimal(1,0)), _col4 (type: string)
 sort order:+++++++++
 Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE
 value expressions:_col9 (type: decimal(31,0)), _col10 (type: decimal(31,0)), _col11 (type: decimal(31,0)), _col12 (type: decimal(31,0)), _col13 (type: decimal(21,0)), _col14 (type: decimal(21,0)), _col15 (type: decimal(21,0)), _col16 (type: decimal(21,0))
 Group By Operator [GBY_10]
 aggregations:["sum(CASE WHEN ((_col5 = 2)) THEN ((_col6 + _col7)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 1)) THEN ((_col6 + _col7)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 0)) THEN ((_col6 + _col7)) ELSE (0) END)","sum((_col6 + _col7))","sum(CASE WHEN ((_col5 = 2)) THEN ((_col8 - _col9)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 1)) THEN ((_col8 - _col9)) ELSE (0) END)","sum(CASE WHEN ((_col5 = 0)) THEN ((_col8 - _col9)) ELSE (0) END)","sum((_col8 - _col9))","count(DISTINCT CASE WHEN ((_col5 = 2)) THEN (_col10) END)","count(DISTINCT CASE WHEN ((_col5 = 1)) THEN (_col10) END)","count(DISTINCT CASE WHEN ((_col5 = 0)) THEN (_col10) END)","count(DISTINCT _col10)"]
 keys:_col0 (type: varchar(16)), _col1 (type: decimal(10,0)), _col2 (type: decimal(10,0)), _col4 (type: decimal(1,0)), _col3 (type: string), CASE WHEN ((_col5 = 2)) THEN (_col10) END (type: decimal(20,0)), CASE WHEN ((_col5 = 1)) THEN (_col10) END (type: decimal(20,0)), CASE WHEN ((_col5 = 0)) THEN (_col10) END (type: decimal(20,0)), _col10 (type: decimal(20,0))
 outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14","_col15","_col16","_col17","_col18","_col19","_col20"]
 Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE
 Select Operator [SEL_8]
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"]
Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE
Map Join Operator [MAPJOIN_19]
|  condition map:[{"":"Left Outer Join0 to 1"}]
|  HybridGraceHashJoin:true
|  keys:{"Map 3":"_col1 (type: string)","Map 1":"_col10 (type: string)"}
|  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col11"]
|  Statistics:Num rows: 838753416 Data size: 871464800087 Basic stats: COMPLETE Column stats: NONE
|<-Map 3 [BROADCAST_EDGE] vectorized
|  Reduce Output Operator [RS_22]
| key expressions:_col1 (type: string)
| Map-reduce partition columns:_col1 (type: string)
| sort order:+
| Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE
| value expressions:_col0 (type: varchar(40))
| Select Operator [OP_21]
| outputColumnNames:["_col0","_col1"]
| Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE
| TableScan [TS_3]
|alias:dim_dpi_ip_service
|Statistics:Num rows: 65114 Data size: 3581293 Basic stats: COMPLETE Column stats: NONE
|<-Select Operator [SEL_2]
   outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"]
   Statistics:Num rows: 762503089 Data size: 792240710181 Basic stats: COMPLETE Column stats: NONE
   TableScan [TS_0]
  ACID table:true
  alias:dpi_other_detail_ufdr
  Statistics:Num rows: 762503089 Data size: 792240710181 Basic stats: COMPLETE Column stats: NONE


Time taken: 3.479 seconds, Fetched: 70 row(s)
hive>



avatar
Expert Contributor

In log see next errors :

[yarn@ks-dmp03 ~]$ grep -i ERROR /tmp/application_1518080001967_51536.log | grep -v HADOOP_ROOT_LOGGER | grep -v Dtez.root.logger | awk '{print $1 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15 $16}' | sort -u

2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000034_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000035_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000036_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000037_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000038_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000039_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000040_0accordingtoscheduler.Taskreported
******************************
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000107_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000108_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000109_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000110_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000111_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000112_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000114_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000115_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000116_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000117_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000118_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000119_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000120_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000121_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000122_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TaskSchedulerEventHandlerThread]|rm.TaskSchedulerEventHandler|:Nocontainerallocatedtotask:attempt_1518080001967_51536_3_02_000124_0accordingtoscheduler.Taskreported
2018-03-07[ERROR][TezChild]|tez.ReduceRecordProcessor|:Hiterrorwhileclosingoperators-failingtree
2018-03-07[ERROR][TezChild]|tez.TezProcessor|:java.lang.InterruptedException
[yarn@ks-dmp03 ~]$



avatar

@Dmitro Vasilenko The only thing your log is telling is GC information. Can you please share some more details about your query and job logs from yarn?

avatar
Expert Contributor
@Dmitro Vasilenko

Can you try this query without map join . Looks like data size is huge for hybrid grace hash join.

set hive.auto.convert.join=false;

And then execute this query.