Created 06-05-2017 12:31 PM
Hi,
I am facing Broken pipe issue in Hive while streaming larger number of rows (33000) into python after multiple joins. Same script works fine till 7656 number of rows.
0: jdbc:hive2://xxx..xx.xx.xxx:10000> insert overwrite table test.transform_inner_temp select * from test.transform_inner_temp_view2 limit 7656; INFO : Session is already open INFO : Dag name: insert overwrite table test.transf...7656(Stage-1) INFO : INFO : Status: Running (Executing on YARN cluster with App id application_xxxxxxxxxxx_xxxx) INFO : Map 1: 0(+2)/2 Reducer 2: 0/1 INFO : Map 1: 1(+1)/2 Reducer 2: 0(+1)/1 INFO : Map 1: 1(+1)/2 Reducer 2: 0(+1)/1 INFO : Map 1: 1(+1)/2 Reducer 2: 0(+1)/1 INFO : Map 1: 1(+1)/2 Reducer 2: 0(+1)/1 INFO : Map 1: 1(+1)/2 Reducer 2: 0(+1)/1 INFO : Map 1: 2/2 Reducer 2: 0(+1)/1 INFO : Map 1: 2/2 Reducer 2: 1/1 INFO : Loading data to table test.transform_inner_temp from hdfs://az-xxxxxxxxxxx.com:8020/apps/hive/warehouse/test.db/transform_inner_temp/.hive-staging_hive_2017-06-05_10-58-09_863_348122508386369397-929/-ext-10000 INFO : Table test.transform_inner_temp stats: [numFiles=1, numRows=7656, totalSize=4447368, rawDataSize=4439712] No rows affected (19.867 seconds) 0: jdbc:hive2://xxx.xx.xxx.xxxx:10000> insert overwrite table test.transform_inner_temp select * from test.transform_inner_temp_view2; INFO : Session is already open INFO : Dag name: insert overwrite tabl...orm_inner_temp_view2(Stage-1) INFO : Tez session was closed. Reopening... INFO : Session re-established. INFO : INFO : Status: Running (Executing on YARN cluster with App id application_xxxxxxxxxx_xxxxx) INFO : Map 1: -/- INFO : Map 1: 0/2 INFO : Map 1: 0/2 INFO : Map 1: 0(+2)/2 INFO : Map 1: 0(+2)/2 INFO : Map 1: 1(+1)/2 INFO : Map 1: 1(+1)/2 INFO : Map 1: 1(+1)/2 INFO : Map 1: 1(+1)/2 INFO : Map 1: 1(+1)/2 INFO : Map 1: 1(+1,-1)/2 INFO : Map 1: 1(+1,-1)/2 INFO : Map 1: 1(+1,-1)/2 INFO : Map 1: 1(+1,-1)/2 INFO : Map 1: 1(+1,-1)/2 INFO : Map 1: 1(+1,-1)/2 INFO : Map 1: 1(+1,-1)/2 INFO : Map 1: 1(+1,-1)/2 INFO : Map 1: 1(+1,-2)/2 INFO : Map 1: 1(+1,-2)/2 INFO : Map 1: 1(+1,-2)/2 INFO : Map 1: 1(+1,-2)/2 INFO : Map 1: 1(+1,-2)/2 INFO : Map 1: 1(+1,-2)/2 INFO : Map 1: 1(+1,-2)/2 INFO : Map 1: 1(+1,-2)/2 INFO : Map 1: 1(+1,-3)/2 INFO : Map 1: 1(+1,-3)/2 INFO : Map 1: 1(+1,-3)/2 INFO : Map 1: 1(+1,-3)/2 INFO : Map 1: 1(+1,-3)/2 INFO : Map 1: 1(+1,-3)/2 INFO : Map 1: 1(+1,-3)/2 INFO : Map 1: 1(+1,-3)/2 ERROR : Status: Failed ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1491563674918_58393_1_00, diagnostics=[Task failed, taskId=task_1491563674918_58393_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) ... 17 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script. It may have crashed with an error. at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:456) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555) ... 18 more Caused by: java.io.IOException: Broken pipe at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126) at java.io.DataOutputStream.write(DataOutputStream.java:107) at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53) at org.apache.hadoop.hive.ql.exec.ScriptOperator.process(ScriptOperator.java:425) ... 24 more ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1491563674918_58393_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
I could see same kind of issue only with join condition alone in below link.
https://community.hortonworks.com/questions/11025/hive-query-issue.html
I tried the solution which suggested in above link by setting hive.vectorized.execution.enabled=false; But issue is same in my case. I thought issue might be in multiple joins so I have written all the join results in a single table and tried to stream into python. Still issue is same.
If the script is wrong then it shouldn't work fine with smaller rows. Hope the script is fine and the issue is something with hive set up. I have put lot of effort in internet but couldn't figure it out similar kind of issue. Please provide your suggestions/solutions.
Apache Hive (version 1.2.1000.2.5.3.0-37) with beeline.
Created 06-12-2017 07:07 AM
Check the application log for the query run and also, Resource Manager entries for that application.
The issue could be more than the 'Broken pipe'.