
Hive error: Vertex failed

Explorer

When I try to insert data from a table into a partitioned bucketed table, I am getting this error:

Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1490155524314_0037_1_00, diagnostics=[Task failed, taskId=task_1490155524314_0037_1_00_000007, diagnostics=[TaskAttempt 0 failed, info=[attempt_1490155524314_0037_1_00_000007_0 being failed for too many output errors. failureFraction=0.125, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1490155524314_0037_1_00_000007_1 being failed for too many output errors. failureFraction=0.125, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1490155524314_0037_1_00_000007_2 being failed for too many output errors. failureFraction=0.125, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[attempt_1490155524314_0037_1_00_000007_3 being failed for too many output errors. failureFraction=0.125, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:14, Vertex vertex_1490155524314_0037_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE] Vertex killed, vertexName=Reducer 2, vertexId=vertex_1490155524314_0037_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:8, Vertex vertex_1490155524314_0037_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE] DAG did not succeed due to VERTEX_FAILURE. 
failedVertices:1 killedVertices:1 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1490155524314_0037_1_00, diagnostics=[Task failed, taskId=task_1490155524314_0037_1_00_000007, diagnostics=[TaskAttempt 0 failed, info=[attempt_1490155524314_0037_1_00_000007_0 being failed for too many output errors. failureFraction=0.125, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1490155524314_0037_1_00_000007_1 being failed for too many output errors. failureFraction=0.125, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1490155524314_0037_1_00_000007_2 being failed for too many output errors. failureFraction=0.125, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[attempt_1490155524314_0037_1_00_000007_3 being failed for too many output errors. 
failureFraction=0.125, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:14, Vertex vertex_1490155524314_0037_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1490155524314_0037_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:8, Vertex vertex_1490155524314_0037_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
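For context, the failing operation is an insert from a plain table into a partitioned, bucketed table. A minimal sketch of that shape (the table and column names here are invented, since the original DDL is not shown in the post):

```sql
-- Hypothetical tables; only the shape (partitioned + bucketed target) matches the post.
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE target (id INT, val STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC;

INSERT INTO TABLE target PARTITION (dt)
SELECT id, val, dt FROM source;
```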

14 Replies

Contributor

Hi,

Were you able to find a solution? I am facing the same issue.

When I run the query from the Hive shell or Zeppelin as the hive user, the query works fine.

But if I run it as another user, it sometimes works, and most of the time I get the Vertex error.

Thanks & Regards


Hi @kerra, I am also getting a vertex failed error when I try to insert into a table with the Tez engine in Hive. Could you please mention what user permissions need to be set?

I am running Hive as the hdfs user.

Thank you.

Expert Contributor

Hi guys, I am having the same problem. When I run a query (select count(*) from table_name) against a small table it succeeds, but against a big table I get this error. I checked the YARN logs, and the problem seems to occur during data shuffling, so I traced it to the node that received the task. I found this error in /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-myhost.com.log:


/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557491114054_0010/output/attempt_1557491114054_0010_1_03_000000_1_10002/file.out not found


although for other attempts of the same application, this file exists normally.

And in the YARN application log, after exiting the beeline session, this error appears:


2019-05-14 16:19:58,442 [WARN] [Fetcher_B {Map_1} #0] |shuffle.Fetcher|: copyInputs failed for tasks [InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1]]
2019-05-14 16:19:58,442 [INFO] [Fetcher_B {Map_1} #0] |impl.ShuffleManager|: Map_1: Fetch failed for src: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1]InputIdentifier: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1], connectFailed: false
2019-05-14 16:19:58,443 [INFO] [Fetcher_B {Map_1} #1] |HttpConnection.url|: for url=http://myhost_name:13562/mapOutput?job=job_1557754551780_0155&dag=5&reduce=0&map=attempt_1557754551780_0155_5_00_000000_0_10003 sent hash and receievd reply 0 ms
2019-05-14 16:19:58,443 [INFO] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to read data to memory for InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1]. len=28, decomp=14. ExceptionMessage=Not a valid ifile header
2019-05-14 16:19:58,443 [WARN] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1] from myhost_name
java.io.IOException: Not a valid ifile header
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.verifyHeaderMagic(IFile.java:859)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.isCompressedFlagEnabled(IFile.java:866)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:616)
        at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:121)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:950)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


I am using HDP 3.1.

Any suggestions on what might be causing this error?

Thanks


Contributor

I'm having the same issue with HDP 3.1 (Tez 0.9.1).

I can reproduce it with:

1) Create two files, file1.csv and file2.csv.
2) Add two fields to the csv files as below:
one,two
one,two
one,two
3) Create an external table:
use testdb;
create external table test1(s1 string, s2 string) row format delimited fields terminated by ',' stored as textfile location '/user/usera/test1';
4) Copy one csv file to HDFS (/user/usera/test1):
hdfs dfs -put ./file1.csv /user/usera/test1/
5) select count(*) from testdb.test1;
=> Works fine.
6) Copy the second csv file to HDFS:
hdfs dfs -put ./file2.csv /user/usera/test1/
7) select * from testdb.test1;
=> Can see the data from both HDFS files.
8) select count(*) from testdb.test1;
=> Get this problem.
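For convenience, the repro steps above as one sequence (same paths as in the post; the hdfs commands are shown as comments because they run in a shell, not in the Hive session):

```sql
use testdb;
create external table test1(s1 string, s2 string)
  row format delimited fields terminated by ','
  stored as textfile location '/user/usera/test1';
-- shell: hdfs dfs -put ./file1.csv /user/usera/test1/
select count(*) from testdb.test1;   -- works with one file
-- shell: hdfs dfs -put ./file2.csv /user/usera/test1/
select * from testdb.test1;          -- data from both files is visible
select count(*) from testdb.test1;   -- fails with the Vertex error
```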

And we can see the following error in the mapper task's log:

2019-05-17 10:08:10,317 [INFO] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to read data to memory for InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, pathComponent=attempt_1557383221332_0289_1_00_000001_0_10003, spillType=0, spillId=-1]. len=25, decomp=11. ExceptionMessage=Not a valid ifile header
2019-05-17 10:08:10,317 [WARN] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, pathComponent=attempt_1557383221332_0289_1_00_000001_0_10003, spillType=0, spillId=-1] from XXXXX
java.io.IOException: Not a valid ifile header
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.verifyHeaderMagic(IFile.java:859)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.isCompressedFlagEnabled(IFile.java:866)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:616)
        at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:121)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:950)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
        at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


I think it's similar to https://issues.apache.org/jira/browse/TEZ-3699
but I've confirmed that patch is already applied to the Tez shipped with HDP 3.1.

So I guess it's a new bug in Tez 0.9.x
(I confirmed there is no problem with HDP 2.6 / Tez 0.7.0).

Any idea?

Contributor

Looks like it's a Tez issue caused by the "fs.permissions.umask-mode" setting.

https://community.hortonworks.com/questions/246302/hive-tez-vertex-failed-error-during-reduce-phase-...
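My reading (an assumption; the linked thread only names the setting) is that a restrictive cluster umask such as 077 leaves the intermediate shuffle files (the file.out seen earlier in this thread) without group read permission, so the shuffle service cannot serve them over HTTP and the fetcher receives something other than IFile data, hence "Not a valid ifile header". A quick local illustration of what umask does to newly created files:

```shell
#!/bin/sh
# Show how the process umask determines the mode bits of new files.
rm -f /tmp/demo_022 /tmp/demo_077

(umask 022; touch /tmp/demo_022)  # permissive umask -> 644 (rw-r--r--)
(umask 077; touch /tmp/demo_077)  # restrictive umask -> 600 (rw-------)

stat -c '%a %n' /tmp/demo_022 /tmp/demo_077
```

On the cluster, the corresponding change would be relaxing fs.permissions.umask-mode (e.g. to 022) in core-site.xml, if your security policy allows it.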