<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Tez: vertex failed due to it's own failure;DAG did not succeed due to VERTEX_FAILURE. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Tez-vertex-failed-due-to-it-s-own-failure-DAG-did-not/m-p/219564#M181452</link>
    <description>&lt;P&gt;Hi, &lt;/P&gt;&lt;P&gt;I am having the same problem after upgrading from HDP 2.6.2 to HDP 3.1, even though the cluster has plenty of resources. When I run a query (select count(*) from table) against a small table (3k records), it succeeds. Against a larger table (50k records), I get the same vertex failure error. I checked the YARN application log for the failed query and found the error below: &lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;2019-05-14 11:58:14,823 [INFO] [TezChild] |tez.ReduceRecordProcessor|: Starting Output: out_Reducer 2
2019-05-14 11:58:14,828 [INFO] [TezChild] |compress.CodecPool|: Got brand-new decompressor [.snappy]
2019-05-14 11:58:18,466 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Routing events from heartbeat response to task, currentTaskAttemptId=attempt_1557754551780_0137_1_01_000000_0, eventCount=1 fromEventId=1 nextFromEventId=2
2019-05-14 11:58:18,488 [INFO] [Fetcher_B {Map_1} #1] |HttpConnection.url|: for url=http://myhost_name.com:13562/mapOutput?job=job_1557754551780_0137&amp;amp;dag=1&amp;amp;reduce=0&amp;amp;map=attempt_1557754551780_0137_1_00_000000_0_10002 sent hash and receievd reply 0 ms
2019-05-14 11:58:18,491 [INFO] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to read data to memory for InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0137_1_00_000000_0_10002, spillType=0, spillId=-1]. len=28, decomp=14. ExceptionMessage=Not a valid ifile header
2019-05-14 11:58:18,492 [WARN] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0137_1_00_000000_0_10002, spillType=0, spillId=-1] from myhost_name.com
java.io.IOException: Not a valid ifile header
	at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.verifyHeaderMagic(IFile.java:859)
	at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.isCompressedFlagEnabled(IFile.java:866)
	at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:616)
	at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:121)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:950)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
	at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Both queries worked fine before the upgrade. The only change I made after the upgrade was increasing the heap size of the DataNodes. I also applied the configuration suggested by &lt;A rel="user" href="https://community.cloudera.com/users/1271/sheltong.html" nodeid="1271"&gt;@Geoffrey Shelton Okot&lt;/A&gt;, but I still get the same error. &lt;/P&gt;&lt;P&gt;Thanks &lt;/P&gt;</description>
    <pubDate>Wed, 15 May 2019 21:59:27 GMT</pubDate>
    <dc:creator>tarekabouzeid91</dc:creator>
    <dc:date>2019-05-15T21:59:27Z</dc:date>
  </channel>
</rss>

