<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78 in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240573#M202377</link>
    <description>&lt;P&gt;Thanks for letting me know!&lt;BR /&gt;&lt;BR /&gt;Is there any estimate/timeline for when HDP 3.1 will allow upgrading the shipped Tez 0.9.1 to a newer release? I don't want to upgrade/patch one component myself, because I am afraid I will lose the upgradeability of the entire HDP stack when future releases surface.&lt;/P&gt;</description>
    <pubDate>Thu, 23 May 2019 14:06:36 GMT</pubDate>
    <dc:creator>maurice_knopp</dc:creator>
    <dc:date>2019-05-23T14:06:36Z</dc:date>
    <item>
      <title>Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240565#M202369</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am running HDP 3.1 (3.1.0.0-78) with 10 data nodes, and the Hive execution engine is Tez. When I run a query I get this error:&lt;/P&gt;&lt;PRE&gt;ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex failed, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00, diagnostics=[Vertex vertex_1557754551780_1091_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE, Vertex vertex_1557754551780_1091_2_00 [Map 1] failed as task task_1557754551780_1091_2_00_000001 failed after vertex succeeded.]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
INFO  : Completed executing command(queryId=hive_20190516161715_09090e6d-e513-4fcc-9c96-0b48e9b43822); Time taken: 17.935 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex failed, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00, diagnostics=[Vertex vertex_1557754551780_1091_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE, Vertex vertex_1557754551780_1091_2_00 [Map 1] failed as task task_1557754551780_1091_2_00_000001 failed after vertex succeeded.]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;When I traced the logs (for example, the application id is &lt;STRONG&gt;application_1557754551780_1091&lt;/STRONG&gt;), I checked the path where the output of the Map is written (&lt;STRONG&gt;/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003&lt;/STRONG&gt;); the files below are created with these permissions:&lt;/P&gt;&lt;PRE&gt;-rw-------. 1 hive hadoop 28 May 16 16:17 file.out
-rw-r-----. 1 hive hadoop 32 May 16 16:17 file.out.index&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Also, in the NodeManager logs I found this error:&lt;/P&gt;&lt;PRE&gt;2019-05-16 16:19:05,801 INFO  mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,818 INFO  mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,821 INFO  mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,822 INFO  mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,824 INFO  mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,826 INFO  mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;This means that &lt;STRONG&gt;file.out&lt;/STRONG&gt; won't be readable by the yarn user, which causes the whole task to fail.&lt;/P&gt;&lt;P&gt;I also checked the parent directory permissions and the umask for all users (0022), which means that the files inside the output directory should be readable by other users in the same group:&lt;/P&gt;&lt;PRE&gt;drwx--x---. 3 hive hadoop 16 May 16 16:16 filecache
drwxr-s---. 3 hive hadoop 60 May 16 16:16 output&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I reran the whole scenario on a different cluster (HDP version 3.0.1.0-187), and there &lt;STRONG&gt;file.out&lt;/STRONG&gt; has the same permissions as &lt;STRONG&gt;file.out.index&lt;/STRONG&gt;, and the queries run fine without any problems. I also switched to the yarn user and used vi to make sure the yarn user is able to read the content of file.out, and it was:&lt;/P&gt;&lt;PRE&gt;-rw-r-----. 1 hive hadoop 28 May 16 16:17 file.out
-rw-r-----. 1 hive hadoop 32 May 16 16:17 file.out.index&lt;/PRE&gt;&lt;P&gt;When I shut down all the node managers so that only 1 is up and running, all the queries run fine, even though &lt;STRONG&gt;file.out&lt;/STRONG&gt; is still being created with the same permissions; I guess that is because everything is running on the same node, so the map output is probably fetched locally rather than served through the ShuffleHandler.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;N.B.: we upgraded from HDP 2.6.2 to HDP 3.1.0.0-78&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 01:49:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240565#M202369</guid>
      <dc:creator>tarekabouzeid91</dc:creator>
      <dc:date>2019-05-17T01:49:30Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240566#M202370</link>
      <description>&lt;P&gt;Hi, we are facing more or less exactly the same issue on HDP 3.1.0.0-78 on a cluster with 11 nodes.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Maybe we can talk/chat and work out a solution. I contacted you on LinkedIn &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 15:40:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240566#M202370</guid>
      <dc:creator>maurice_knopp</dc:creator>
      <dc:date>2019-05-17T15:40:16Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240567#M202371</link>
      <description>&lt;P&gt;Yeah, sure, I will happily work with you to get this fixed.&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 16:49:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240567#M202371</guid>
      <dc:creator>tarekabouzeid91</dc:creator>
      <dc:date>2019-05-17T16:49:35Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240568#M202372</link>
      <description>&lt;P&gt;So far, it seems that our issues were solved by setting the HDFS setting "fs.permissions.umask-mode" to the value "022". In our HDP 2.7 installation, this was the case out of the box. HDP 3.1 seems to have a default value of 077, which doesn't work for us and yields the error mentioned above.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We've done some intensive testing and the value 022 seems to work and has solved our problems, as far as I can tell. It would be great if you guys could verify or falsify the issue on your installation as well.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Let me know if I can help you with anything!&lt;/P&gt;</description>
      <pubDate>Fri, 17 May 2019 22:23:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240568#M202372</guid>
      <dc:creator>maurice_knopp</dc:creator>
      <dc:date>2019-05-17T22:23:13Z</dc:date>
    </item>
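<!--
The umask arithmetic behind this fix can be sanity-checked outside the cluster. This is a minimal sketch assuming the shuffle files are requested with mode 0640 (the mode file.out.index shows in the listings above); POSIX file creation then grants the requested mode with the umask bits cleared:

```python
def effective_mode(requested: int, umask: int) -> int:
    # POSIX: a newly created file gets (requested mode) with the umask bits cleared
    return requested & ~umask

# HDP 3.1 default fs.permissions.umask-mode=077: owner-only, yarn cannot read
print(oct(effective_mode(0o640, 0o077)))  # 0o600, i.e. -rw-.......
# With umask 022, group read survives, so the yarn-owned ShuffleHandler can serve it
print(oct(effective_mode(0o640, 0o022)))  # 0o640, i.e. -rw-r.....
```

Under these assumptions the arithmetic matches the two directory listings in the original post: file.out was 600 on the broken HDP 3.1 cluster (umask 077) and 640 on the working HDP 3.0.1 cluster.
-->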
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240569#M202373</link>
      <description>&lt;P&gt;Glad to work with you and your team to get this issue fixed.&lt;/P&gt;</description>
      <pubDate>Sat, 18 May 2019 05:12:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240569#M202373</guid>
      <dc:creator>tarekabouzeid91</dc:creator>
      <dc:date>2019-05-18T05:12:48Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240570#M202374</link>
      <description>&lt;DIV class="fr-view clearfix"&gt;&lt;P&gt;Also, for more documentation on how we found the solution: in the Tez JIRA ticket &lt;A rel="noopener noreferrer" href="https://issues.apache.org/jira/browse/TEZ-3894" target="_blank"&gt;https://issues.apache.org/jira/browse/TEZ-3894&lt;/A&gt; it is mentioned that Tez takes its intermediate file permissions from "fs.permissions.umask-mode". In our dev environment it was set to 022, but to 077 in prod, and it was the same for you as well; that's how we figured this out. It was also tricky because file.out.index was created with the correct permissions but file.out was not, which made the map result unreadable by the yarn user.&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Sat, 18 May 2019 05:27:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240570#M202374</guid>
      <dc:creator>tarekabouzeid91</dc:creator>
      <dc:date>2019-05-18T05:27:19Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240571#M202375</link>
      <description>&lt;P&gt;I had the same issue, and we are using HDP 3.1.0.0-78.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_tez.html"&gt;https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_tez.html&lt;/A&gt; &lt;/P&gt;&lt;P&gt;TEZ-3894 seems to be already applied to HDP 3.1. (Also, I've checked the source code a little; yes, it looks already applied.)&lt;/P&gt;&lt;P&gt;But I still have this issue...&lt;/P&gt;&lt;P&gt;I can avoid it by changing fs.permissions.umask-mode from "077" to "022" in an HS2 session:&lt;/P&gt;&lt;TABLE style="width:100%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD style="width:100%"&gt;0: jdbc:hive2://XXXX &amp;gt; set fs.permissions.umask-mode=022;&lt;BR /&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;So I guess this issue may not be completely fixed by TEZ-3894 (with HDP 3.1.0.0-78)...&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2019 00:54:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240571#M202375</guid>
      <dc:creator>tomo_hirano</dc:creator>
      <dc:date>2019-05-23T00:54:18Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240572#M202376</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/113798/mauriceknopp.html" nodeid="113798"&gt;@Maurice Knopp&lt;/A&gt; We recently saw that TEZ-3894 only fixes the issue partially. If your job ends up spinning up multiple mappers, you are likely to hit a variant of TEZ-3894, although on the surface it appears to be the same.&lt;BR /&gt;For a permanent fix, you may want to get a patch for &lt;A href="https://issues.apache.org/jira/browse/TEZ-4057"&gt;https://issues.apache.org/jira/browse/TEZ-4057&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2019 10:41:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240572#M202376</guid>
      <dc:creator>dineshc</dc:creator>
      <dc:date>2019-05-23T10:41:46Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240573#M202377</link>
      <description>&lt;P&gt;Thanks for letting me know!&lt;BR /&gt;&lt;BR /&gt;Is there any estimate/timeline for when HDP 3.1 will allow upgrading the shipped Tez 0.9.1 to a newer release? I don't want to upgrade/patch one component myself, because I am afraid I will lose the upgradeability of the entire HDP stack when future releases surface.&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2019 14:06:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240573#M202377</guid>
      <dc:creator>maurice_knopp</dc:creator>
      <dc:date>2019-05-23T14:06:36Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240574#M202378</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/113798/mauriceknopp.html" nodeid="113798"&gt;@Maurice Knopp&lt;/A&gt; We do not have any planned dates yet. However, if you are an Enterprise Support customer, you can ask for a hotfix and you will be provided a patched jar which is very easy to replace on all machines running Tez.&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2019 20:26:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/240574#M202378</guid>
      <dc:creator>dineshc</dc:creator>
      <dc:date>2019-05-23T20:26:32Z</dc:date>
    </item>
    <item>
      <title>Re: Hive - tez , vertex failed error  during reduce phase - HDP 3.1.0.0-78</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/322134#M228690</link>
      <description>&lt;P&gt;Had the same issue on CDP 7.1.6, which comes with Tez 0.9.1.&lt;/P&gt;&lt;P&gt;It looks like this one: &lt;A href="https://issues.apache.org/jira/browse/TEZ-4057" target="_blank"&gt;https://issues.apache.org/jira/browse/TEZ-4057&lt;/A&gt;&lt;/P&gt;&lt;P&gt;One workaround (probably not 100% secure) is to add the yarn user to the hive group:&lt;/P&gt;&lt;LI-CODE lang="bash"&gt;usermod -a -G hive yarn&lt;/LI-CODE&gt;&lt;P&gt;This needs to be done on all nodes and requires a restart of the YARN services.&lt;/P&gt;&lt;P&gt;After that the issue was gone; no more random errors for Hive on Tez.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Aug 2021 08:25:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-tez-vertex-failed-error-during-reduce-phase-HDP-3-1-0-0/m-p/322134#M228690</guid>
      <dc:creator>mzinal</dc:creator>
      <dc:date>2021-08-06T08:25:02Z</dc:date>
    </item>
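<!--
The group-membership workaround in the last post can be sketched as the following admin steps. This is a sketch under the thread's assumptions (a "hive" group and "yarn" user exist, as on an HDP/CDP node); it must be run as root on every NodeManager host:

```shell
# Add the yarn user to the hive group, so the ShuffleHandler (running as yarn)
# can read hive-owned, group-readable intermediate files such as file.out (mode 640).
usermod -a -G hive yarn

# Verify the membership took effect: 'hive' should appear in yarn's group list.
id -nG yarn | grep -w hive

# Repeat on every NodeManager host, then restart the YARN services so the
# NodeManagers pick up the new group membership.
```

As the poster notes, this widens access for the yarn user and is probably not appropriate for security-sensitive clusters; the TEZ-4057 patch is the cleaner fix.
-->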
  </channel>
</rss>

