<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: what is the default behavior of insert overwrite on external hdfs table? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-default-behavior-of-insert-overwrite-on-external/m-p/307719#M223338</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We ran into exactly the same problem.&lt;/P&gt;&lt;P&gt;First, note that INSERT OVERWRITE does not create your Hive table's folder.&lt;/P&gt;&lt;P&gt;That folder was created by the "CREATE TABLE..... LOCATION 'hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output'...." statement.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You are using an old version of Dataiku DSS (Data Science Studio). We are on DSS 5.0.3 ourselves, and at the time of writing, 8.0.4 is the latest release.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As you said: "the vendor application first delete the folder then do the insert overwrite on the external table". That is correct.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, you should know that Hive commands such as INSERT OVERWRITE or ANALYZE TABLE...COMPUTE STATISTICS create a temporary staging folder.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Look at this line of your log:&lt;/P&gt;&lt;P&gt;Loading data to table dev_tc30_dataiku.config_another_test_output from hdfs://titan/tmp/&lt;STRONG&gt;.hive-staging_hive_2018-11-21_10-45-41_452_43360044430205414-24417&lt;/STRONG&gt;/-ext-10000&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A temporary .hive-staging..... folder was created under the /tmp directory.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please go into Ambari and look at the Hive parameter &lt;STRONG&gt;hive.exec.stagingdir&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I expect its current value is &lt;STRONG&gt;/tmp/.hive-staging&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Change it back to the default value &lt;STRONG&gt;.hive-staging&lt;/STRONG&gt; and test again from DSS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hive should now create the temporary folder under &lt;STRONG&gt;hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output/.hive-staging........&lt;/STRONG&gt;, i.e. relative to the table location.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This .hive-staging... directory is temporary, but the important point is that the &lt;STRONG&gt;output&lt;/STRONG&gt; folder gets recreated, so the INSERT OVERWRITE will succeed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In our case, the problem appeared after we upgraded our HDP cluster from the 2.6.1 to the 2.6.5 release; we had to roll back the &lt;STRONG&gt;hive.exec.stagingdir&lt;/STRONG&gt; parameter from &lt;STRONG&gt;/tmp/.hive-staging&lt;/STRONG&gt; to &lt;STRONG&gt;.hive-staging&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your ticket was created on 11-21-2018, and at that date the latest DSS release was 5.0.3.&lt;/P&gt;&lt;P&gt;Note that Dataiku has changed this behaviour in later releases, and I don't think the Hive folder is always deleted any more.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Gilles&lt;/P&gt;</description>
    <pubDate>Tue, 15 Dec 2020 18:34:54 GMT</pubDate>
    <dc:creator>rouardg</dc:creator>
    <dc:date>2020-12-15T18:34:54Z</dc:date>
  </channel>
</rss>

