<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: With CDH6.3.1, the Impala command "Refresh" doesn't work until the HDFS files are closed in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/292757#M216296</link>
    <description>&lt;P&gt;Does the file modification timestamp change before you close the file? I am curious whether this approach worked in any older version, since that would make it easier to find what changed in the code.&lt;/P&gt;</description>
    <pubDate>Fri, 27 Mar 2020 21:11:00 GMT</pubDate>
    <dc:creator>Vihangk</dc:creator>
    <dc:date>2020-03-27T21:11:00Z</dc:date>
    <item>
      <title>With CDH6.3.1, the Impala command "Refresh" doesn't work until the HDFS files are closed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/292717#M216271</link>
      <description>&lt;P&gt;We have an application that continuously writes CSV data to a directory in HDFS. In our scenario the data arrives continuously but in small batches, so the application keeps the files open and keeps appending to them. After each batch it does a hard flush, which makes the data visible in the files and increases their size. As a result we avoid creating too many small files, and with CDH 5.16.1 the latest data could be seen immediately after running the Impala "Refresh" command.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, after the cluster was upgraded to CDH 6.3.1, the Impala "Refresh" command stopped working for this case. When a new file was created and some data was written to it, that data could be seen after refreshing. But data written to the same file afterwards did not show up in an Impala "Select", even though I could see it directly with HDFS commands and the file size had increased. Only after the file was closed (the application was terminated) could I see the latest data with the Impala "Refresh" and "Select".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The table is an external table partitioned by a timestamp column on a monthly basis.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;According to the documentation of the "Refresh" command, it should work after deleting, adding, or modifying files. Is this a bug?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By the way, "invalidate metadata" does work, but it adds 3-4 seconds to the next "SELECT". Since the "SELECT" itself usually takes only a few seconds, the extra 3-4 seconds noticeably degrades performance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also tried "Alter table xxx recover partitions", "alter table ... drop partition... / alter table ... add partition...", but with no luck.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Apart from "invalidate metadata", is there a good way to work around this problem?&lt;/P&gt;</description>
      <pubDate>Fri, 27 Mar 2020 13:13:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/292717#M216271</guid>
      <dc:creator>AllenLee</dc:creator>
      <dc:date>2020-03-27T13:13:29Z</dc:date>
    </item>
    <item>
      <title>Re: With CDH6.3.1, the Impala command "Refresh" doesn't work until the HDFS files are closed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/292757#M216296</link>
      <description>&lt;P&gt;Does the file modification timestamp change before you close the file? I am curious whether this approach worked in any older version, since that would make it easier to find what changed in the code.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Mar 2020 21:11:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/292757#M216296</guid>
      <dc:creator>Vihangk</dc:creator>
      <dc:date>2020-03-27T21:11:00Z</dc:date>
    </item>
    <item>
      <title>Re: With CDH6.3.1, the Impala command "Refresh" doesn't work until the HDFS files are closed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/292902#M216363</link>
      <description>&lt;P&gt;No, the modification timestamp does not change. But it worked with CDH5.16. After upgrading to CDH6.3, it no longer works.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;By the way, our cluster is Kerberos enabled. It was upgraded from CDH 5.16.2 to CDH 6.3.2.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Mar 2020 16:35:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/292902#M216363</guid>
      <dc:creator>AllenLee</dc:creator>
      <dc:date>2020-03-30T16:35:05Z</dc:date>
    </item>
    <item>
      <title>Re: With CDH6.3.1, the Impala command "Refresh" doesn't work until the HDFS files are closed</title>
      <link>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/293006#M216425</link>
      <description>&lt;P&gt;You pointed me in the right direction. I added some code to update the modification time of the files in HDFS, and the "Refresh" SQL works now.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Mar 2020 16:18:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/With-CDH6-3-1-the-Impala-command-quot-Refresh-quot-doesn-t/m-p/293006#M216425</guid>
      <dc:creator>AllenLee</dc:creator>
      <dc:date>2020-03-31T16:18:44Z</dc:date>
    </item>
  </channel>
</rss>

