<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Read consistency - Impala in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38428#M22143</link>
    <description>&lt;P&gt;We have a use case to reload all transaction data every month for a defined set of years. We are going to use Spark to create the required reporting tables, and Impala for analytical workloads with a BI tool.&lt;/P&gt;&lt;P&gt;How do we separate the data processing tables from the reporting tables and then swap tables in Impala? We want to minimise the impact on users in terms of availability of the BI system and to ensure read consistency.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've come across a couple of options. One is partitioning, but we would need to copy or move the files to the reporting directory location so that the next run can reuse the data processing tables.&lt;/P&gt;&lt;P&gt;The other option is to lock the table, remove the files, and move the files from the working directory to the reporting directory (again, this will impact users for the duration of the file removal and movement).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ideally, we would have two databases and swap them once the data load completes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Suresh&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 10:07:41 GMT</pubDate>
    <dc:creator>Suresh12</dc:creator>
    <dc:date>2022-09-16T10:07:41Z</dc:date>
    <item>
      <title>Read consistency - Impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38428#M22143</link>
      <description>&lt;P&gt;We have a use case to reload all transaction data every month for a defined set of years. We are going to use Spark to create the required reporting tables, and Impala for analytical workloads with a BI tool.&lt;/P&gt;&lt;P&gt;How do we separate the data processing tables from the reporting tables and then swap tables in Impala? We want to minimise the impact on users in terms of availability of the BI system and to ensure read consistency.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've come across a couple of options. One is partitioning, but we would need to copy or move the files to the reporting directory location so that the next run can reuse the data processing tables.&lt;/P&gt;&lt;P&gt;The other option is to lock the table, remove the files, and move the files from the working directory to the reporting directory (again, this will impact users for the duration of the file removal and movement).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ideally, we would have two databases and swap them once the data load completes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Suresh&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:07:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38428#M22143</guid>
      <dc:creator>Suresh12</dc:creator>
      <dc:date>2022-09-16T10:07:41Z</dc:date>
    </item>
    <item>
      <title>Re: Read consistency - Impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38443#M22144</link>
      <description>&lt;P&gt;Hi Suresh,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Even if your use case may be slightly different, I'd recommend you take a look at this blog post, which presents best practices and may give you a few ideas:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/" target="_blank"&gt;http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Alex&lt;/P&gt;</description>
      <pubDate>Tue, 08 Mar 2016 07:18:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38443#M22144</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2016-03-08T07:18:57Z</dc:date>
    </item>
    <item>
      <title>Re: Read consistency - Impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38463#M22145</link>
      <description>&lt;P&gt;Thanks Alex. I can see some references to swapping tables/views and so on, but it looks very complex to maintain, I think.&lt;/P&gt;&lt;P&gt;How about using two different HDFS locations, say Location1 and Location2, and swapping them so that the reporting tables point to the right location once the data load is complete on the location used for data processing? It looks like we can use the ALTER TABLE command to change the HDFS location of a table, so the swap would be just a metadata operation. What do you think?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1st run:&lt;/P&gt;&lt;P&gt;Location1 - use for Data Processing&lt;/P&gt;&lt;P&gt;Location2 - use for Reporting&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2nd run:&lt;/P&gt;&lt;P&gt;Location1 - use for Reporting&lt;/P&gt;&lt;P&gt;Location2 - use for Data Processing&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Suresh&lt;/P&gt;</description>
      <pubDate>Tue, 08 Mar 2016 17:26:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38463#M22145</guid>
      <dc:creator>Suresh12</dc:creator>
      <dc:date>2016-03-08T17:26:09Z</dc:date>
    </item>
    <item>
      <title>Re: Read consistency - Impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38478#M22146</link>
      <description>&lt;P&gt;Hi Suresh,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;That solution seems fine to me. Changing the location of a single table with ALTER is atomic, but you won't be able to atomically change the locations of two tables simultaneously. Just something to be aware of.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Alex&lt;/P&gt;</description>
      <pubDate>Wed, 09 Mar 2016 07:06:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Read-consistency-Impala/m-p/38478#M22146</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2016-03-09T07:06:38Z</dc:date>
    </item>
  </channel>
</rss>

