<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Impala performance with HDFS caching enabled in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-performance-with-HDFS-caching-enabled/m-p/49397#M51386</link>
    <description>&lt;P&gt;What kind of performance difference are we talking about? 5%? 100%?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's helpful to look at execution summaries or profiles to drill down on where the difference is (if you're using impala-shell, you can get them with the summary; and profile; commands after running a query).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If the whole data set you're querying fits in memory, HDFS caching may not be that beneficial, since the OS buffer cache can be pretty effective at keeping the data in memory, especially if you're re-running the same query on the same data back-to-back. Also, if the query is somewhat complex, it can become CPU-bound pretty quickly.&lt;/P&gt;</description>
    <pubDate>Thu, 12 Jan 2017 23:40:01 GMT</pubDate>
    <dc:creator>Tim Armstrong</dc:creator>
    <dc:date>2017-01-12T23:40:01Z</dc:date>
    <item>
      <title>Impala performance with HDFS caching enabled</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-performance-with-HDFS-caching-enabled/m-p/49343#M51385</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I have a table of about 1 GB, and I tried to set up HDFS caching as described in the "Using HDFS caching with Impala" doc, with a replication factor of 2.&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I noticed that queries seem to perform better without HDFS caching. I'm using CDH 5.8.2. Is there anything I might be missing, or anything I can check to find out why that is the case?&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Thanks!&lt;/DIV&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:54:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-performance-with-HDFS-caching-enabled/m-p/49343#M51385</guid>
      <dc:creator>buntu</dc:creator>
      <dc:date>2022-09-16T10:54:26Z</dc:date>
    </item>
    <item>
      <title>Re: Impala performance with HDFS caching enabled</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-performance-with-HDFS-caching-enabled/m-p/49397#M51386</link>
      <description>&lt;P&gt;What kind of performance difference are we talking about? 5%? 100%?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's helpful to look at execution summaries or profiles to drill down on where the difference is (if you're using impala-shell, you can get them with the summary; and profile; commands after running a query).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If the whole data set you're querying fits in memory, HDFS caching may not be that beneficial, since the OS buffer cache can be pretty effective at keeping the data in memory, especially if you're re-running the same query on the same data back-to-back. Also, if the query is somewhat complex, it can become CPU-bound pretty quickly.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 23:40:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-performance-with-HDFS-caching-enabled/m-p/49397#M51386</guid>
      <dc:creator>Tim Armstrong</dc:creator>
      <dc:date>2017-01-12T23:40:01Z</dc:date>
    </item>
    <item>
      <title>Re: Impala performance with HDFS caching enabled</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-performance-with-HDFS-caching-enabled/m-p/49398#M51387</link>
      <description>&lt;P&gt;Given the size of the dataset, I believe the data fits in memory, so HDFS caching isn't providing any additional performance improvement.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jan 2017 00:24:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-performance-with-HDFS-caching-enabled/m-p/49398#M51387</guid>
      <dc:creator>buntu</dc:creator>
      <dc:date>2017-01-13T00:24:04Z</dc:date>
    </item>
  </channel>
</rss>

