<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Impyla bad performance - rows fetch is very slow in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87442#M11826</link>
    <description>&lt;P&gt;Yeah, we need to make some changes in Impala to better optimise this case (large SELECT result sets). We have some of that work in Impala.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you're doing large extracts of data, it's often better to do a "CREATE TABLE AS SELECT" into a text table and download those files directly from the filesystem, if that's possible.&lt;/P&gt;</description>
    <pubDate>Thu, 07 Mar 2019 17:14:04 GMT</pubDate>
    <dc:creator>Tim Armstrong</dc:creator>
    <dc:date>2019-03-07T17:14:04Z</dc:date>
    <item>
      <title>Impyla bad performance - rows fetch is very slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87210#M11823</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a program that uses Impyla to retrieve data from the local Impala daemon.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;cursor.execute("select * from table;")
rows = cursor.fetchall()&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The table has 5 million rows and 9 columns; exported as CSV, it is about 200 MB.&lt;/P&gt;&lt;P&gt;There are four data nodes. Memory is 32 GB.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Even though the data set is that small, fetchall() takes over 200 seconds.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Query execution finishes in 0.2 seconds.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Why is it so slow?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Do you have any ideas for speeding this up?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:12:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87210#M11823</guid>
      <dc:creator>uma66</dc:creator>
      <dc:date>2022-09-16T14:12:35Z</dc:date>
    </item>
    <item>
      <title>Re: Impyla bad performance - rows fetch is very slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87246#M11824</link>
      <description>&lt;P&gt;Impala is a streaming SQL engine, so query execution can actually happen at the same time as rows are returned to the client. In your case, we don't scan the whole table, put the rows somewhere, then return the rows to the client. Rather, Impala just returns rows to the client at the same time as it's scanning the table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The bottleneck is likely in the client or network. Impyla is not particularly fast at parsing incoming rows and converting them into Python objects. The Impala server is much, much faster. There's also a known issue that means that network latency between the client and the server can affect the time taken to return rows: &lt;A href="https://issues.apache.org/jira/browse/IMPALA-1618" target="_blank"&gt;https://issues.apache.org/jira/browse/IMPALA-1618&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2019 17:54:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87246#M11824</guid>
      <dc:creator>Tim Armstrong</dc:creator>
      <dc:date>2019-03-05T17:54:10Z</dc:date>
    </item>
    <item>
      <title>Re: Impyla bad performance - rows fetch is very slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87275#M11825</link>
      <description>&lt;P&gt;Thank you for answering.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So the time taken by "cursor.fetchall()" includes the HDFS scan time.&lt;/P&gt;&lt;P&gt;Even so, the bottleneck is not the HDFS scan but the client or the network.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I read the issue below, but as I understand it, that problem occurs when a batch size smaller than the default is specified.&lt;BR /&gt;&lt;A href="https://issues.apache.org/jira/browse/IMPALA-1618" target="_blank"&gt;https://issues.apache.org/jira/browse/IMPALA-1618&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am not sure whether it can also occur when using "&lt;SPAN&gt;cursor.fetchall()&lt;/SPAN&gt;".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I found an issue that describes the same problem.&lt;BR /&gt;&lt;A href="https://github.com/cloudera/impyla/issues/239" target="_blank"&gt;https://github.com/cloudera/impyla/issues/239&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Wes McKinney says it is an hs2client problem.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, as far as I understand, there is no solution...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 06 Mar 2019 02:24:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87275#M11825</guid>
      <dc:creator>uma66</dc:creator>
      <dc:date>2019-03-06T02:24:10Z</dc:date>
    </item>
    <item>
      <title>Re: Impyla bad performance - rows fetch is very slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87442#M11826</link>
      <description>&lt;P&gt;Yeah, we need to make some changes in Impala to better optimise this case (large SELECT result sets). We have some of that work in Impala.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you're doing large extracts of data, it's often better to do a "CREATE TABLE AS SELECT" into a text table and download those files directly from the filesystem, if that's possible.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2019 17:14:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Impyla-bad-performance-rows-fetch-is-very-slow/m-p/87442#M11826</guid>
      <dc:creator>Tim Armstrong</dc:creator>
      <dc:date>2019-03-07T17:14:04Z</dc:date>
    </item>
  </channel>
</rss>

