<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: No change in overall performance when used gzip. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/No-change-in-overall-performance-when-used-gzip/m-p/92726#M12364</link>
    <description>Hi,&lt;BR /&gt;&lt;BR /&gt;The main purpose of compression is to save space, not to speed up queries. Compressed data has to be decompressed before it can be read, which adds overhead, so I would expect a query against compressed data to be slightly slower than the same query against uncompressed data. What you are seeing looks completely normal to me.&lt;BR /&gt;&lt;BR /&gt;Cheers&lt;BR /&gt;Eric</description>
    <pubDate>Mon, 15 Jul 2019 10:35:56 GMT</pubDate>
    <dc:creator>EricL</dc:creator>
    <dc:date>2019-07-15T10:35:56Z</dc:date>
    <item>
      <title>No change in overall performance when used gzip.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/No-change-in-overall-performance-when-used-gzip/m-p/92367#M12363</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I need to speed up my SELECT query, and I am currently testing how much performance I can get out of Impala on the given hardware.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have created a partitioned Parquet table with a block size of 512 MB.&lt;/P&gt;
&lt;P&gt;Here are some more details:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;[quickstart.cloudera:21000] &amp;gt; show table stats tbl_parq_512m;
Query: show table stats tbl_parq_512m
+-----------+----------------+------+------------+-----------+--------+----------+--------------+-------------------+---------+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| source_ip | destination_ip | year | event_date | #Rows     | #Files | Size     | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                                                                                       |
+-----------+----------------+------+------------+-----------+--------+----------+--------------+-------------------+---------+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| 10.0.0.1  | 20.0.0.1       | 1990 | 1/1/1990   | 4999950   | 1      | 337.76MB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=10.0.0.1/destination_ip=20.0.0.1/year=1990/event_date=1%2F1%2F1990 |
| 30.0.0.1  | 40.0.0.1       | 1993 | 1/1/1993   | 19999800  | 3      | 1.32GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=30.0.0.1/destination_ip=40.0.0.1/year=1993/event_date=1%2F1%2F1993 |
| 41.0.0.1  | 45.0.0.1       | 1994 | 1/1/1994   | 19999800  | 3      | 1.32GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=41.0.0.1/destination_ip=45.0.0.1/year=1994/event_date=1%2F1%2F1994 |
| 46.0.0.1  | 50.0.0.1       | 1995 | 1/1/1995   | 48999020  | 7      | 3.23GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=46.0.0.1/destination_ip=50.0.0.1/year=1995/event_date=1%2F1%2F1995 |
| 51.0.0.1  | 55.0.0.1       | 1996 | 1/1/1996   | 49999000  | 7      | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=51.0.0.1/destination_ip=55.0.0.1/year=1996/event_date=1%2F1%2F1996 |
| 56.0.0.1  | 60.0.0.1       | 1997 | 1/1/1997   | 49999000  | 7      | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=56.0.0.1/destination_ip=60.0.0.1/year=1997/event_date=1%2F1%2F1997 |
| 61.0.0.1  | 70.0.0.1       | 1998 | 1/1/1998   | 99998000  | 14     | 6.59GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=61.0.0.1/destination_ip=70.0.0.1/year=1998/event_date=1%2F1%2F1998 |
| 71.0.0.1  | 75.0.0.1       | 1999 | 1/1/1999   | 49999000  | 7      | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=71.0.0.1/destination_ip=75.0.0.1/year=1999/event_date=1%2F1%2F1999 |
| 80.0.0.1  | 85.0.0.1       | 2000 | 1/1/2000   | 49999000  | 7      | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=80.0.0.1/destination_ip=85.0.0.1/year=2000/event_date=1%2F1%2F2000 |
| 86.0.0.1  | 90.0.0.1       | 2001 | 1/1/2001   | 49999000  | 7      | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=86.0.0.1/destination_ip=90.0.0.1/year=2001/event_date=1%2F1%2F2001 |
| 91.0.0.1  | 95.0.0.1       | 2002 | 1/1/2002   | 82998340  | 11     | 5.47GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_512m/source_ip=91.0.0.1/destination_ip=95.0.0.1/year=2002/event_date=1%2F1%2F2002 |
| Total     |                |      |            | 526989910 | 74     | 34.74GB  | 0B           |                   |         |                   |                                                                                                                                                |
+-----------+----------------+------+------------+-----------+--------+----------+--------------+-------------------+---------+-------------------+---------------------------------------------------------&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I run the query below, it takes about 27 seconds to complete:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;[quickstart.cloudera:21000] &amp;gt; select source_ip,destination_ip,avg(volume),sum(duration) as avl from tbl_parq_512m group by source_ip,destination_ip order by avl limit 10;
Query: select source_ip,destination_ip,avg(volume),sum(duration) as avl from tbl_parq_512m group by source_ip,destination_ip order by avl limit 10
Query submitted at: 2019-07-08 18:32:31 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=88422271f161227b:fe860ef500000000
+-----------+----------------+-------------------+-----------+
| source_ip | destination_ip | avg(volume)       | avl       |
+-----------+----------------+-------------------+-----------+
| 10.0.0.1  | 20.0.0.1       | 4.499997599976    | 22500415  |
| 30.0.0.1  | 40.0.0.1       | 4.499744347443475 | 90003989  |
| 41.0.0.1  | 45.0.0.1       | 4.498248032480324 | 90023641  |
| 46.0.0.1  | 50.0.0.1       | 4.499559623845538 | 220487564 |
| 86.0.0.1  | 90.0.0.1       | 4.500191183823676 | 224966377 |
| 71.0.0.1  | 75.0.0.1       | 4.500268505370108 | 224977244 |
| 80.0.0.1  | 85.0.0.1       | 4.499946078921578 | 225002743 |
| 56.0.0.1  | 60.0.0.1       | 4.500426968539371 | 225003328 |
| 51.0.0.1  | 55.0.0.1       | 4.500039240784815 | 225013849 |
| 91.0.0.1  | 95.0.0.1       | 4.499926612990091 | 373535589 |
+-----------+----------------+-------------------+-----------+
Fetched 10 row(s) in 27.17s&lt;/PRE&gt;
&lt;P&gt;To speed up the query, I tried &lt;STRONG&gt;gzip compression&lt;/STRONG&gt;:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;[quickstart.cloudera:21000] &amp;gt; show table stats tbl_gzip_512m;
Query: show table stats tbl_gzip_512m
+-----------+----------------+------+------------+-----------+--------+----------+--------------+-------------------+---------+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| source_ip | destination_ip | year | event_date | #Rows     | #Files | Size     | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                                                                                       |
+-----------+----------------+------+------------+-----------+--------+----------+--------------+-------------------+---------+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| 10.0.0.1  | 20.0.0.1       | 1990 | 1/1/1990   | 5000950   | 1      | 231.85MB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=10.0.0.1/destination_ip=20.0.0.1/year=1990/event_date=1%2F1%2F1990 |
| 30.0.0.1  | 40.0.0.1       | 1993 | 1/1/1993   | 19999800  | 2      | 925.94MB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=30.0.0.1/destination_ip=40.0.0.1/year=1993/event_date=1%2F1%2F1993 |
| 41.0.0.1  | 45.0.0.1       | 1994 | 1/1/1994   | 19999800  | 2      | 925.94MB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=41.0.0.1/destination_ip=45.0.0.1/year=1994/event_date=1%2F1%2F1994 |
| 46.0.0.1  | 50.0.0.1       | 1995 | 1/1/1995   | 48999020  | 5      | 2.22GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=46.0.0.1/destination_ip=50.0.0.1/year=1995/event_date=1%2F1%2F1995 |
| 51.0.0.1  | 55.0.0.1       | 1996 | 1/1/1996   | 49999000  | 5      | 2.26GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=51.0.0.1/destination_ip=55.0.0.1/year=1996/event_date=1%2F1%2F1996 |
| 56.0.0.1  | 60.0.0.1       | 1997 | 1/1/1997   | 49999000  | 5      | 2.26GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=56.0.0.1/destination_ip=60.0.0.1/year=1997/event_date=1%2F1%2F1997 |
| 61.0.0.1  | 70.0.0.1       | 1998 | 1/1/1998   | 99998000  | 19     | 4.53GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=61.0.0.1/destination_ip=70.0.0.1/year=1998/event_date=1%2F1%2F1998 |
| 71.0.0.1  | 75.0.0.1       | 1999 | 1/1/1999   | 49999000  | 5      | 2.26GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=71.0.0.1/destination_ip=75.0.0.1/year=1999/event_date=1%2F1%2F1999 |
| 80.0.0.1  | 85.0.0.1       | 2000 | 1/1/2000   | 49999000  | 10     | 2.26GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=80.0.0.1/destination_ip=85.0.0.1/year=2000/event_date=1%2F1%2F2000 |
| 86.0.0.1  | 90.0.0.1       | 2001 | 1/1/2001   | 49999000  | 10     | 2.26GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=86.0.0.1/destination_ip=90.0.0.1/year=2001/event_date=1%2F1%2F2001 |
| 91.0.0.1  | 95.0.0.1       | 2002 | 1/1/2002   | 82998340  | 16     | 3.76GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_gzip_512m/source_ip=91.0.0.1/destination_ip=95.0.0.1/year=2002/event_date=1%2F1%2F2002 |
| Total     |                |      |            | 526990910 | 80     | 23.84GB  | 0B           |                   |         |                   |                                                                                                                                                |
+-----------+----------------+------+------------+-----------+--------+----------+--------------+-------------------+---------+-------------------+---------------------------------------------------------&lt;/PRE&gt;
&lt;P&gt;The total size has dropped by about 31%, from &lt;STRONG&gt;34.74 GB to 23.84 GB&lt;/STRONG&gt;.&lt;/P&gt;
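&lt;P&gt;As a quick check of that figure, here is a minimal sketch that recomputes the reduction from the two "Total" rows in the SHOW TABLE STATS outputs above. The variable names are only illustrative; the sizes are the ones Impala reported.&lt;/P&gt;

```python
# Totals copied from the "Total" rows of the two SHOW TABLE STATS outputs (GB).
uncompressed_gb = 34.74  # tbl_parq_512m
gzip_gb = 23.84          # tbl_gzip_512m

# Absolute and relative space saved by switching to gzip.
saved_gb = uncompressed_gb - gzip_gb
reduction_pct = 100 * saved_gb / uncompressed_gb

print(f"saved {saved_gb:.2f} GB ({reduction_pct:.1f}% smaller)")
```

&lt;P&gt;This works out to roughly 10.9 GB saved, a reduction of a little under one third.&lt;/P&gt;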
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I was expecting a similar improvement in query performance, hoping that the smaller data would reduce the run time as well.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below is the same query against the gzip table:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;[quickstart.cloudera:21000] &amp;gt; select source_ip,destination_ip,avg(volume),sum(duration) as avl from tbl_gzip_512m group by source_ip,destination_ip order by avl limit 10;
Query: select source_ip,destination_ip,avg(volume),sum(duration) as avl from tbl_gzip_512m group by source_ip,destination_ip order by avl limit 10
Query submitted at: 2019-07-08 18:39:09 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=8d4bdddc207517ff:952aea9d00000000
+-----------+----------------+-------------------+-----------+
| source_ip | destination_ip | avg(volume)       | avl       |
+-----------+----------------+-------------------+-----------+
| 10.0.0.1  | 20.0.0.1       | 4.50001219768244  | 22504983  |
| 30.0.0.1  | 40.0.0.1       | 4.499744347443475 | 90003989  |
| 41.0.0.1  | 45.0.0.1       | 4.498248032480324 | 90023641  |
| 46.0.0.1  | 50.0.0.1       | 4.499559623845538 | 220487564 |
| 86.0.0.1  | 90.0.0.1       | 4.500191183823676 | 224966377 |
| 71.0.0.1  | 75.0.0.1       | 4.500268505370108 | 224977244 |
| 80.0.0.1  | 85.0.0.1       | 4.499946078921578 | 225002743 |
| 56.0.0.1  | 60.0.0.1       | 4.500426968539371 | 225003328 |
| 51.0.0.1  | 55.0.0.1       | 4.500039240784815 | 225013849 |
| 91.0.0.1  | 95.0.0.1       | 4.499926612990091 | 373535589 |
+-----------+----------------+-------------------+-----------+
Fetched 10 row(s) in 27.67s&lt;/PRE&gt;
&lt;P&gt;As you can see, there is no improvement. If anything, the query is slightly slower (27.67s versus 27.17s), even though the data is about one third smaller.&lt;/P&gt;
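&lt;P&gt;To put the two runs side by side, the sketch below divides each table's total on-disk size by the wall-clock time reported by impala-shell. This is only arithmetic on the numbers in this post: the "rate" is on-disk gigabytes per second of elapsed time, not a measured I/O throughput, and the helper name on_disk_rate is made up for illustration.&lt;/P&gt;

```python
# Figures copied from the post: total on-disk size (GB) and the
# "Fetched 10 row(s) in ...s" wall-clock time for each table.
runs = {
    "tbl_parq_512m": {"size_gb": 34.74, "seconds": 27.17},
    "tbl_gzip_512m": {"size_gb": 23.84, "seconds": 27.67},
}


def on_disk_rate(run):
    """On-disk GB consumed per second of wall-clock time (not measured I/O)."""
    return run["size_gb"] / run["seconds"]


plain = runs["tbl_parq_512m"]
gzipped = runs["tbl_gzip_512m"]

# Relative change in run time when going from uncompressed to gzip.
slowdown_pct = 100 * (gzipped["seconds"] - plain["seconds"]) / plain["seconds"]

for name, run in runs.items():
    print(f"{name}: {run['seconds']:.2f} s, {on_disk_rate(run):.2f} GB/s on disk")
print(f"gzip run time change: {slowdown_pct:+.1f}%")
```

&lt;P&gt;The run time is essentially unchanged while the bytes on disk fall by about a third, so the time does not appear to scale with the on-disk size in this test.&lt;/P&gt;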
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Am I doing something wrong when using gzip?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:29:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/No-change-in-overall-performance-when-used-gzip/m-p/92367#M12363</guid>
      <dc:creator>punshi</dc:creator>
      <dc:date>2022-09-16T14:29:36Z</dc:date>
    </item>
    <item>
      <title>Re: No change in overall performance when used gzip.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/No-change-in-overall-performance-when-used-gzip/m-p/92726#M12364</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;The main purpose of compression is to save space, not to speed up queries. Compressed data has to be decompressed before it can be read, which adds overhead, so I would expect a query against compressed data to be slightly slower than the same query against uncompressed data. What you are seeing looks completely normal to me.&lt;BR /&gt;&lt;BR /&gt;Cheers&lt;BR /&gt;Eric</description>
      <pubDate>Mon, 15 Jul 2019 10:35:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/No-change-in-overall-performance-when-used-gzip/m-p/92726#M12364</guid>
      <dc:creator>EricL</dc:creator>
      <dc:date>2019-07-15T10:35:56Z</dc:date>
    </item>
  </channel>
</rss>

