<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Compression/Zipping of old unused HDFS files in Production Cluster in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Compression-Zipping-of-old-unused-HDFS-files-in-Production/m-p/311268#M224563</link>
    <description>&lt;P&gt;Hello there,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I understand that your goal is to free up some HDFS space. I haven't tested zipping files at the HDFS level, though the data compression documentation[2] may be a useful starting point.&amp;nbsp;Alternatively, you may want to review HDFS Erasure Coding[1] if it suits your requirements:&amp;nbsp;&lt;BR /&gt;Erasure Coding (EC) in HDFS significantly reduces storage overhead while achieving similar or better fault tolerance through the use of parity cells (similar to RAID5). Prior to the introduction of EC, HDFS relied exclusively on 3x replication for fault tolerance, meaning that a 1 GB file would use 3 GB of raw disk space. With EC, the same level of fault tolerance can be achieved using only 1.5 GB of raw disk space (for example, with a Reed-Solomon policy of 6 data cells and 3 parity cells, where 9 cells are stored for every 6 cells of data).&lt;BR /&gt;Please refer to the article below[1] for more insight into EC:&lt;/P&gt;&lt;P&gt;Ref[1]:&amp;nbsp;&lt;SPAN&gt;&lt;A href="https://blog.cloudera.com/hdfs-erasure-coding-in-production/" target="_blank" rel="noopener"&gt;https://blog.cloudera.com/hdfs-erasure-coding-in-production/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Ref[2]:&amp;nbsp;&lt;A href="https://docs.cloudera.com/cloudera-manager/7.2.6/managing-clusters/topics/cm-choosing-configuring-data-compression.html" target="_blank" rel="noopener"&gt;https://docs.cloudera.com/cloudera-manager/7.2.6/managing-clusters/topics/cm-choosing-configuring-data-compression.html&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 10 Feb 2021 02:11:52 GMT</pubDate>
    <dc:creator>shsingh</dc:creator>
    <dc:date>2021-02-10T02:11:52Z</dc:date>
  </channel>
</rss>

