<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How will Erasure Coding affect the principle of data locality? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97978#M11492</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/438/pcoates.html" nodeid="438"&gt;@Peter Coates&lt;/A&gt;&lt;P&gt;Not sure if you have seen this: &lt;A target="_blank" href="https://issues.apache.org/jira/browse/HDFS-8030"&gt;https://issues.apache.org/jira/browse/HDFS-8030&lt;/A&gt;&lt;/P&gt;&lt;P&gt;HDFS Erasure Coding Phase I (HDFS-7285) enables EC based on the striped format. It
achieves space saving but gives up data locality. Phase II of this project aims to support similar
space saving based on the contiguous block layout.&lt;/P&gt;</description>
    <pubDate>Sun, 06 Dec 2015 20:59:42 GMT</pubDate>
    <dc:creator>nsabharwal</dc:creator>
    <dc:date>2015-12-06T20:59:42Z</dc:date>
    <item>
      <title>How will Erasure Coding affect the principle of data locality?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97977#M11491</link>
      <description>&lt;P&gt;Hadoop has long stressed moving the code to the data, both because it's faster to move the code than to move the data and, more importantly, because the network is a limited shared resource that can easily be swamped. Erasure coding would seem to require that a large proportion of the data move across the network, because the contents of a single block will reside on multiple nodes. This would presumably apply not just to the ToR switch but to the shared network as well, if the ability to tolerate the loss of a rack is preserved. Is this true, and if so, how are these principles reconciled?&lt;/P&gt;</description>
      <pubDate>Sun, 06 Dec 2015 08:03:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97977#M11491</guid>
      <dc:creator>pcoates</dc:creator>
      <dc:date>2015-12-06T08:03:33Z</dc:date>
    </item>
    <item>
      <title>Re: How will Erasure Coding affect the principle of data locality?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97978#M11492</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/438/pcoates.html" nodeid="438"&gt;@Peter Coates&lt;/A&gt;&lt;P&gt;Not sure if you have seen this: &lt;A target="_blank" href="https://issues.apache.org/jira/browse/HDFS-8030"&gt;https://issues.apache.org/jira/browse/HDFS-8030&lt;/A&gt;&lt;/P&gt;&lt;P&gt;HDFS Erasure Coding Phase I (HDFS-7285) enables EC based on the striped format. It
achieves space saving but gives up data locality. Phase II of this project aims to support similar
space saving based on the contiguous block layout.&lt;/P&gt;</description>
      <pubDate>Sun, 06 Dec 2015 20:59:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97978#M11492</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-12-06T20:59:42Z</dc:date>
    </item>
    <item>
      <title>Re: How will Erasure Coding affect the principle of data locality?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97979#M11493</link>
      <description>&lt;P&gt;Note also that you are going to get less IO bandwidth as you move from 3 replicas (and hence 3 places to run code locally) to what is essentially a single replica, with the data spread across the network.&lt;/P&gt;&lt;P&gt;Erasure coding is best for storing cold data, where the improvement in storage density is tangible. It will hurt performance through:&lt;/P&gt;&lt;P&gt;- loss of locality (network layer)&lt;/P&gt;&lt;P&gt;- loss of replicas (disk IO layer)&lt;/P&gt;&lt;P&gt;- need to rebuild the raw data (CPU overhead)&lt;/P&gt;&lt;P&gt;I don't think we have any figures yet on the impact.&lt;/P&gt;&lt;P&gt;On a brighter note, 10GbE ToR switches are falling in price, so you could think about going to 10 Gb on-rack, even if the backbone remains a bottleneck.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Dec 2015 21:42:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97979#M11493</guid>
      <dc:creator>stevel</dc:creator>
      <dc:date>2015-12-09T21:42:53Z</dc:date>
    </item>
    <item>
      <title>Re: How will Erasure Coding affect the principle of data locality?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97980#M11494</link>
      <description>&lt;P&gt;The documentation seems to suggest that the normal mode of use would be to have one reconstituted replica sitting around, and that reconstituting an encoded block would be done only if this isn't the case. Keeping that replica by default would eliminate most of the space savings, because the data would expand from 1.6 to 2.6 times the raw file size. Why not have a policy that leaves a single full-size copy for a limited time after a block is used? A "working set", as it were: if you've used a block in the last X hours, the decoded block won't be deleted.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 00:47:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-will-Erasure-Coding-affect-the-principle-of-data/m-p/97980#M11494</guid>
      <dc:creator>pcoates</dc:creator>
      <dc:date>2015-12-10T00:47:01Z</dc:date>
    </item>
  </channel>
</rss>

