<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Transparent Data Encryption (TDE) and Local Disks encryption for intermediate data in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95797#M9140</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/369/amcbarnett.html" nodeid="369"&gt;@amcbarnett@hortonworks.com&lt;/A&gt; The concern is around who can get access to keys even if you are encrypting the mapreduce shuffle. Local disk encryption is for scenarios where some can take the disk out and read the data. Customers should adopt other methods (OS level access) to prevent users from getting access to nodes where the intermediate data might be stored&lt;/P&gt;</description>
    <pubDate>Thu, 22 Oct 2015 04:31:00 GMT</pubDate>
    <dc:creator>bganesan</dc:creator>
    <dc:date>2015-10-22T04:31:00Z</dc:date>
    <item>
      <title>Transparent Data Encryption (TDE) and Local Disks encryption for intermediate data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95795#M9138</link>
      <description>&lt;P&gt;There is a recommendation that local disks be encrypted for intermediate data, and I need more information on this. Why is this so? How do we propose encrypting the disks?

Is this because Tez stores intermediate data on local disks?    

Also, MapReduce stores data on local disk in the directories set by the "&lt;A target="_blank" href="https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml"&gt;mapreduce.cluster.local.dir&lt;/A&gt;" parameter. So these have to be encrypted, right?&lt;/P&gt;&lt;P&gt;What are the best practices for encrypting the local disks that hold intermediate data? What manual effort is involved?

Is &lt;A href="https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html"&gt;Hadoop Encrypted Shuffle&lt;/A&gt; enough?
&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2015 01:57:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95795#M9138</guid>
      <dc:creator>amcbarnett</dc:creator>
      <dc:date>2015-10-22T01:57:44Z</dc:date>
    </item>
    <item>
      <title>Re: Transparent Data Encryption (TDE) and Local Disks encryption for intermediate data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95796#M9139</link>
      <description>&lt;P&gt;This capability allows encryption of the intermediate files generated during the merge and shuffle phases. It can be enabled by setting the mapreduce.job.encrypted-intermediate-data job property to true.
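&lt;/P&gt;&lt;P&gt;Because this is a job property, it can also be enabled for an individual job instead of cluster-wide. A minimal sketch, assuming the job driver is launched through ToolRunner so that GenericOptionsParser honors the -conf option; the file name encrypted-spill.xml is hypothetical:&lt;/P&gt;&lt;PRE&gt;&amp;lt;?xml version="1.0"?&amp;gt;
&amp;lt;!-- encrypted-spill.xml (hypothetical name): per-job override --&amp;gt;
&amp;lt;configuration&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;mapreduce.job.encrypted-intermediate-data&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;true&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
&amp;lt;/configuration&amp;gt;
&lt;/PRE&gt;&lt;P&gt;The file would then be supplied at submission time, for example: hadoop jar my-job.jar MyDriver -conf encrypted-spill.xml input output (the jar, driver class and paths are hypothetical).&lt;/P&gt;&lt;P&gt;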

Set the following in mapred-site.xml (mapred-default.xml only lists the shipped defaults and should not be edited):&lt;/P&gt;&lt;PRE&gt;&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;mapreduce.job.encrypted-intermediate-data&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;true&amp;lt;/value&amp;gt;
  &amp;lt;description&amp;gt;Encrypt intermediate MapReduce spill files or not;
  default is false&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;

&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;mapreduce.job.encrypted-intermediate-data-key-size-bits&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;128&amp;lt;/value&amp;gt;
  &amp;lt;description&amp;gt;MapReduce encrypted-data key size in bits; default is 128&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;

&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;mapreduce.job.encrypted-intermediate-data.buffer.kb&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;128&amp;lt;/value&amp;gt;
  &amp;lt;description&amp;gt;Buffer size for encrypted intermediate data, in KB;
  default is 128&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;

&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;NOTE:&lt;/STRONG&gt; Currently, enabling encrypted intermediate data spills restricts the number of job attempts to 1.&lt;/P&gt;&lt;P&gt;This feature is available only in MR2.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2015 02:25:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95796#M9139</guid>
      <dc:creator>amcbarnett</dc:creator>
      <dc:date>2015-10-22T02:25:14Z</dc:date>
    </item>
    <item>
      <title>Re: Transparent Data Encryption (TDE) and Local Disks encryption for intermediate data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95797#M9140</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/369/amcbarnett.html" nodeid="369"&gt;@amcbarnett@hortonworks.com&lt;/A&gt; The concern is around who can get access to keys even if you are encrypting the mapreduce shuffle. Local disk encryption is for scenarios where some can take the disk out and read the data. Customers should adopt other methods (OS level access) to prevent users from getting access to nodes where the intermediate data might be stored&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2015 04:31:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95797#M9140</guid>
      <dc:creator>bganesan</dc:creator>
      <dc:date>2015-10-22T04:31:00Z</dc:date>
    </item>
    <item>
      <title>Re: Transparent Data Encryption (TDE) and Local Disks encryption for intermediate data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95798#M9141</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/306/bganesan.html" nodeid="306"&gt;@bganesan@hortonworks.com&lt;/A&gt;&lt;P&gt;Trying to distill this to a best practice, is the following a correct understadning? &lt;/P&gt;&lt;P&gt;In order to ensure data isn't ever written unencrypted (even during shuffle), am I correct in recommending the best approach here is to ensure OS-level encryption is set up for the partitions that store mapreduce ($hadoop.tmp.dir) and tez temporary data? Then we ensure that the HDFS data directories are on a separate, unencrypted partition where we can let HDFS Native Encryption selectively encrypt specified zones.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Nov 2015 02:06:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95798#M9141</guid>
      <dc:creator>bwilson</dc:creator>
      <dc:date>2015-11-05T02:06:39Z</dc:date>
    </item>
    <item>
      <title>Re: Transparent Data Encryption (TDE) and Local Disks encryption for intermediate data</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95799#M9142</link>
      <description>&lt;P&gt;Sounds right. @rvenkatesh@hortonworks.com &lt;A rel="user" href="https://community.cloudera.com/users/229/bdurai.html" nodeid="229"&gt;@bdurai@hortonworks.com&lt;/A&gt; can you confirm?&lt;/P&gt;</description>
      <pubDate>Thu, 05 Nov 2015 02:12:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transparent-Data-Encryption-TDE-and-Local-Disks-encryption/m-p/95799#M9142</guid>
      <dc:creator>bganesan</dc:creator>
      <dc:date>2015-11-05T02:12:28Z</dc:date>
    </item>
  </channel>
</rss>

