<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question bulk upload to HDFS with limited access to cluster from client side in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bulk-upload-to-HFDS-with-limited-access-to-cluster-from/m-p/2043#M341</link>
    <description>&lt;P&gt;Hi, each day we will get 10-20 GB of binary files.&lt;/P&gt;&lt;DIV&gt;We need to upload these files into HDFS. We also want to limit access to the cluster from the client side (the side that delivers the 10-20 GB of files).&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;What are the best approaches?&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;We have several ideas:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;1. An SFTP server on our side (for example, on one of our data nodes), followed by hadoop fs -put.&lt;/DIV&gt;&lt;DIV&gt;2. hadoop fs -put from the client side (the party delivering the data). However, we would like to forbid direct remote access to the cluster.&lt;/DIV&gt;&lt;DIV&gt;3. WebHDFS (does it work?). The problem is the same: we don't want to give the client access to the cluster or its interfaces.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;*Also, we don't want to set up Kerberos or anything similar; the cluster is on a private, secure network.&lt;/DIV&gt;</description>
    <pubDate>Mon, 07 Oct 2013 08:00:02 GMT</pubDate>
    <dc:creator>sergey.sheypak566881637</dc:creator>
    <dc:date>2013-10-07T08:00:02Z</dc:date>
    <item>
      <title>bulk upload to HDFS with limited access to cluster from client side</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bulk-upload-to-HFDS-with-limited-access-to-cluster-from/m-p/2043#M341</link>
      <description>&lt;P&gt;Hi, each day we will get 10-20 GB of binary files.&lt;/P&gt;&lt;DIV&gt;We need to upload these files into HDFS. We also want to limit access to the cluster from the client side (the side that delivers the 10-20 GB of files).&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;What are the best approaches?&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;We have several ideas:&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;1. An SFTP server on our side (for example, on one of our data nodes), followed by hadoop fs -put.&lt;/DIV&gt;&lt;DIV&gt;2. hadoop fs -put from the client side (the party delivering the data). However, we would like to forbid direct remote access to the cluster.&lt;/DIV&gt;&lt;DIV&gt;3. WebHDFS (does it work?). The problem is the same: we don't want to give the client access to the cluster or its interfaces.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;*Also, we don't want to set up Kerberos or anything similar; the cluster is on a private, secure network.&lt;/DIV&gt;</description>
      <pubDate>Mon, 07 Oct 2013 08:00:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bulk-upload-to-HFDS-with-limited-access-to-cluster-from/m-p/2043#M341</guid>
      <dc:creator>sergey.sheypak566881637</dc:creator>
      <dc:date>2013-10-07T08:00:02Z</dc:date>
    </item>
    <item>
      <title>Re: bulk upload to HDFS with limited access to cluster from client side</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bulk-upload-to-HFDS-with-limited-access-to-cluster-from/m-p/2265#M342</link>
      <description>&lt;P&gt;You could try using HttpFS; it acts as a trusted edge node between the cluster and external clients. It's basically a proxy for WebHDFS, so clients never talk directly to the NameNode or DataNodes. Performance is lower than with direct access, but it should be okay for 10-20 GB of data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;See:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-hdfs-httpfs/"&gt;http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-hdfs-httpfs/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Oct 2013 18:12:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bulk-upload-to-HFDS-with-limited-access-to-cluster-from/m-p/2265#M342</guid>
      <dc:creator>andrew.wang</dc:creator>
      <dc:date>2013-10-16T18:12:24Z</dc:date>
    </item>
  </channel>
</rss>

