<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Security / Operational best practices question in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Security-Operational-best-practices-question/m-p/111496#M21956</link>
    <description>&lt;P&gt;I had some questions about operational best practices:&lt;/P&gt;&lt;P&gt;a.) For external customers, what's the best way to allow them to upload large amounts of data to HDFS? Uploading via the file browser web interface may not be safe: if 10 users start uploading 20 GB of data at the same time, the server will choke (I currently have a small 5-server cluster).&lt;/P&gt;&lt;P&gt;b.) I was thinking of setting up an external jumpbox that people can SSH into and FTP their data to, with a cron job that then pushes the data to HDFS on a periodic basis. Once the upload completes, users can use the web interface to program with Hive/Pig.&lt;/P&gt;&lt;P&gt;c.) Spark-Shell - Is there a way to have users launch spark-shell from a web interface?&lt;/P&gt;&lt;P&gt;d.) Currently the NameNode is a single point of failure. I was reading about federation and HA. Which is recommended? I have a very small environment.&lt;/P&gt;&lt;P&gt;e.) DataNode information, cluster information, and Spark jobs can all be viewed from the web. Is it good practice to let users see that information? The issue is that the information is not restricted to each user's own data; it's open to everyone or no one.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Prakash&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 10:07:17 GMT</pubDate>
    <dc:creator>prakashpunj</dc:creator>
    <dc:date>2022-09-16T10:07:17Z</dc:date>
    <item>
      <title>Security / Operational best practices question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Security-Operational-best-practices-question/m-p/111496#M21956</link>
      <description>&lt;P&gt;I had some questions about operational best practices:&lt;/P&gt;&lt;P&gt;a.) For external customers, what's the best way to allow them to upload large amounts of data to HDFS? Uploading via the file browser web interface may not be safe: if 10 users start uploading 20 GB of data at the same time, the server will choke (I currently have a small 5-server cluster).&lt;/P&gt;&lt;P&gt;b.) I was thinking of setting up an external jumpbox that people can SSH into and FTP their data to, with a cron job that then pushes the data to HDFS on a periodic basis. Once the upload completes, users can use the web interface to program with Hive/Pig.&lt;/P&gt;&lt;P&gt;c.) Spark-Shell - Is there a way to have users launch spark-shell from a web interface?&lt;/P&gt;&lt;P&gt;d.) Currently the NameNode is a single point of failure. I was reading about federation and HA. Which is recommended? I have a very small environment.&lt;/P&gt;&lt;P&gt;e.) DataNode information, cluster information, and Spark jobs can all be viewed from the web. Is it good practice to let users see that information? The issue is that the information is not restricted to each user's own data; it's open to everyone or no one.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Prakash&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:07:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Security-Operational-best-practices-question/m-p/111496#M21956</guid>
      <dc:creator>prakashpunj</dc:creator>
      <dc:date>2022-09-16T10:07:17Z</dc:date>
    </item>
    <item>
      <title>Re: Security / Operational best practices question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Security-Operational-best-practices-question/m-p/111497#M21957</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/1277/prakashpunj.html" nodeid="1277"&gt;@Prakash Punj&lt;/A&gt;&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;You can use NiFi to watch a directory and ingest each new file into HDFS (GetFile and PutHDFS processors). &lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GetFile/index.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GetFile/index.html&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;You can run Spark in a browser with Zeppelin, and you can integrate it into Ambari with the Zeppelin view. Some tutorials are here: &lt;A href="http://hortonworks.com/hadoop/zeppelin/#tutorials" target="_blank"&gt;http://hortonworks.com/hadoop/zeppelin/#tutorials&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;To avoid a SPOF you need HDFS HA. Federation means running multiple NameNodes to manage very large clusters and reduce the load on any single NameNode.&lt;/LI&gt;&lt;LI&gt;In Ambari you can have admin users and regular users; regular users have fewer privileges in Ambari.&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Sat, 05 Mar 2016 00:25:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Security-Operational-best-practices-question/m-p/111497#M21957</guid>
      <dc:creator>ahadjidj</dc:creator>
      <dc:date>2016-03-05T00:25:07Z</dc:date>
    </item>
  </channel>
</rss>

