
Security / Operational best practices question

Expert Contributor

I have some questions about operational best practices:

a.) For external customers, what's the best way to allow them to upload large amounts of data to HDFS? Uploading via the file browser web interface may not be safe: if 10 users start uploading 20 GB of data at the same time, the server will choke (I currently have a small 5-server cluster).

b.) I was thinking of having an external jumpbox that people can SSH into and FTP their data to, with a cron job then pushing the data to HDFS on a periodic basis (a sketch of such a script is below). Once the upload completes, users can use a web interface to program with Hive/Pig.
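For (b), here is a minimal sketch of what that cron-driven push could look like. The landing directory (/data/ftp/landing) and HDFS target (/user/uploads) are hypothetical placeholders; the script relies only on the standard `hdfs dfs -moveFromLocal` command.

```python
#!/usr/bin/env python
# Hypothetical cron-driven ingest: move completed FTP uploads from the
# jumpbox landing directory into HDFS. Both paths below are assumptions.
import os
import subprocess
import time

LANDING_DIR = "/data/ftp/landing"   # where users drop files (assumed path)
HDFS_TARGET = "/user/uploads"       # HDFS destination (assumed path)

for name in os.listdir(LANDING_DIR):
    local_path = os.path.join(LANDING_DIR, name)
    if not os.path.isfile(local_path):
        continue
    # Skip files modified in the last 5 minutes: they may still be uploading.
    if time.time() - os.path.getmtime(local_path) < 300:
        continue
    # -moveFromLocal deletes the local copy only after a successful HDFS
    # write, so a failed transfer stays in place for the next cron run.
    rc = subprocess.call(["hdfs", "dfs", "-moveFromLocal", local_path, HDFS_TARGET])
    if rc != 0:
        print("upload of %s failed; will retry on the next run" % name)
```

Scheduled from cron (e.g. every 15 minutes), the mtime check gives in-flight FTP transfers time to finish before a file is picked up.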

c.) Spark-Shell - Is there a way to let users initiate spark-shell from a web interface?

d.) Currently the NameNode is a single point of failure. I've been reading about federation and HA. Which is recommended? I have a very small environment.

e.) DataNode information, cluster information, and Spark jobs can all be viewed from the web UIs. Is it good practice to let users see that information? The issue is that the information is not restricted to each user's own jobs; it's visible to everyone or no one.

Thanks

Prakash

1 ACCEPTED SOLUTION


Hi @Prakash Punj

  1. You can use NiFi to watch a directory and ingest each new file into HDFS (GetFile and PutHDFS processors). https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GetFile/index....
  2. You can run Spark in a browser with Zeppelin, and you can have it in Ambari with the Zeppelin view (a sample note paragraph is sketched after this list). Some tutorials are here: http://hortonworks.com/hadoop/zeppelin/#tutorials
  3. To avoid a SPOF you need HDFS HA (an illustrative configuration fragment also follows). Federation means running multiple NameNodes to partition a very large namespace and reduce the load on any single NameNode; each federated NameNode is still a SPOF for its namespace unless it is also made HA.
  4. In Ambari you can have admin users and regular users; regular users have fewer privileges in Ambari.
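To make point 2 concrete, a Zeppelin note paragraph might look like the sketch below, assuming the stock %pyspark interpreter; the computation itself is just a placeholder.

```python
%pyspark
# A Zeppelin paragraph: %pyspark binds the paragraph to the PySpark
# interpreter, which provides a ready-made SparkContext as `sc`.
# Users get an interactive spark-shell-like experience in the browser.
rdd = sc.parallelize(range(1, 101))
print(rdd.sum())  # prints 5050
```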
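And for point 3, a minimal sketch of what NameNode HA looks like in hdfs-site.xml with quorum journal nodes. The nameservice id (mycluster) and hostnames (nn1, nn2, jn1-jn3) are placeholders; the property names are the standard HDFS HA ones, and this fragment omits the per-NameNode RPC/HTTP addresses, failover proxy provider, and fencing settings a full setup needs.

```xml
<!-- Illustrative hdfs-site.xml fragment for NameNode HA; ids and
     hostnames are hypothetical placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

On HDP, Ambari's Enable NameNode HA wizard can generate this configuration for you, so you rarely need to write it by hand.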

