Adding a second user on a Hadoop cluster

Contributor

Is it possible to add a second user on a Hadoop cluster, like the spark user?

7 REPLIES

Master Mentor

@asmarz 

 

Your question is ambiguous; can you elaborate? It is possible to add users to a cluster with all the necessary privileges to run, for example, Spark or Hive jobs. In a kerberized cluster you can merge the different keytabs (hive, spark, oozie, etc.) or control access through Ranger.
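For example, merging service keytabs with the MIT Kerberos ktutil tool looks roughly like this (just a sketch; the keytab paths are assumptions and will differ in your deployment). rkt reads an existing keytab into the buffer and wkt writes the combined result out:

ktutil
rkt /etc/security/keytabs/hive.service.keytab
rkt /etc/security/keytabs/spark.headless.keytab
wkt /etc/security/keytabs/merged.keytab
q

The user would then authenticate with kinit -kt against the merged keytab and the principal they need to act as.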
But if you can elaborate on your use case, then we can try to find a technical solution.

 

 

Contributor

Thank you for your reply 🙂

 

I will explain my actual situation.

I have installed a Hadoop cluster using Ambari (Hortonworks HDP).

One of the nodes is an edge node. I defined it as a Spark master, and I can then run spark-submit from this Linux server (the edge node) as the spark user, against 4 workers (datanodes).

 

My question now: I will have developers and end users who should execute scripts from their local machines on the edge node. These users should not access the Linux server (edge node) directly, but they will have to launch scripts (spark-submit). How can I create accounts for them? How could they access the edge node?

Thanks

Asma

Master Mentor

@asmarz 

 

Now that I have a better understanding of your deployment, I think that is the wrong technical approach. Having an edge node is a great idea in that you can create and control access to the cluster from the edge node, but usually an edge node runs client software ONLY (YARN, HDFS, Oozie, ZooKeeper, Spark, Sqoop, Pig clients, etc.), not a Master process.

 

Edge nodes running within the cluster allow for centralized management of all the Hadoop configuration entries on the cluster nodes, which helps reduce the amount of administration needed to keep the config up to date. When you configure a Linux box as an edge node during the deployment, Ambari updates the conf files with the correct values so that all commands against the cluster can be run from the edge node.
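For example, once Ambari has pushed the client configs, a quick sanity check from the edge node could look like this (a minimal sketch, assuming the HDFS and YARN clients are installed):

hdfs dfs -ls /user        # uses the core-site.xml/hdfs-site.xml that Ambari pushed to the edge node
yarn node -list           # confirms the edge node can reach the ResourceManager and see the NodeManagers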


For security and good practice, edge nodes need to be multi-homed into the private subnet of the Hadoop cluster as well as into the corporate network. Keeping your Hadoop cluster in its own private subnet is an excellent practice, so these edge nodes serve as a controlled window inside the cluster.

 

In a nutshell, you don't need a Master process on the edge node, only the clients, to initiate communication with the cluster.
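For example, instead of pointing spark-submit at a standalone Spark master on the edge node, you would submit against YARN from the edge node (a sketch only; the application class and jar below are hypothetical placeholders):

# submit to YARN in cluster mode from the edge node; class and jar names are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar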

 

Hope that helps

Contributor

In a nutshell, you don't need a Master process on the edge node, only the clients, to initiate communication with the cluster.

 

=> Actually, this is my question 🙂 Which client? Should I create a local user like spark and then ask the end users to use it in order to launch this command, for example, from their machines?

 

spark-submit --master spark://edgenode:7077 calculPi.jar

 

All that I need for now is that the end users can execute their Spark or Python scripts... from their side. How could we do this? Do they need to access the edge server?

 

Thanks

Asma

Master Mentor

@asmarz 

An HDP client means a set of binaries and libraries to run commands and develop software for a particular Hadoop service. So, if you install the Hive client you can run Beeline, if you install the HBase client you can run the HBase shell, and if you install the Spark client you can run spark-shell, etc.
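For example, from the edge node those clients are invoked roughly like this (a sketch; the HiveServer2 hostname and port in the JDBC URL are assumptions that depend on your cluster):

beeline -u "jdbc:hive2://hiveserver-host:10000/default"    # Hive client (hostname is a placeholder)
hbase shell                                                # HBase client interactive shell
spark-shell --master yarn                                  # Spark client REPL against YARN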

 

But I would advise you to install at least these clients on the edge node:

  • zookeeper-client
  • sqoop-client
  • spark2-client
  • slider-client
  • spark-client
  • oozie-client
  • hbase-client

The local users created on the edge node can execute spark-shell or run spark-submit. The only difference is that if you have a kerberized cluster, you will have to generate keytabs and copy them over to the edge node for every user.
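For example, the per-user setup on the edge node might look roughly like this (a minimal sketch; the username, realm, and keytab path are assumptions):

# create the local OS account on the edge node and an HDFS home directory for it
useradd devuser1
sudo -u hdfs hdfs dfs -mkdir -p /user/devuser1
sudo -u hdfs hdfs dfs -chown devuser1:devuser1 /user/devuser1

# only on a kerberized cluster: authenticate with the user's keytab before submitting
kinit -kt /etc/security/keytabs/devuser1.keytab devuser1@EXAMPLE.COM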

Hope that answers your question

 

 

Contributor

Thank you again!!

 

1) I have installed these clients on my edge node 🙂

 

2) For now, let's say the cluster is not configured with Kerberos.

 

3) Actually, I want the end users to be able to submit via spark-shell 🙂

 

4) For this, I created a local user called "sparkuser" on the edge node.

 

5) With which tool could the end users use spark-shell? An API? An application? All I want is for the users to use the sparkuser account that I created on the edge node to submit their scripts, but I do not want these users to access the edge node server directly. Should it be remotely, via an API, or something else? I hope I have explained it better 😄

 

Many thanks for your help and patience

Asma

 
