Can I use edge nodes for MapReduce?

Contributor

If I configure a machine as an edge node and not as a DataNode, I cannot store HDFS data on it. But can I configure a NodeManager on the edge node, so that data is brought to the edge node and tasks run there when all the other nodes are busy?

1 ACCEPTED SOLUTION

Re: Can I use edge nodes for MapReduce?

@Shiva Nagesh

I agree with @hkropp. While you can, that does not mean you should do it as-is. You need to account for the architectural and resource-management shortcomings, not to mention the security concerns, of running more services on the edge nodes than is usually manageable. I understand that you have spare capacity on those edge nodes and would like to use it for bursts when needed. You could consider running Docker containers on your edge servers, so that you can separate the true edge nodes from on-demand workers. Those containers would use a worker template and could be spun up quickly as additional nodes, similar to what you would do in a cloud.

4 REPLIES

Re: Can I use edge nodes for MapReduce?

Contributor

I have tested this: jobs can run on a node that is configured as an edge node and has no DataNode daemon running. Correct me if I am wrong.
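One way to check which nodes are in that situation is to compare the NodeManagers known to the ResourceManager with the list of DataNodes. The following is only a sketch under assumptions: it expects a payload shaped like the ResourceManager REST response for `/ws/v1/cluster/nodes`, the DataNode list is supplied by you (for example, collected from `hdfs dfsadmin -report`), and the function name and all hostnames are hypothetical.

```python
def compute_only_nodes(rm_nodes_payload, datanode_hosts):
    """Return hostnames of RUNNING NodeManagers that host no DataNode.

    rm_nodes_payload: dict shaped like the ResourceManager
        /ws/v1/cluster/nodes response (assumed layout: nodes -> node -> list).
    datanode_hosts: iterable of DataNode hostnames, supplied by the caller.
    """
    nodes = (rm_nodes_payload.get("nodes") or {}).get("node", [])
    datanodes = {host.lower() for host in datanode_hosts}
    return sorted(
        node["nodeHostName"]
        for node in nodes
        if node.get("state") == "RUNNING"
        and node["nodeHostName"].lower() not in datanodes
    )


# Hypothetical sample data, for illustration only.
sample_payload = {
    "nodes": {
        "node": [
            {"nodeHostName": "worker01.example.com", "state": "RUNNING"},
            {"nodeHostName": "worker02.example.com", "state": "LOST"},
            {"nodeHostName": "edge01.example.com", "state": "RUNNING"},
        ]
    }
}
sample_datanodes = ["worker01.example.com", "worker02.example.com"]

print(compute_only_nodes(sample_payload, sample_datanodes))
```

Any host this returns can accept YARN containers but holds no HDFS blocks, so every task placed there reads its input over the network.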

Re: Can I use edge nodes for MapReduce?

Expert Contributor

This does not sound like a good idea. Edge nodes, by definition, typically hold only client programs, not services like the DataNode or NodeManager. YARN manages resource allocation based on data locality and node utilization, which is why it is often also not a good idea to run NodeManagers without DataNodes on the same machine.

Regarding your "But can i .. bring the data to the edge .. run the task if all other nodes are busy?": YARN does the resource negotiation and scheduling for distributed frameworks like MapReduce and Spark. I would advise against doing this manually; let YARN do it for you.

I hope this helps.

Re: Can I use edge nodes for MapReduce?

Contributor

Yes. If all the other nodes are busy or down, the NodeManager on the client node will launch the container there. The map tasks will read their data remotely from any available DataNode and then process it, and the final output will be written back to whichever DataNodes are available.

However, it is not advisable to configure a NodeManager on a client node.
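The fallback described above can be illustrated with a toy placement function. This is a simplified model and not YARN's scheduler code: the function name, the hostnames, and the rack map are hypothetical, and the labels only mirror the locality levels YARN commonly reports (node-local, rack-local, off-switch).

```python
def place_task(block_hosts, free_nodes, rack_of):
    """Pick a node for one map task in a toy locality model.

    block_hosts: set of hosts holding a replica of the task's input block.
    free_nodes: list of hosts that currently have free container capacity.
    rack_of: dict mapping every host to its rack name.
    Returns (host, locality_label), or (None, "WAIT") if nothing is free.
    """
    # 1. Prefer a free node that stores the block itself.
    for node in free_nodes:
        if node in block_hosts:
            return node, "NODE_LOCAL"

    # 2. Otherwise prefer a free node in the same rack as a replica.
    block_racks = {rack_of[host] for host in block_hosts}
    for node in free_nodes:
        if rack_of[node] in block_racks:
            return node, "RACK_LOCAL"

    # 3. Otherwise take any free node; the input is read across racks.
    if free_nodes:
        return free_nodes[0], "OFF_SWITCH"
    return None, "WAIT"


# Hypothetical cluster layout: the edge node holds no HDFS blocks.
rack_of = {
    "worker1": "/rack1",
    "worker2": "/rack1",
    "worker3": "/rack2",
    "edge1": "/rack3",
}
block_hosts = {"worker1", "worker3"}

print(place_task(block_hosts, ["edge1", "worker3"], rack_of))
print(place_task(block_hosts, ["edge1", "worker2"], rack_of))
print(place_task(block_hosts, ["edge1"], rack_of))
```

In this model a node without a DataNode can never be a node-local choice, which is why a NodeManager on an edge node only picks up work when the better-placed nodes are busy, and always pays for a remote read.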
