Can I use edge nodes for MapReduce?

Expert Contributor

If I configure a machine as an edge node and not as a DataNode, I cannot store HDFS data on it. But can I run a NodeManager on the edge node, so that the data is brought to it and tasks run there when all the other nodes are busy?

1 ACCEPTED SOLUTION

Super Guru

@Shiva Nagesh

I agree with @hkropp. That you can does not mean you should, at least not as-is. You need to account for the architectural and resource-management shortcomings, not to mention the security concerns, of running more services on the edge nodes than is usually manageable. I understand that you have spare capacity on those edge nodes and would like to use it as burst capacity when needed. You could consider Docker containers on your edge servers, so that you can separate the true edge nodes from on-demand workers. Those Docker containers would use a worker template and could be spun up quickly as additional nodes, similar to what you would do in a cloud.


4 REPLIES

Expert Contributor

I have tested this: jobs can run on a node that is configured as an edge node and has no DataNode daemon running. Correct me if I am wrong.

Super Collaborator

This does not sound like a good idea. Edge nodes, by definition, typically hold only client programs, not services like the DataNode or NodeManager. YARN manages resource allocation based on where the data resides and how heavily the nodes are utilized, which is why it is often not a good idea to run a NodeManager on a machine without a DataNode.

Regarding your question "But can i .. bring the data to the edge .. run the task if all other nodes are busy?": YARN does the resource negotiation and scheduling for distributed frameworks like MapReduce and Spark. I would advise against doing this manually; let YARN do it for you.

I hope this helps.

Rising Star

Yes. If all the other nodes are busy or down, the NodeManager on the client node will launch the container there. The map tasks will read their data remotely from any available DataNode and process it, and the output will be written back to whichever DataNodes are available.

However, it is not advisable to run a NodeManager on a client node.
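
To make the fallback described above concrete, here is a minimal toy model in Python. It is not YARN's scheduler code, and every name in it (`Node`, `locality`, `pick_node`, the host and rack names) is hypothetical. It only borrows YARN's three locality levels (`NODE_LOCAL`, `RACK_LOCAL`, `OFF_SWITCH`) and assumes the scheduler simply prefers the most local node that has a free container; real schedulers add delays, queues, and capacity rules on top of this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one machine running a NodeManager."""
    name: str
    rack: str
    has_datanode: bool
    free_containers: int


# Lower rank means a more desirable placement.
RANK = {"NODE_LOCAL": 0, "RACK_LOCAL": 1, "OFF_SWITCH": 2}


def locality(node, block_hosts, replica_racks):
    """Classify how close `node` is to the replicas of an input block."""
    if node.has_datanode and node.name in block_hosts:
        return "NODE_LOCAL"
    if node.rack in replica_racks:
        return "RACK_LOCAL"
    return "OFF_SWITCH"


def pick_node(nodes, block_hosts):
    """Return (node name, locality) for the best node with free capacity,
    or None when no node can take another container."""
    by_name = {n.name: n for n in nodes}
    replica_racks = {by_name[h].rack for h in block_hosts if h in by_name}
    candidates = [n for n in nodes if n.free_containers > 0]
    if not candidates:
        return None
    best = min(candidates,
               key=lambda n: RANK[locality(n, block_hosts, replica_racks)])
    return best.name, locality(best, block_hosts, replica_racks)


# Two workers hold the block; the edge node runs a NodeManager but no DataNode.
busy_cluster = [
    Node("worker1", "rack1", has_datanode=True, free_containers=0),
    Node("worker2", "rack1", has_datanode=True, free_containers=0),
    Node("edge1", "rack2", has_datanode=False, free_containers=2),
]
print(pick_node(busy_cluster, {"worker1", "worker2"}))
```

In this model the edge node is chosen only once both workers are full, and the task then runs `OFF_SWITCH`, meaning every input byte crosses the network. That extra traffic is one reason the replies in this thread advise against running a NodeManager on an edge node.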
