I am new to Hadoop and have just started learning it. I want a sample program whose map task runs for more than 5 minutes on a data node, because I need to find long-running tasks so that I can take a look at that particular node.
What will happen if I kill that task's process? Will it run on another node, or will the whole job fail?
You could run TeraGen/TeraSort for this.
Here's a script on my gist.github.com page that can be run against an HDP cluster for this.
You can control the data size and the number of mappers and reducers from the command line, and even experiment with block sizes.
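In case the gist is not at hand, here is a rough sketch of what such a TeraGen/TeraSort run could look like. It is not the script mentioned above, just an illustration. The jar path is an assumption (a typical HDP location, so adjust it for your cluster), and the function names `rows_for_gb`, `run_teragen` and `run_terasort` are made up for this example. TeraGen writes 100-byte rows, so the row count is the target size in bytes divided by 100.

```shell
#!/usr/bin/env bash
# Assumption: location of the MapReduce examples jar on an HDP node.
# Override with EXAMPLES_JAR=/path/to/hadoop-mapreduce-examples.jar if yours differs.
EXAMPLES_JAR="${EXAMPLES_JAR:-/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar}"

# TeraGen rows are 100 bytes each, so rows = size in bytes / 100.
rows_for_gb() {
  echo $(( $1 * 1024 * 1024 * 1024 / 100 ))
}

# Usage: run_teragen <size_gb> <mappers> <block_size_bytes> <hdfs_output_dir>
# Fewer mappers over more data means each map task runs longer.
run_teragen() {
  hadoop jar "$EXAMPLES_JAR" teragen \
    -Dmapreduce.job.maps="$2" \
    -Ddfs.blocksize="$3" \
    "$(rows_for_gb "$1")" "$4"
}

# Usage: run_terasort <reducers> <hdfs_input_dir> <hdfs_output_dir>
run_terasort() {
  hadoop jar "$EXAMPLES_JAR" terasort \
    -Dmapreduce.job.reduces="$1" \
    "$2" "$3"
}

# Example (hypothetical paths and sizes, tune until a map task exceeds 5 minutes):
#   run_teragen 50 4 134217728 /tmp/teragen-in
#   run_terasort 8 /tmp/teragen-in /tmp/terasort-out
```

How long a single map task takes depends on your hardware, so start with a large size and a small mapper count and adjust from there.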