Support Questions

Copy script files from NFS to each node in the cluster

Expert Contributor

We have around 100 shell scripts. These files are on NFS. We want to copy all these files to each node in the HDFS cluster.

What is the best way to copy these files to each node in the cluster?

What is the best option for executing these shell scripts on each node?

1 ACCEPTED SOLUTION

Master Guru

My first question would be: why do you want to do that? If you want to manage your cluster, you would normally install something like pssh, Ansible, or Puppet and use that to manage it. You can put that on one control node, define a list of servers, and copy data to or execute scripts on all of them at the same time.
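For example, a minimal pssh sketch, assuming the pssh package is installed on the control node (on some distributions the commands are named parallel-ssh and parallel-scp) and treating hosts.txt and the paths below purely as placeholders:

# hosts.txt lists one cluster node per line (server1, server2, ...)
# Copy the NFS-mounted script directory to every node
pscp -h hosts.txt -r /mnt/nfs/scripts /opt/scripts

# Run one of the scripts on every node, showing output inline
pssh -h hosts.txt -i "bash /opt/scripts/setup.sh"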

You can also do something very simple like that with a one-line ssh loop.

To execute a script on all nodes:

for i in server1 server2; do echo "$i"; ssh "$i" "$1"; done

To copy files to all nodes:

for i in server1 server2; do scp "$1" "$i:$2"; done

[both need key-based (passwordless) SSH from the control node to the cluster nodes]
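Putting the two together for your specific case, here is a minimal sketch; the node names, the NFS mount point /mnt/nfs/scripts, and the target directory /opt/scripts are assumptions you would replace with your own:

#!/bin/bash
# Copy all shell scripts from the NFS mount to every node and make them executable.
NODES="server1 server2 server3"   # your cluster nodes
SRC=/mnt/nfs/scripts              # NFS directory holding the ~100 scripts
DEST=/opt/scripts                 # target directory on each node

for node in $NODES; do
  echo "Copying scripts to $node"
  ssh "$node" "mkdir -p $DEST"
  scp "$SRC"/*.sh "$node:$DEST/"
  ssh "$node" "chmod +x $DEST/*.sh"
done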

If, on the other hand, you want to ship these scripts as job dependencies, something like the MapReduce distributed cache is normally a good idea. Oozie provides the <file> tag to upload files from HDFS into the execution directory of the job.
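If you go the distributed cache route, a rough sketch looks like this; the HDFS path, jar, and class name are placeholders, and the job driver has to use ToolRunner/GenericOptionsParser for the -files option to be honored:

# Stage the scripts in HDFS once (paths are examples)
hdfs dfs -mkdir -p /apps/scripts
hdfs dfs -put /mnt/nfs/scripts/*.sh /apps/scripts/

# Ship one of them with a MapReduce job via the distributed cache;
# it appears in the task working directory, much like Oozie's <file> tag
hadoop jar my-job.jar com.example.MyJob -files hdfs:///apps/scripts/prepare.sh <job args>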

So honestly, if you go into more detail about what you ACTUALLY want, we might be able to help more.

