Copy script files from NFS to each node in the cluster
Labels: HDFS
Created on 07-13-2016 10:55 AM - edited 09-16-2022 03:29 AM
We have around 100 shell scripts. These files live on an NFS share, and we want to copy all of them to each node in the HDFS cluster.
What is the best way to copy these files to each node in the cluster?
What is the best way to execute these shell scripts on each node?
Created 07-13-2016 11:07 AM
My first question would be: why do you want to do that? If you want to manage your cluster, you would normally install something like pssh, Ansible, or Puppet and use that to manage the cluster. You put that on one control node, define a list of servers, and then move data or execute scripts on all of them at the same time.
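For example, a rough sketch with pssh or an Ansible ad-hoc command; hosts.txt, /mnt/nfs/scripts, and /opt/scripts are placeholder names, not paths from your environment:

# parallel-ssh: run a command on every host listed in hosts.txt (one hostname per line)
pssh -h hosts.txt -i "uptime"
# parallel-scp (named pscp or pscp.pssh depending on the distribution): copy one script to the same hosts
pscp.pssh -h hosts.txt /mnt/nfs/scripts/myscript.sh /opt/scripts/myscript.sh

# Ansible ad-hoc equivalents, treating hosts.txt as the inventory
ansible all -i hosts.txt -m copy -a "src=/mnt/nfs/scripts/ dest=/opt/scripts/"
ansible all -i hosts.txt -m shell -a "/opt/scripts/myscript.sh"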
You can also do something very simple like that with a one-line ssh loop.
To execute a script on all nodes:
for i in server1 server2; do echo "$i"; ssh "$i" "$1"; done
To copy files to all nodes:
for i in server1 server2; do scp "$1" "$i:$2"; done
[Both need passwordless (key-based) ssh from the control node to the cluster nodes.]
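Since you have around 100 scripts on an NFS mount, a fuller sketch would push the whole directory in one pass; the node names and the /mnt/nfs/scripts and /opt/scripts paths are assumptions:

# Push the entire NFS script directory to every node (needs passwordless ssh)
NODES="node1 node2 node3"
SRC=/mnt/nfs/scripts/    # NFS mount holding the ~100 scripts (assumed path)
DEST=/opt/scripts/       # target directory on each node (assumed path)
for host in $NODES; do
  echo "Copying to $host"
  rsync -a "$SRC" "$host:$DEST"
done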
If, on the other hand, the scripts are job dependencies, something like the MapReduce distributed cache is normally a better fit. Oozie provides the <file> tag to ship files from HDFS into the execution directory of the job.
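For that route, a minimal sketch (the HDFS and NFS paths are assumptions) is to stage the scripts in HDFS first and then point the workflow action at them with <file>:

# Stage the NFS scripts in HDFS so an Oozie action can pull them in via <file>
hdfs dfs -mkdir -p /user/$USER/scripts
hdfs dfs -put -f /mnt/nfs/scripts/*.sh /user/$USER/scripts/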
So honestly, if you go into more detail about what you ACTUALLY want to do, we might be able to help more.
