Support Questions

Copy script files from NFS to each node in the cluster

Expert Contributor

We have around 100 shell scripts. These files are on NFS. We want to copy all these files to each node in the HDFS cluster.

What is the best way to copy these files to each node in the cluster?

What is the best option for executing these shell scripts on each node?

1 ACCEPTED SOLUTION

Master Guru

My first question would be: why do you want to do that? If you want to manage your cluster, you would normally install something like pssh, Ansible, or Puppet and use that to manage it. You can put that on one control node, define a list of servers, and copy data to or execute scripts on all of them at the same time.
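For example, a minimal pssh sketch, assuming the pssh package is installed on the control node (on some distributions the commands are named parallel-ssh and parallel-scp) and treating hosts.txt and the paths below purely as placeholders:

# hosts.txt lists one cluster node per line (server1, server2, ...)
# Copy the NFS-mounted script directory to every node
pscp -h hosts.txt -r /mnt/nfs/scripts /opt/scripts

# Run one of the scripts on every node, showing output inline
pssh -h hosts.txt -i "bash /opt/scripts/setup.sh"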

You can also do something very simple like that with a one-line ssh loop.

To execute a script on all nodes:

for i in server1 server2; do echo "$i"; ssh "$i" "$1"; done

To copy files to all nodes:

for i in server1 server2; do scp "$1" "$i:$2"; done

[both need key-based (passwordless) SSH from the control node to the cluster nodes]
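Putting the two together for your specific case, here is a minimal sketch; the node names, the NFS mount point /mnt/nfs/scripts, and the target directory /opt/scripts are assumptions you would replace with your own:

#!/bin/bash
# Copy all shell scripts from the NFS mount to every node and make them executable.
NODES="server1 server2 server3"   # your cluster nodes
SRC=/mnt/nfs/scripts              # NFS directory holding the ~100 scripts
DEST=/opt/scripts                 # target directory on each node

for node in $NODES; do
  echo "Copying scripts to $node"
  ssh "$node" "mkdir -p $DEST"
  scp "$SRC"/*.sh "$node:$DEST/"
  ssh "$node" "chmod +x $DEST/*.sh"
done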

If, on the other hand, you want to ship these scripts as job dependencies, something like the MapReduce distributed cache is normally a good idea. Oozie provides the <file> tag to upload files from HDFS into the execution directory of the job.
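If you go the distributed cache route, a rough sketch looks like this; the HDFS path, jar, and class name are placeholders, and the job driver has to use ToolRunner/GenericOptionsParser for the -files option to be honored:

# Stage the scripts in HDFS once (paths are examples)
hdfs dfs -mkdir -p /apps/scripts
hdfs dfs -put /mnt/nfs/scripts/*.sh /apps/scripts/

# Ship one of them with a MapReduce job via the distributed cache;
# it appears in the task working directory, much like Oozie's <file> tag
hadoop jar my-job.jar com.example.MyJob -files hdfs:///apps/scripts/prepare.sh <job args>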

So honestly, if you go into more detail about what you ACTUALLY want, we might be able to help more.

