02-21-2017 02:22 AM
I'm new to the Cloudera world. I have set up the Cloudera Quickstart VM and it looks good. I have written some MapReduce jobs in Python, and I have the scripts and input files ready.
How can I run my Python scripts in the Cloudera Quickstart VM? Is there a tutorial or step-by-step instructions?
Once I have tested this in the Quickstart VM, I want to set up a 3 to 5 node Cloudera cluster and run the jobs across multiple nodes. However, all my scripts are written in Python. I have been looking for material that explains how to do this on a Cloudera cluster, but so far I have had no luck.
Really appreciate your help.
02-21-2017 07:31 AM
You can create a JAR file, scp it to your cluster, and run it as follows:
$hadoop jar <jar> [mainClass] args...
$hadoop jar myJar.jar training.wordcount /user/root/inputfile.txt /user/root/output/
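Since the jobs in question are already written in Python, another route worth trying is Hadoop Streaming, which lets any program that reads stdin and writes stdout act as the mapper or reducer. Below is a minimal word-count sketch of that idea. The file name wordcount.py, the function names, and the "map"/"reduce" argument are illustrative choices of mine, not anything taken from the original scripts.

```python
#!/usr/bin/env python
"""Word count for Hadoop Streaming: one file that acts as mapper or reducer.

Hypothetical example script; names and layout are assumptions for illustration.
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Sum the counts per word.

    Relies on the input being sorted by key, which Hadoop Streaming
    does between the map and reduce phases.
    """
    # Skip lines without a tab so a stray blank line does not crash the job.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


# Only read stdin when a role ("map" or "reduce") is passed explicitly.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

You can check it locally first with: cat inputfile.txt | python wordcount.py map | sort | python wordcount.py reduce
On the cluster, it would be submitted through the streaming JAR, roughly like this (the JAR path is an assumption based on a package-based CDH install and may differ on your system, for example under the parcels directory):
$hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -files wordcount.py -mapper "python wordcount.py map" -reducer "python wordcount.py reduce" -input /user/root/inputfile.txt -output /user/root/output/
The same command should work unchanged on a multi-node cluster, because -files ships the script to every node that runs a task.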