Registered: 02-21-2017

Cloudera QuickStart VM to run Python MapReduce scripts.

Hi,

 

I'm new to the Cloudera world. I have set up the Cloudera QuickStart VM and it looks good. I have written some MapReduce jobs in Python, and I have the scripts and input files ready.

How can I run my Python scripts in the Cloudera QuickStart VM? Is there a tutorial or step-by-step instructions?

 

Once I have tested this in my Cloudera QuickStart VM, I want to set up a 3 to 5 node Cloudera cluster and run the job across multiple nodes. However, all my scripts are written in Python. I have been looking for material on how to do this on a Cloudera cluster, but so far I have had no luck.

 

Really appreciate your help.

Registered: 09-02-2016

Re: Cloudera QuickStart VM to run Python MapReduce scripts.

@dino11092

 

You can create a JAR file, copy it to your cluster with scp, and run it as follows:

 

Syntax:

$ hadoop jar <jar> [mainClass] args...

 

Example:

$ hadoop jar myJar.jar training.wordcount /user/root/inputfile.txt /user/root/output/
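
Since the scripts in question are Python rather than Java, the usual route is probably Hadoop Streaming, which is itself started with the same `hadoop jar` syntax and pipes records through any executable via stdin/stdout. Below is a minimal word-count sketch, not a definitive setup. The file name `wordcount.py`, the `map`/`reduce` argument convention, and the streaming jar location are assumptions (the jar path shown is where package-based CDH installs usually put it; please verify it on your VM).

```python
#!/usr/bin/env python
"""Word count for Hadoop Streaming; one file serves as mapper and reducer.

Possible invocation (jar path and file name are assumptions, check your VM;
the output directory must not already exist):

  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
      -files wordcount.py \
      -mapper "python wordcount.py map" \
      -reducer "python wordcount.py reduce" \
      -input /user/root/inputfile.txt \
      -output /user/root/output
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Sum the counts per word.

    Relies on the input being sorted by key, which is how Hadoop Streaming
    delivers mapper output to the reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


# Hypothetical convention: the first argument selects the role.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

You can dry-run the same logic without Hadoop by piping a file through it: `cat inputfile.txt | python wordcount.py map | sort | python wordcount.py reduce`. The same command line should carry over unchanged to a multi-node cluster, since `-files` ships the script to every node that runs a task.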

 

Some links:

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/CommandsManual.html

 

http://stackoverflow.com/questions/13012511/how-to-run-a-jar-file-in-hadoop