python client on Os X streaming & QuickStart VM

New Contributor

I would like to write MapReduce code, ideally in Python, on my Apple Mac and stream it to the QuickStart VM.

 

Ideally, my development setup would combine the Python environment on my Mac with the QuickStart VM (later to be expanded to a cluster).

 

While there are many descriptions of how to connect or stream code from within a node of the Hadoop cluster or sandbox (e.g. from the NameNode), I am unclear on what to do to connect purely as a client.

 

E.g. I assume I need to install some Hadoop client libraries on OS X to talk to the sandbox HDFS? Where do I get these libraries from?

How do I install them?

What type of Python package works best?

What IP address should I use to stream my Python code?

Any help, and any link to a tutorial covering this, would be great!

6 REPLIES

Re: python client on Os X streaming & QuickStart VM

Expert Contributor
I assume I need to install some Hadoop client libraries on OS X to talk to the sandbox HDFS? Where do I get these libraries from?
How do I install them?
Ans: See the guide to installing Hadoop on Mac OS X: http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29

What type of Python package works best?
Ans: Mac OS X ships with a default Python installation; you can use that.

What IP address should I use to stream my Python code?
Ans: The IP address is that of your destination (the NameNode IP of your sandbox).
Em Jay

Re: python client on Os X streaming & QuickStart VM

New Contributor

I do not intend to install Hadoop on OS X. I would just like to install the client libraries that, as I understand it, certain packages need.

 

My idea is to write, test and debug the code on the Mac and then execute it on the VM, ideally launching it from OS X.

 

As for Python, I am referring to libraries like mrjob, pydoop and similar.
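
For the "write, test and debug on the Mac" part, a minimal sketch of Hadoop-Streaming-style word-count logic in plain Python might look like the following. The function names (mapper, reducer, run_locally) and the sample input are made up for illustration; the only assumption about Hadoop Streaming is that it feeds lines to the mapper and hands the reducer tab-separated records sorted by key.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; records must already be sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Imitate 'mapper | sort | reducer' in-process, without any cluster."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical sample input standing in for a file stored in HDFS.
    sample = ["the quick brown fox", "The lazy dog"]
    for record in run_locally(sample):
        print(record)
```

To run on the VM, the same two functions would be wrapped in small scripts that read sys.stdin and print to stdout, and those scripts would be passed to the Hadoop Streaming jar through its -mapper and -reducer options.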

Re: python client on Os X streaming & QuickStart VM

Master Collaborator

I believe Manikumar is correct and you will have to install Hadoop on your Mac in order to execute client application code that connects to the QuickStart VM. If you prefer not to do this, you could easily create a second CentOS virtual machine and add it to your QuickStart VM as a gateway machine, which will set up all the necessary environment properties to execute your code from there.

Re: python client on Os X streaming & QuickStart VM

New Contributor

Why would it be necessary to install *all* of Hadoop on the client?

 

My understanding is that installing these client files is all that I need on the client side (e.g. my Apple Mac)?

 

For example, Cloudera Manager provides client files, I assume for exactly this use case?

 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.5.3/Cloudera-Manager-Enterpr...

 

 

Re: python client on Os X streaming & QuickStart VM

Expert Contributor
Without installing Hadoop on the host, you wouldn't be able to deploy client configurations to that host using Cloudera Manager.
Please read: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.5.3/Cloudera-Manager-Enterpr...

You will have to install the Hadoop core packages on your machine to use it as a Hadoop client. You don't need to start any services on the client; you just need to install the packages and edit the configuration to point to your Hadoop NameNode and JobTracker. That's it. This is standard practice everywhere.
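
As a rough sketch of what "edit the configuration to point to your NameNode and JobTracker" could amount to, the snippet below renders two minimal client-side configuration documents. The host name, the ports (8020 and 8021) and the property names fs.defaultFS and mapred.job.tracker are assumptions and should be checked against the configuration of the actual cluster; the helper hadoop_site_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical address of the QuickStart VM; replace with the real one.
VM_HOST = "quickstart.cloudera"

# Assumed property names and ports (8020 NameNode, 8021 JobTracker);
# verify them against the cluster's own configuration files.
core_site = hadoop_site_xml({"fs.defaultFS": f"hdfs://{VM_HOST}:8020"})
mapred_site = hadoop_site_xml({"mapred.job.tracker": f"{VM_HOST}:8021"})

print(core_site)
print(mapred_site)
```

The two strings would be saved as core-site.xml and mapred-site.xml in the client's Hadoop configuration directory. Copying the client configuration that the cluster itself provides is likely the safer route; the sketch only shows which settings matter.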

The only reason a Windows machine cannot be set up as a Hadoop client is that Hadoop components cannot be installed on it.

If you are not okay with installing Hadoop components on your system, you have the option of calling the service through its APIs: you can use the WebHDFS REST API.
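
A minimal sketch of calling WebHDFS from Python with only the standard library is shown below. It assumes WebHDFS is enabled on the NameNode and reachable on HTTP port 50070; the host name, user name and helper names (webhdfs_url, list_status, mkdirs) are illustrative, not part of any official client.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def webhdfs_url(host, path, op, port=50070, user=None, **params):
    """Build a WebHDFS REST URL of the form /webhdfs/v1/<path>?op=<OP>."""
    query = {"op": op, **params}
    if user is not None:
        query["user.name"] = user
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


def list_status(host, path, **kwargs):
    """Send GET op=LISTSTATUS and return the decoded JSON listing."""
    with urlopen(webhdfs_url(host, path, "LISTSTATUS", **kwargs)) as response:
        return json.load(response)


def mkdirs(host, path, **kwargs):
    """Send PUT op=MKDIRS and return the decoded JSON response."""
    request = Request(webhdfs_url(host, path, "MKDIRS", **kwargs), method="PUT")
    with urlopen(request) as response:
        return json.load(response)


# Only builds a URL; nothing is sent until list_status() or mkdirs() is called.
# Host and user are hypothetical placeholders for the QuickStart VM.
print(webhdfs_url("quickstart.cloudera", "/user/cloudera", "LISTSTATUS", user="cloudera"))
```

With the VM running, list_status("quickstart.cloudera", "/user/cloudera", user="cloudera") would return the directory listing as a Python dict. Note that WebHDFS covers file system operations only; it does not submit jobs.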

Thanks
Em Jay

Re: python client on Os X streaming & QuickStart VM

New Contributor

This is helpful. To be clear, I have no issue with installing client software on the Mac. I just do not want to use it as a cluster node.

 

If I understand Cloudera terminology, I want it to act just as a Gateway. I'm looking at WebHDFS (thanks for pointing me to it!) and it looks great. The only issue so far is that it seems a good tool to do "things" in the cluster (like creating directories, files etc.), but I haven't seen any example of using WebHDFS to launch a .jar file with code...

 

I've also started to research mrjob, which looks quite promising.

 

I wonder if anybody has used mrjob from a Gateway-type (client-type) node with Cloudera...

 
