I'm researching which Hadoop platform would be most useful for developing a small proof-of-concept network. I have looked at Hortonworks and have a couple of Python scripts that we can call. In Hortonworks it's a simple case of putting the Python into the UDF editor and adding a REGISTER statement in Pig (REGISTER 'soundex.py' USING jython AS myfuncs;).
Is it possible to do something similar in Cloudera? General setup with Pig and HCatLoader has been more difficult there, so it is not as easy to figure it out just by experimenting.
The first three lines of the Hortonworks code are:
REGISTER 'soundex.py' USING jython AS myfuncs;
a = LOAD 'org' USING org.apache.hcatalog.pig.HCatLoader();
b = foreach a generate phrase, myfuncs.soundex(phrase) as sndx;
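For context, here is a minimal sketch of what a Jython UDF file such as soundex.py might look like. This is not the actual script: the American Soundex rules, the choice to treat the whole phrase as one word, and the `sndx:chararray` schema are assumptions made for illustration. The `outputSchema` decorator is supplied by Pig when the file is registered with `USING jython`; the fallback definition is only there so the file can also be run outside Pig.

```python
# Hypothetical soundex.py: an illustrative sketch, not the original script.

try:
    outputSchema  # injected by Pig when registered USING jython
except NameError:
    # Fallback so the module can be imported and tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

# Letter-to-digit table for American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


@outputSchema("sndx:chararray")  # assumed schema, matching the 'sndx' alias
def soundex(phrase):
    """Return a 4-character Soundex code, or None for empty/null input."""
    if phrase is None:
        return None
    # Assumption: non-letters are dropped and the phrase is coded as one word.
    letters = [c for c in phrase.upper() if "A" <= c <= "Z"]
    if not letters:
        return None
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # H and W do not separate repeated codes; vowels reset the previous code.
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]
```

With a file like this next to the Pig script, the REGISTER line exposes the function as myfuncs.soundex, which is how the third line of the script calls it.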