Hi! I'm running Cloudera Manager 5.11.2 on a 4-node cluster in Oracle VirtualBox. I wrote a simple MapReduce streaming example in Python that counts words, and it works fine. But adding the line "import pymorphy2" makes the streaming job fail at the map stage. It looks like the streaming interpreter can't work with this library. What can I do to fix this issue?
The VMs run Ubuntu 14.04 with Python 2.7.
PS: pymorphy2 works with Russian words; I need it to get each word's initial (normal) form.
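For context, a word-count mapper of the kind described might look roughly like the sketch below. This is not the poster's actual script: the names `map_words` and `run` are hypothetical, and the default `normalize` just lowercases each token. With pymorphy2 installed, a normalizer built on `pymorphy2.MorphAnalyzer().parse(word)[0].normal_form` could presumably be passed in instead.

```python
import sys


def map_words(lines, normalize=lambda w: w.lower()):
    """Yield (normalized_word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield normalize(word), 1


def run(lines, out):
    """Write tab-separated key/value pairs, one per line."""
    for word, count in map_words(lines):
        out.write("%s\t%d\n" % (word, count))


# Small in-memory demonstration; a real streaming mapper would
# call run(sys.stdin, sys.stdout) instead.
run(["Hello hadoop", "hello streaming"], sys.stdout)
```

Keeping normalization behind a parameter means the same mapper can be tested without pymorphy2 present, which helps separate "the mapper logic is wrong" from "the import fails on the node".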
Could you share the error you observe in your failing map tasks?
Also, how are you ensuring that the third-party Python module is available across the cluster? Have you pre-installed it cluster-wide, or are you shipping it along as an egg (or similar) via your job's distributed cache?
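One way to answer both questions at once is to run a small diagnostic on each node (or at the top of the mapper) that attempts the import and writes the outcome to stderr, where it should end up in the task's stderr log. The sketch below is only an illustration: `try_import` and `report` are hypothetical helper names, and it assumes the failure comes from the import itself rather than from something later in the script.

```python
import importlib
import sys
import traceback


def try_import(module_name):
    """Return (module, None) on success or (None, traceback_text) on failure."""
    try:
        return importlib.import_module(module_name), None
    except Exception:
        return None, traceback.format_exc()


def report(module_name, stream=sys.stderr):
    """Write the import outcome and the interpreter path to the stream."""
    module, error = try_import(module_name)
    if error is None:
        location = getattr(module, "__file__", "?")
        stream.write("import ok: %s from %s\n" % (module_name, location))
    else:
        stream.write("import failed: %s\n%s" % (module_name, error))
    # The interpreter path shows which Python the task actually ran under.
    stream.write("interpreter: %s\n" % sys.executable)
    return error is None


report("pymorphy2")
```

If the import fails on some nodes but not others, or the interpreter path differs from the one where the module was installed, that points to an installation or distribution problem rather than a streaming limitation.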