New Contributor
Posts: 1
Registered: ‎08-23-2017

Python streaming is not working


Hi! I'm working with Cloudera Manager 5.11.2 installed on Oracle VirtualBox (4 nodes). I wrote a simple MapReduce Python streaming example that counts words, and it works fine. But adding the line "import pymorphy2" makes the streaming job fail at the map stage. It looks like the streaming interpreter can't work with this library. What can I do to fix this issue?


The VMs run Ubuntu 14.04 with Python 2.7 installed.


P.S. pymorphy2 works with Russian words; I need it to get each word's initial form.

Posts: 1,903
Kudos: 435
Solutions: 307
Registered: ‎07-31-2013

Re: Python streaming is not working

Could you share the error you observe on your failing map tasks?
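
If the task logs show nothing useful, one way to capture the failure is to guard the import inside the mapper and write the details to stderr, which ends up in the map task's stderr log. The following is only a minimal sketch, assuming a word-count mapper similar to the one described; `try_import` and `main` are hypothetical names, not taken from your script.

```python
#!/usr/bin/env python
# Sketch of a word-count mapper that reports a failed import instead of
# dying silently. Written to run under Python 2.7 as well as Python 3.
import sys
import traceback


def try_import(name):
    """Return (module, None) on success or (None, traceback text) on failure."""
    try:
        return __import__(name), None
    except ImportError:
        return None, traceback.format_exc()


def main():
    module, error = try_import("pymorphy2")
    if error is not None:
        # Show which interpreter and search path the task actually uses.
        sys.stderr.write("python: %s\n" % sys.executable)
        sys.stderr.write("sys.path: %r\n" % (sys.path,))
        sys.stderr.write(error)
        sys.exit(1)
    # Plain word count; the normalization step with the imported module
    # would go here once the import succeeds.
    for line in sys.stdin:
        for word in line.split():
            sys.stdout.write("%s\t1\n" % word)

# In the real mapper script, finish with a call to main().
```

Comparing the interpreter path and `sys.path` printed on a worker node with what you see in an interactive shell on the same node should show whether the task is using a different Python or a different set of site-packages.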

Also, how are you ensuring that the third-party Python module is available across the cluster? Have you pre-installed it cluster-wide, or are you shipping it along as an egg or similar via your job's distributed cache?
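
For the shipping route, a minimal sketch follows. It assumes the module and its dependencies are bundled into an archive with the hypothetical name `deps.zip`, shipped with the job through the streaming `-files` or `-archives` option so that it appears in the task's working directory. The helper name `add_shipped_paths` is also hypothetical. Python can import pure-Python packages directly from a zip file on `sys.path`, but a library that loads data files from disk may need the unpacked directory that `-archives` provides, or a cluster-wide install instead.

```python
import os
import sys


def add_shipped_paths(names, base_dir="."):
    """Prepend each shipped file or directory that exists to sys.path.

    Returns the list of absolute paths that were added, so the mapper can
    log them to stderr for debugging.
    """
    added = []
    for name in names:
        path = os.path.abspath(os.path.join(base_dir, name))
        if os.path.exists(path) and path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

In the mapper this would be called before the third-party import, for example `add_shipped_paths(["deps.zip"])` followed by `import pymorphy2`. Whether that import then succeeds depends on how the archive was built, so treat it as a starting point rather than a confirmed fix.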