CDSW Error: No module named numpy???

Hi

I've just installed Data Science Workbench 1.2 on a single master node (under VMware 6.5). From my understanding of the documentation, adding worker nodes is optional. The service comes up under the cluster okay, and in Cloudera Manager (5.13) it shows green health. However, when I run the command cdsw status on the master node CLI, it reports 'Cloudera Data Science Workbench is not ready yet'. It says 'Status check failed for services: [docker, kubelet, cdsw-app, cdsw-host-controller]'.

I can open a project successfully, and some example pyspark files work fine, but any pyspark script that uses numpy gives this error:

File "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 711, in subimport
    __import__(name)
ImportError: ('No module named numpy', <function subimport at 0x1e75cf8>, ('numpy',))

 

When I issue the command pip list in the session terminal, it lists numpy (1.12.1) as being installed.
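
In case it helps with diagnosis: the traceback comes from cloudpickle's subimport, which suggests the import may be failing in the Python process that runs the Spark tasks rather than in the session where pip list was run. Below is a minimal sketch for comparing the two sides. The helper names probe_module and probe_executors are hypothetical (they are not part of CDSW or Spark), and sc is assumed to be the session's existing SparkContext.

```python
import importlib
import sys


def probe_module(name):
    """Report which interpreter is running and whether `name` can be imported.

    Hypothetical helper for illustration; not part of CDSW or Spark.
    """
    try:
        module = importlib.import_module(name)
        version = getattr(module, "__version__", None)
        available = True
    except ImportError:
        version = None
        available = False
    return {
        "python": sys.executable,
        "module": name,
        "available": available,
        "version": version,
    }


def probe_executors(spark_context, name, partitions=4):
    """Run probe_module inside Spark tasks and return the distinct results.

    `spark_context` is assumed to be an existing pyspark SparkContext.
    probe_module imports `name` lazily, so the task can still be shipped
    to executors where that module is missing.
    """
    rdd = spark_context.parallelize(range(partitions), partitions)
    # Dicts are not hashable, so convert each result to a sorted tuple
    # of (key, value) pairs before calling distinct().
    rows = rdd.map(lambda _: tuple(sorted(probe_module(name).items())))
    return [dict(row) for row in rows.distinct().collect()]
```

Calling probe_module("numpy") in the session describes the driver side, and probe_executors(sc, "numpy") describes the side that runs the tasks. If the two report different interpreters, or numpy is available on one side only, that would be consistent with the error above.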

 

Any advice on fixing this would be much appreciated.

Thanks a lot.

Rob Sullivan (London)
