Created on 11-06-2017 12:12 PM - edited 09-16-2022 05:29 AM
Hi
I've just installed Data Science Workbench 1.2 on a single master node (under VMware 6.5). From my understanding of the documentation, adding worker nodes is optional. The service comes up under the cluster okay, and in Cloudera Manager (5.13) it shows green health. However, when I run the command cdsw status on the master node CLI, it reports 'Cloudera Data Science Workbench is not ready yet', followed by 'Status check failed for services: [docker, kubelet, cdsw-app, cdsw-host-controller]'.
I can open a project successfully and some example pyspark files work fine. But any pyspark script that uses numpy gives the error:
File "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 711, in subimport
    __import__(name)
ImportError: ('No module named numpy', <function subimport at 0x1e75cf8>, ('numpy',))
When I issue the command pip list in the session terminal, it lists numpy (1.12.1) as installed.
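To narrow this down, here is a minimal diagnostic sketch. It assumes (this is not confirmed by anything above) that the ImportError is raised inside Spark executor tasks, which may run a different Python interpreter than the session where pip list was run. The helper names probe_module and probe_executors are hypothetical and made up for illustration; only sparkContext.parallelize, map, distinct and collect are standard PySpark calls.

```python
import sys


def probe_module(module_name="numpy"):
    """Return (interpreter path, importable?) for the Python process running this call."""
    try:
        __import__(module_name)
        found = True
    except ImportError:
        found = False
    return (sys.executable, found)


def probe_executors(spark, module_name="numpy", num_tasks=8):
    """Run probe_module inside Spark tasks and return the distinct results.

    `spark` is an existing SparkSession. `num_tasks` is an arbitrary
    illustrative value; more tasks make it likelier that every executor
    host is sampled, but nothing guarantees full coverage.
    """
    sc = spark.sparkContext
    return (sc.parallelize(range(num_tasks), num_tasks)
              .map(lambda _: probe_module(module_name))
              .distinct()
              .collect())


# Driver side: this is the environment that `pip list` in the session describes.
print("driver:", probe_module("numpy"))
```

In a session with a SparkSession named spark, calling probe_executors(spark) would show which interpreter the tasks use and whether numpy is importable there. If the driver reports True and the executors report False, numpy would be missing from the Python used on the cluster hosts rather than from the session; PYSPARK_PYTHON is the standard Spark environment variable that selects that interpreter.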
Any advice on fixing this would be much appreciated.
Thanks a lot.
Rob Sullivan (London)