Support Questions

How to ship a virtual environment with PySpark


I am trying to ship a virtual environment with my PySpark job in order to run it in cluster mode, referencing this and this. After zipping my environment, I run the command below:

PYSPARK_PYTHON=./venv/bin/python \
spark-submit \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/bin/python \
--conf spark.executorEnv.PYSPARK_PYTHON=./venv/bin/python \
--conf spark.yarn.dist.archives=hdfs:///user/sw/python-envs/ \
--master yarn \
--deploy-mode cluster \
--py-files hdfs:///user/sw/python-envs/ \
yet the job keeps failing with:

from sklearn.preprocessing import MinMaxScaler
ModuleNotFoundError: No module named 'sklearn'

which is required by my script. What am I doing wrong?
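From the Spark docs I gather that spark.yarn.dist.archives should point at the zipped archive itself (not a directory), with a #alias fragment so YARN unpacks it under a known relative path matching PYSPARK_PYTHON, and that --py-files is meant for Python code files, not environments. Is something like the sketch below what the submit should look like? (The archive name venv.zip, the alias venv, and the script name job.py are placeholders, not my actual paths.)

PYSPARK_PYTHON=./venv/bin/python \
spark-submit \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/bin/python \
--conf spark.executorEnv.PYSPARK_PYTHON=./venv/bin/python \
--conf spark.yarn.dist.archives=hdfs:///user/sw/python-envs/venv.zip#venv \
--master yarn \
--deploy-mode cluster \
job.py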