
How to ship a virtual environment with PySpark

I am trying to ship a virtual environment with my PySpark job in order to run it in cluster mode, referencing this and this. After zipping my environment, I run the command below:

PYSPARK_PYTHON=./venv/bin/python \
spark-submit \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./venv/bin/python \
--conf spark.executorEnv.PYSPARK_PYTHON=./venv/bin/python \
--conf spark.yarn.dist.archives=hdfs:///user/sw/python-envs/venv.zip#venv \
--master yarn \
--deploy-mode cluster \
--py-files hdfs:///user/sw/python-envs/ml.py \
train.py
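For context, a plain zip of a virtualenv often fails like this because the interpreter and paths inside it are absolute and break when the archive is unpacked on the executors. A sketch of building a relocatable archive instead (the use of venv-pack here is an assumption; the post does not say how venv.zip was created):

```shell
# Sketch: build a relocatable environment archive before submitting.
# NOTE: venv-pack is an assumed tool choice, not confirmed by the post.
python -m venv venv
source venv/bin/activate
pip install scikit-learn venv-pack

# venv-pack rewrites the activation paths so the env works after relocation
venv-pack -o venv.tar.gz

# upload to the same HDFS location used in spark.yarn.dist.archives
hdfs dfs -put venv.tar.gz /user/sw/python-envs/
```

The archive would then be referenced as hdfs:///user/sw/python-envs/venv.tar.gz#venv in spark.yarn.dist.archives, keeping PYSPARK_PYTHON pointed at ./venv/bin/python as above.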

yet the job keeps failing with:

from sklearn.preprocessing import MinMaxScaler
ModuleNotFoundError: No module named 'sklearn'

which is required by the ml.py script. What am I doing wrong?