How to install/manage third-party packages with PySpark

We have received a request from our development team to install third-party Python packages (such as phonenumbers) on all worker nodes because their application depends on them. They are using PySpark.

We already have the Anaconda parcel deployed via Cloudera Manager (CM), but these packages are not included in the Anaconda package list. We are not sure how to handle such package installation requests.

Has anyone else come across such requirements, or does anyone have a recommendation on how to handle them?