Spark-sklearn integration

Hi,

We have an on-premise Hadoop cluster and are planning to integrate Spark with scikit-learn using the spark-sklearn package. Could you please let me know whether we need to install the sklearn and spark-sklearn packages on all nodes, or just on the node where the Spark2 History Server is installed? We will be using YARN for resource allocation.

Thanks,

Chandra

1 ACCEPTED SOLUTION

Re: Spark-sklearn integration

@chandramouli muthukumaran

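You'll want to install scikit-learn (pip install -U scikit-learn) and spark-sklearn on every node that can run Spark executors, which under YARN means every NodeManager host (typically the datanodes), and also on the node you launch the driver from. Installing them only on the Spark2 History Server node is not enough, since the History Server only serves the UI for completed applications and does not run your code. The same goes for the other Python packages they rely on, such as numpy and scipy. Using YARN as the resource manager is a good choice, so you are on the right path there. Hope this helps!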
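If it helps, here is a rough sketch of how you could check from PySpark which hosts are still missing the packages once you have installed them. It is only an illustration, not something from the spark-sklearn docs: the helper names (missing_packages, report_host, check_cluster) and the num_tasks default are made up for this example, and it assumes you already have a SparkContext called sc (for instance from the pyspark shell). Note that the check uses the import names (sklearn, spark_sklearn), not the pip names, and that Spark does not guarantee a task lands on every node, so treat an empty result as a good sign rather than proof.

```python
import importlib.util
import socket

# Import names, not pip names: scikit-learn -> sklearn, spark-sklearn -> spark_sklearn.
REQUIRED = ("numpy", "scipy", "sklearn", "spark_sklearn")


def missing_packages(names=REQUIRED):
    """Return the top-level module names that cannot be found on this host."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report_host(_):
    """Runs inside a Spark task: report this host and what it is missing."""
    return (socket.gethostname(), tuple(missing_packages()))


def check_cluster(sc, num_tasks=100):
    """Spread many small tasks over the executors and collect one entry per host.

    `sc` is an existing SparkContext (assumed, not created here). Hosts that
    received no task will not appear in the result.
    """
    pairs = (
        sc.parallelize(range(num_tasks), num_tasks)
        .map(report_host)
        .distinct()
        .collect()
    )
    # Keep only hosts where at least one package is missing.
    return {host: missing for host, missing in pairs if missing}
```

In the pyspark shell (started with --master yarn) you would paste the functions and call check_cluster(sc). Any host that shows up in the returned dict still needs the listed packages installed for the Python interpreter the executors use.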

Re: Spark-sklearn integration

Thanks much for your response.
